Hi,
thank you for the very interesting paper and for providing an implementation of the algorithm. We really like this new approach.
I am part of a small team of 3 students; our supervisor has given us the task of reproducing the results from the paper. Unfortunately, we have been unable to do so so far. I hope you can help us with the following:
- The provided code does not work for some choices of parameters. In particular, setting the "reset_agent" flag to true causes an error in training/train_online.py: lines 584 and 610 appear to be missing the double-star operator before the last argument of the "initialize_agent" call (a sketch of the change we made is included after the two questions below). This is easy to fix, but it suggests that the version of the code in the repo is not the one you used for your experiments (the paper states that agent resetting was in fact used). Moreover, after adding the two stars, the expected training time rises from ~1h to ~1000h (by tqdm's estimate), so it is not really a fix after all. This leaves us with the following questions:
  - Would it be possible for you to upload the exact code that was used during your experiments?
  - If not, could you please help us resolve the issue with resetting the agent? Any form of help would be greatly appreciated.
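For clarity, this is roughly the change we made locally. The call is paraphrased from memory, so the argument names (e.g. `agent_kwargs`) are our guesses and not the repository's actual identifiers:

```python
# training/train_online.py, around lines 584 and 610
# (paraphrased; argument names are our guesses, not the actual code)

# What the repository currently seems to do: the dict of agent keyword
# arguments is passed as a single positional argument, which fails inside
# initialize_agent when reset_agent is enabled.
agent = initialize_agent(env, config, agent_kwargs)

# What we believe was intended: unpack the dict into keyword arguments.
agent = initialize_agent(env, config, **agent_kwargs)
```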
- As mentioned above, we have been unable to reproduce the results reported in the paper. In particular, the final speed after 200k simulation steps was 20% lower than the value reported in the paper, and the final fall count after 200k steps was 33% lower than the reported value. We did not use agent resetting, since it does not work, and it also appears not to have been used when generating fig. 9 (there are no deep falls like those in fig. 6). We have checked this on 1 seed only, which should not matter, since the variance reported for APRL in fig. 9 is very low. We have tried both the default parameters from the README script and those reported on your webpage; they yield very similar results. While reading the code, I also noticed that the action space is shrunk by a factor of 0.5 instead of the 0.9 reported in the paper (training/train_online.py, line 571; a hard-coded constant); changing it to 0.9 did not help us reproduce the results (a sketch of the change we tried is included after the command below). Here is the exact command we used in the experiment where the parameters were set to match the paper:
```
/root/quadruped-rl/APRL/training/train_online.py
--env_name=Go1SanityMujoco-Empty-SepRew-v0
--save_buffer=True
--load_buffer
--utd_ratio=20
--start_training=1000
--config=configs/droq_config.py
--config.critic_layer_norm=True
--config.exterior_linear_c=10.0
--config.target_entropy=-12
--save_eval_videos=True
--eval_interval=-1
--save_training_videos=True
--training_video_interval=1000
--eval_episodes=1
--max_steps=200000
--log_interval=1000
--save_interval=1000
--project_name=APRL_sim_reproduce
--tqdm=True
--save_dir=saved_sim_exp
--task_config.action_interpolation=True
--task_config.enable_reset_policy=False
--task_config.Kp=20
--task_config.Kd=1.0
--task_config.limit_episode_length=0
--task_config.action_range=1.0
--task_config.frame_stack=0
--task_config.action_history=1
--task_config.rew_target_velocity=1.5
--task_config.rew_energy_penalty_weight=0.002
--task_config.rew_qpos_penalty_weight=2.0
--task_config.rew_smooth_torque_penalty_weight=0.005
--task_config.rew_pitch_rate_penalty_factor=0.4
--task_config.rew_roll_rate_penalty_factor=0.2
--task_config.rew_joint_diagonal_penalty_weight=0.00
--task_config.rew_joint_shoulder_penalty_weight=0.00
--task_config.rew_joint_acc_penalty_weight=0.0
--task_config.rew_joint_vel_penalty_weight=0.0
--task_config.center_init_action=True
--task_config.rew_contact_reward_weight=0.0
--action_curriculum_steps=10000
--action_curriculum_start=0.3
--action_curriculum_end=0.7
--action_curriculum_linear=True
--action_curriculum_exploration_eps=0.15
--task_config.filter_actions=8
--reset_curriculum=True
--reset_criterion=dynamics_error
--task_config.rew_smooth_change_in_tdy_steps=1
--threshold=1.0
```
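For completeness, this is roughly the one-line change we tried for the shrinking factor. The variable name is our paraphrase of the code around line 571, not a verbatim quote:

```python
# training/train_online.py, around line 571
# (paraphrased; the variable name is our guess, not the actual code)

# The repository appears to shrink the action range with a hard-coded 0.5:
action_range = action_range * 0.5

# We replaced the constant with 0.9 to match Algorithm 1, line 11 of the
# paper, but this did not noticeably change our results:
action_range = action_range * 0.9
```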
The parameters that were not reported in the paper were left the same as in the README script. We have the following questions regarding parameters:
- The paper mentions multiple times the importance of agent resetting for plasticity, yet it appears (though it is not stated explicitly in the paper) that no agent resets were used when producing fig. 9. Why is that?
- Could you confirm that you used a shrinking factor of 0.9, and not 0.5, in your experiments, as stated in Algorithm 1, line 11 of your paper? Are there any other hard-coded constants in the code that may differ from what is reported in the paper?
- Could you please confirm that the parameters in the script above are the same as the ones you used to obtain your results?
- Finally, do you have any idea what else might cause our results to differ significantly from what is reported in the paper, apart from what I have mentioned above?
Please let us know whether you will be able to answer our questions within the next week or two, so that we can schedule our work accordingly. Please also let me know if I have gotten something wrong in the post above, or if you need any additional information or clarification regarding the questions.
Thank you in advance,
Maciek