Description
I have tried the following change:

```python
def rollout(self):
    """
    Evaluates the performance of the model on a single episode.
    """
    episode_ret, episode_cost, episode_len = 0.0, 0.0, 0
    obs, info = self.env.reset(seed=1)
```

but it fails with:

```
TypeError: reset() got an unexpected keyword argument 'seed'
```
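Is the right fix to seed the environment before calling reset instead? A minimal sketch of what I mean, assuming the wrapper exposes the older Gym-style `env.seed()` method (I have not verified that it does):

```python
# Sketch only: assumes self.env exposes the older Gym-style seed() method,
# which I have not checked for this wrapper.
self.env.seed(1)              # seed the environment's RNG up front
obs, info = self.env.reset()  # then reset without the seed kwarg
```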
What exactly is the source of randomness when re-running the same evaluation command? How can I always reproduce the same experiment and get the same average reward/cost metrics for evaluation? I want to compare the baseline against a small modification I made, but running the evaluation script repeatedly produces very different results, so I cannot use them for a comparison.
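For now I am seeding every RNG I can think of at the top of the eval script; this is just a guess at where the randomness comes from (Python's `random`, NumPy, PyTorch, and the environment itself), and `seed_everything` is my own helper, not something from OSRL:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    # Seed the global RNGs I suspect are involved (my own helper, not part of OSRL).
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)


seed_everything(0)
```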
This is the command I am using:

```bash
python "OSRL/examples/research/check/eval_bc" --device="mps" --path "OSRL/logs/OfflineSwimmerVelocityGymnasium-v1-cost-20/BC-safe_bc_modesafe_cost20_seed20-2180/BC-safe_bc_modesafe_cost20_seed20-2180" --eval_episode 1
```