Description
I have tried the following change:

```python
def rollout(self):
    """
    Evaluates the performance of the model on a single episode.
    """
    episode_ret, episode_cost, episode_len = 0.0, 0.0, 0
    obs, info = self.env.reset(seed=1)
```

but it fails with:

```
TypeError: reset() got an unexpected keyword argument 'seed'
```
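Is the right fix to seed the environment before calling reset instead? A minimal sketch of what I mean, assuming the wrapper exposes the older Gym-style `env.seed()` method (I have not verified that it does):

```python
# Sketch only: assumes self.env exposes the older Gym-style seed() method,
# which I have not checked for this wrapper.
self.env.seed(1)              # seed the environment's RNG up front
obs, info = self.env.reset()  # then reset without the seed kwarg
```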
What exactly is the source of randomness when re-running the same evaluation command? How can I always reproduce the same experiment and get the same average reward/cost metrics for evaluation? I want to compare the baseline against a small modification I made, but running the evaluation script repeatedly produces very different results, so I cannot use them for a comparison.
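For now I am seeding every RNG I can think of at the top of the eval script; this is just a guess at where the randomness comes from (Python's `random`, NumPy, PyTorch, and the environment itself), and `seed_everything` is my own helper, not something from OSRL:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int) -> None:
    # Seed the global RNGs I suspect are involved (my own helper, not part of OSRL).
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)


seed_everything(0)
```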
This is the command I am using:

```bash
python "OSRL/examples/research/check/eval_bc" --device="mps" --path "OSRL/logs/OfflineSwimmerVelocityGymnasium-v1-cost-20/BC-safe_bc_modesafe_cost20_seed20-2180/BC-safe_bc_modesafe_cost20_seed20-2180" --eval_episode 1
```