Submitted to ECC 2025.
- `results/` contains `.csv` files for all results.
- `Safe-Policy-Optimization/` contains the code from this repository.
  - The algorithms we used are inside the `/safepo/single_agent/` directory. These are `cpo.py` (CPO), `ppo_ewc_cost.py` (Safe EWC), and `ppo_ewc.py` (PPO+EWC). `ppo_ewc_lambda.py` is used for tuning the $\lambda$ hyperparameter.
- `safety-gymnasium/` contains the code from this repository.
  - The continual RL environments we created that are used in the paper are in `/safety_gymnasium/tasks/safe_velocity/`. Specifically, `safety_half_cheetah_velocity_v4.py` is the nonstationary safety-constrained HalfCheetah task and `safety_ant_velocity_v2.py` is the Ant task.
- `Analyze Results.ipynb` contains the analysis of the results.
- `Lambda Experiment.ipynb` contains a hyperparameter experiment to choose the EWC $\lambda$.
- `Environment Test` can be used to test the environments and visualize them.
- Enter the `/Safe-Policy-Optimization/safepo/single_agent/` directory (e.g., `cd /Safe-Policy-Optimization/safepo/single_agent/`).
- Train an agent by running the chosen algorithm as follows:
- `python algorithm.py --task taskname --experiment experiment_name`
  - `algorithm` is one of `cpo`, `ppo_ewc`, `ppo_ewc_cost`, or `ppo_ewc_lambda`.
  - `taskname` is `SafeHalfCheetahVelocity-v4` or `SafeAntVelocity-v2`.
  - `experiment_name` is your experiment name, which will be saved in the `runs/` folder.
  - `--ewc_lambda num` sets $\lambda$, the trade-off between remembering previous tasks and learning new ones, to `num`.
  - `--task-length num` is the number of environment observations for each nonstationary task.
  - `--tasks 'task_list'` is the task sequence, e.g. `'[0, 1, 0, 1, 2, 0]'`.
- For a comprehensive list of command line arguments, check the `single_agent_args()` function in this file.
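To make the `--tasks` and `--task-length` semantics concrete, here is a minimal sketch of how a task-sequence string maps environment steps to the active task. The function names are hypothetical and this is not the repository's actual parsing code; it only illustrates what the two flags describe.

```python
import ast


def parse_task_sequence(task_list_str):
    """Parse a --tasks style string such as '[0, 1, 0, 1, 2, 0]' into a list of task ids."""
    return list(ast.literal_eval(task_list_str))


def active_task(step, tasks, task_length):
    """Return the task id active at a given environment step.

    The agent spends `task_length` observations on each entry of the
    sequence in order; steps past the end stay on the final task.
    """
    index = min(step // task_length, len(tasks) - 1)
    return tasks[index]
```

For example, with `--tasks '[0, 1, 0, 1, 2, 0]'` and `--task-length 1_000_000`, step 0 is on task 0 and step 1,000,000 begins task 1.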
The results of the paper can be reproduced by running the above commands for seeds 0-4 for each algorithm. As detailed in the paper, use `ewc_lambda=10`, `task-length=1_000_000`, `task_list='[0, 1, 0, 2, 1, 0, 2]'`, and `total-steps=8_000_000`. These results are saved in the `results/` directory. Use `Analyze Results.ipynb` to see our analysis.
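The `--ewc_lambda` flag weights the standard EWC quadratic penalty (Kirkpatrick et al., 2017). As an illustration of what $\lambda$ trades off (a sketch of the standard formula, not the repository's implementation):

```python
def ewc_penalty(params, anchor_params, fisher_diag, ewc_lambda):
    """Standard EWC regularizer: (lambda / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    fisher_diag holds the diagonal Fisher information estimated after earlier
    tasks; anchor_params (theta*) are the parameters saved at that point.
    A larger ewc_lambda keeps parameters near the old solution (remembering
    previous tasks); a smaller one favors learning the new task.
    """
    return 0.5 * ewc_lambda * sum(
        f * (p - a) ** 2
        for f, p, a in zip(fisher_diag, params, anchor_params)
    )
```

This penalty is added to the task loss, so setting `--ewc_lambda 0` recovers plain PPO on each task.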
- From the project top directory:

  ```
  conda env create -f environment.yml
  conda activate safe-continual
  cd safety-gymnasium
  pip install -e .
  ```
If you experience any issues, you may need to set up your own conda env and install safety-gymnasium manually, then add packages as necessary. Alternatively, if your installation is not time-sensitive, please feel free to raise an issue!