Accepted at NeurIPS 2025 Workshop on Aligning Reinforcement Learning Experimentalists and Theorists
Authors: Md Asifur Rahman, Risal Shahriar Shefin, Debashis Gupta, Sarra Alqahtani
Poster Link: https://drive.google.com/file/d/1eP5C35WeBim0hMl3UiK8Jf4KIX6IPE8J/view?usp=sharing
Paper Link: https://drive.google.com/file/d/14_1vpJ5JatJSEG17-cPzytPg71DBvZ7m/view?usp=sharing
$ conda create -n strpo-venv python=3.10
$ conda activate strpo-venv
$ pip install -r requirements.txt
Install Safety-Gymnasium 1.2.0 from here: https://github.com/PKU-Alignment/safety-gymnasium
- To train the unsafe policy:
$ python main_distributional.py --env-name SafetyCarBuildingGoal1-v0 --adversary True --num-atoms 201 --v-min 0 --v-max 1000 --cvar-alpha 0.95
- To train sTRPO:
$ python -m sTRPO.main --env-name SafetyCarCircle1-v0 --unsafe-agent-path <saved-model-directory>
- TRPO Implementation: https://github.com/mjacar/pytorch-trpo
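
The `--num-atoms`, `--v-min`, `--v-max`, and `--cvar-alpha` flags in the unsafe-policy command suggest a categorical distributional critic evaluated with a CVaR criterion. The sketch below is only an illustration of how such flags are commonly interpreted, not the code in `main_distributional.py`: the helper names `make_atoms` and `upper_tail_cvar` are hypothetical, and treating CVaR as the mean of the upper `1 - alpha` tail is an assumption about the convention.

```python
# Illustrative sketch only; function names and the upper-tail CVaR
# convention are assumptions, not taken from this repository.

def make_atoms(v_min, v_max, num_atoms):
    """Evenly spaced support points between v_min and v_max (inclusive)."""
    step = (v_max - v_min) / (num_atoms - 1)
    return [v_min + i * step for i in range(num_atoms)]


def upper_tail_cvar(atoms, probs, alpha):
    """Mean of the largest (1 - alpha) probability mass of a categorical distribution."""
    tail_mass = 1.0 - alpha
    remaining = tail_mass
    total = 0.0
    # Walk the atoms from largest to smallest, consuming probability
    # mass until the tail budget is used up.
    for value, prob in sorted(zip(atoms, probs), reverse=True):
        take = min(prob, remaining)
        total += take * value
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / tail_mass


if __name__ == "__main__":
    # Same support as the command above: 201 atoms on [0, 1000].
    atoms = make_atoms(0.0, 1000.0, 201)
    uniform = [1.0 / len(atoms)] * len(atoms)
    print(upper_tail_cvar(atoms, uniform, alpha=0.95))
```

With 201 atoms on [0, 1000] the support points are spaced 5 apart, so under these assumptions the resolution of any tail estimate is limited to that spacing.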