risal-shefin/sTRPO

sTRPO: Safe, Trust Region Policy Optimization for Constrained Reinforcement Learning

Accepted at NeurIPS 2025 Workshop on Aligning Reinforcement Learning Experimentalists and Theorists
Authors: Md Asifur Rahman, Risal Shahriar Shefin, Debashis Gupta, Sarra Alqahtani
Poster Link: https://drive.google.com/file/d/1eP5C35WeBim0hMl3UiK8Jf4KIX6IPE8J/view?usp=sharing
Paper Link: https://drive.google.com/file/d/14_1vpJ5JatJSEG17-cPzytPg71DBvZ7m/view?usp=sharing

Environment Setup

$ conda create -n strpo-venv python=3.10
$ conda activate strpo-venv
$ pip install -r requirements.txt

Install Safety-Gymnasium 1.2.0 by following the instructions in its repository: https://github.com/PKU-Alignment/safety-gymnasium
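
Before training, it can help to confirm that Safety-Gymnasium is visible from the `strpo-venv` environment. The snippet below is a minimal sketch, not part of this repository: the helper `installed_version` is hypothetical, and it assumes the package imports as `safety_gymnasium` and is distributed as `safety-gymnasium`.

```python
from importlib import metadata, util


def installed_version(import_name, dist_name):
    """Return the installed version string, or None if the module is not importable.

    Hypothetical helper for illustration; not part of the sTRPO code base.
    """
    # find_spec returns None when a top-level module cannot be located.
    if util.find_spec(import_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (e.g. run from a source checkout).
        return "unknown"


if __name__ == "__main__":
    # Assumed names: import name "safety_gymnasium", distribution "safety-gymnasium".
    version = installed_version("safety_gymnasium", "safety-gymnasium")
    if version is None:
        print("Safety-Gymnasium not found; install it before training.")
    else:
        print(f"Safety-Gymnasium version: {version} (this README expects 1.2.0)")
```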

Commands

  • To train the unsafe policy:
$ python main_distributional.py --env-name SafetyCarBuildingGoal1-v0 --adversary True --num-atoms 201 --v-min 0 --v-max 1000 --cvar-alpha 0.95
  • To train sTRPO:
$ python -m sTRPO.main --env-name SafetyCarCircle1-v0 --unsafe-agent-path <saved-model-directory>
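
The flags `--num-atoms`, `--v-min`, `--v-max`, and `--cvar-alpha` suggest a categorical return distribution with a CVaR risk measure. How `main_distributional.py` actually uses them is not documented here, so the following is only a generic sketch of that idea under stated assumptions: the atoms are taken to be evenly spaced on `[v_min, v_max]`, CVaR is taken over the upper tail (the worst `1 - alpha` fraction of outcomes when values are costs), and the function names `make_atoms` and `cvar_upper_tail` are hypothetical.

```python
import numpy as np


def make_atoms(v_min, v_max, num_atoms):
    """Evenly spaced support points (assumed layout, as in categorical distributional RL)."""
    return np.linspace(v_min, v_max, num_atoms)


def cvar_upper_tail(atoms, probs, alpha):
    """Mean of the top (1 - alpha) probability mass of a discrete distribution.

    Assumes atoms are sorted in ascending order and 0 <= alpha < 1.
    Hypothetical helper for illustration; not the repository's implementation.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()  # normalize defensively
    tail_mass = 1.0 - alpha
    remaining = tail_mass
    total = 0.0
    # Walk from the largest atom downward, consuming probability mass
    # until the tail of size (1 - alpha) is filled.
    for z, p in zip(atoms[::-1], probs[::-1]):
        take = min(p, remaining)
        total += take * z
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / tail_mass


if __name__ == "__main__":
    # Same values as the training command above.
    atoms = make_atoms(0.0, 1000.0, 201)
    uniform = np.full(201, 1.0 / 201)  # placeholder distribution, not a trained model
    print("atom spacing:", atoms[1] - atoms[0])
    print("CVaR at alpha=0.95:", cvar_upper_tail(atoms, uniform, 0.95))
```

With `--v-min 0 --v-max 1000 --num-atoms 201`, this layout gives a spacing of 5 between adjacent atoms. With `alpha = 0`, the function reduces to the plain mean of the distribution.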

Acknowledgement
