The code in this repository requires Python 3.10 or 3.11 and Linux as the operating system.
Some tasks from OpenAI Gym use Box2D. For this to work you need to install SWIG. You can do this using brew with the following command:
brew install swig
To initialize the submodules, run this command:
git submodule update --init --recursive
When you have SWIG installed, you can install the required packages using the following command:
pip install -r requirements.txt
If you get the following AttributeError:
AttributeError: module '_Box2D' has no attribute 'RAND_LIMIT_swigconstant'
Try installing Gymnasium with all extras; this should solve the problem:
pip install gymnasium[all]
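If you want to verify that Box2D works before running anything, a quick check is to create one of the Box2D-based environments. This is a minimal sketch assuming Gymnasium is installed as above:

```python
import gymnasium as gym

# Creating a Box2D-based environment fails immediately if Box2D/SWIG is broken.
env = gym.make("LunarLander-v2")
obs, info = env.reset(seed=0)
print(obs.shape)  # LunarLander observations are 8-dimensional
env.close()
```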
This repository contains the following environments:
- CartPole-v1
- MountainCar-v0
- LunarLander-v2
- Acrobot-v1
- MountainCarContinuous-v0
- Pendulum-v1
To run the code, use the following command. This will train a PPO policy for 1000 episodes and then run the experiments:
python run_icml.py -a ppo -e Acrobot-v1 -pep 1000
To run a pre-trained policy, you have to specify the seed:
python run_icml.py -a ppo -e Acrobot-v1 -pep 1000 --seed 42 -tr f
This sets training to false (-tr f) and loads a policy that has been trained for 1000 episodes with the seed 42.
run_icml.py takes the following command-line parameters:
- -a --algo: str - The algorithm to train
- -e --env: str - The gym environment to run
- -k --k-bins: int (default=1) - The number of bins used to discretize environments with continuous action spaces (see the sketch after this list)
- -tr --train: bool (default=True) - Whether to train or not
- -ex --experiment: bool (default=True) - If true, run the experiment
- -ab --abstraction: bool (default=True) - If true, load or create the abstraction network
- -ep --episodes - The total amount of episodes to run for policy training and experiment training
- -pep --policy-episodes (default=None) - The number of episodes to train the expert policy
- -eep --experiment-episodes (default=None) - The number of episodes to train the experiment
- -r --render: bool (default=False) - If true, render one episode of the algorithm in the environment
- -rp --render-policy: bool (default=False) - If true, render one episode of the expert policy in the environment
- -re --render-experiment: bool (default=False) - If true, render one episode of the experiment in the environment
- -s --save: bool (default=True) - If true, save the trained model
- -l --load: bool (default=False) - If true, load a trained abstraction network with the specified time-steps and algo
- -le --load-experiment: bool (default=False) - If true, load a trained experiment with the specified time-steps and algo
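The -k/--k-bins option only matters for environments with continuous action spaces such as Pendulum-v1 and MountainCarContinuous-v0. As a rough illustration of what binning a continuous action space means (the helper below is hypothetical and not the repository's implementation):

```python
import gymnasium as gym
import numpy as np

def make_action_table(env, k_bins):
    """Spread k_bins discrete actions evenly across a 1-D continuous action space."""
    low, high = env.action_space.low, env.action_space.high
    return np.linspace(low, high, k_bins)  # shape: (k_bins, action_dim)

env = gym.make("Pendulum-v1")
actions = make_action_table(env, k_bins=5)
obs, info = env.reset(seed=0)
# A discrete agent picks an index; the environment receives the matching continuous action.
obs, reward, terminated, truncated, info = env.step(actions[2])
env.close()
```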
To run the code, you can use the following command:
python CAT-RL.py
This will run all the environments and render the model after training.
There are also some optional arguments you can use:
- --env or -e: specify the environment you want to run
  - default: MountainCar
  - options: MountainCar, MountainCarContinuous, CartPole, LunarLander, Acrobot, Pendulum
- --train or -t: train the model
  - default: t (True)
  - options: t, f (True, False)
- --render or -r: render the model
  - default: t (True)
  - options: t, f (True, False)
- --seed or -s: specify the seed. If rendering without training, you need to set the seed of the trained model
  - default: 0
- --verbose or -v: print the progress of the training
  - default: t (True)
  - options: t, f (True, False)
- --help or -h: get help
For example, to run the CartPole-v1 environment without rendering the model, you can use the following command:
python CAT-RL.py -r f -e CartPole
or to just render a trained model with seed 123 from the CartPole-v1 environment, you can use the following command:
python CAT-RL.py -t f -e CartPole -s 123
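These scripts accept boolean flags as the strings t and f. A minimal sketch of how such flags can be parsed with argparse is shown below; the helper name is hypothetical and this is not necessarily how CAT-RL.py implements it:

```python
import argparse

def str_to_bool(value: str) -> bool:
    """Interpret 't'/'f' (and common variants) as booleans."""
    if value.lower() in ("t", "true", "1"):
        return True
    if value.lower() in ("f", "false", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected t or f, got {value!r}")

parser = argparse.ArgumentParser()
parser.add_argument("-e", "--env", default="MountainCar")
parser.add_argument("-t", "--train", type=str_to_bool, default=True)
parser.add_argument("-r", "--render", type=str_to_bool, default=True)
parser.add_argument("-s", "--seed", type=int, default=0)
args = parser.parse_args(["-t", "f", "-e", "CartPole"])
print(args.train, args.env)  # False CartPole
```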
To run the code, you can use the following command:
python tileCoding.py
This will run the code and render the model after training.
There are also some optional arguments you can use:
- --env or -e: specify the environment you want to run
  - default: MountainCar
  - options: MountainCar, MountainCarContinuous, CartPole, LunarLander, Acrobot, Pendulum
- --train or -t: train the model
  - default: t (True)
  - options: t, f (True, False)
- --render or -r: render the model
  - default: t (True)
  - options: t, f (True, False)
- --seed or -s: specify the seed. If rendering without training, you need to set the seed of the trained model
  - default: 0
- --verbose or -v: print the progress of the training
  - default: t (True)
  - options: t, f (True, False)
- --help or -h: get help
For example, to run the CartPole-v1 environment without rendering the model, you can use the following command:
python tileCoding.py -r f -e CartPole
or to just render a trained model with seed 123 from the CartPole-v1 environment, you can use the following command:
python tileCoding.py -t f -e CartPole -s 123
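For background, tile coding covers a continuous state space with several overlapping grids (tilings), each offset slightly, and uses the one active tile per tiling as the state features. The sketch below is only illustrative; the function name, tile counts, and offsets are assumptions and do not mirror tileCoding.py:

```python
import numpy as np

def tile_indices(state, low, high, n_tilings=8, n_tiles=10):
    """Return one active tile index per tiling for a continuous state."""
    state = np.asarray(state, dtype=float)
    scaled = (state - low) / (high - low)          # normalise each dimension to [0, 1]
    indices = []
    for t in range(n_tilings):
        offset = t / (n_tilings * n_tiles)          # shift each tiling by a fraction of a tile
        coords = np.floor((scaled + offset) * n_tiles).astype(int)
        coords = np.clip(coords, 0, n_tiles - 1)
        # Flatten the per-dimension coordinates into a single tile index for this tiling.
        flat = int(np.ravel_multi_index(tuple(coords), (n_tiles,) * len(coords)))
        indices.append(t * n_tiles ** len(coords) + flat)
    return indices

# Example for MountainCar-v0, whose observation is (position, velocity).
low, high = np.array([-1.2, -0.07]), np.array([0.6, 0.07])
print(tile_indices([-0.5, 0.0], low, high))
```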
To run the code, you can use the following command:
python binQlearning.py
This will run the code and render the model after training.
There are also some optional arguments you can use:
- --env or -e: specify the environment you want to run
  - default: MountainCar
  - options: MountainCar, MountainCarContinuous, CartPole, LunarLander, Acrobot, Pendulum
- --train or -t: train the model
  - default: t (True)
  - options: t, f (True, False)
- --render or -r: render the model
  - default: t (True)
  - options: t, f (True, False)
- --seed or -s: specify the seed. If rendering without training, you need to set the seed of the trained model
  - default: 0
- --verbose or -v: print the progress of the training
  - default: t (True)
  - options: t, f (True, False)
- --help or -h: get help
For example, to run the CartPole-v1 environment without rendering the model, you can use the following command:
python binQlearning.py -r f -e CartPole
or to just render a trained model with seed 123 from the CartPole-v1 environment, you can use the following command:
python binQlearning.py -t f -e CartPole -s 123
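binQlearning.py refers to tabular Q-learning on observations discretized into fixed bins. The core idea can be sketched as follows; the bin counts, bounds, and training loop are assumptions for illustration, not the repository's code:

```python
import gymnasium as gym
import numpy as np

n_bins = 20
env = gym.make("CartPole-v1")
# Finite ranges per dimension; out-of-range values simply fall into the outer bins.
low = np.array([-4.8, -3.0, -0.418, -3.0])
high = np.array([4.8, 3.0, 0.418, 3.0])
edges = [np.linspace(low[i], high[i], n_bins - 1) for i in range(len(low))]
q_table = np.zeros((n_bins,) * len(low) + (env.action_space.n,))

def discretize(obs):
    """Map a continuous observation to a tuple of bin indices."""
    return tuple(int(np.digitize(obs[i], edges[i])) for i in range(len(obs)))

alpha, gamma, epsilon = 0.1, 0.99, 0.1
obs, info = env.reset(seed=0)
state = discretize(obs)
for _ in range(1000):
    action = env.action_space.sample() if np.random.rand() < epsilon else int(np.argmax(q_table[state]))
    obs, reward, terminated, truncated, info = env.step(action)
    next_state = discretize(obs)
    # Standard tabular Q-learning update on the binned state.
    q_table[state + (action,)] += alpha * (reward + gamma * np.max(q_table[next_state]) - q_table[state + (action,)])
    state = next_state
    if terminated or truncated:
        obs, info = env.reset()
        state = discretize(obs)
env.close()
```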
To run the experiments, you can use the following command:
python run_exp.py
By default, this will run each algorithm 20 times for all the environments. The results will be saved in the results folder and the models will be saved in the models folder.
There are also some optional arguments you can use:
- --num or -n: specify the number of times to run each algorithm for each environment
  - default: 10
The trained models for the different environments can be found in the models folder. The models are saved as .pkl files and can be loaded using the pickle library in Python.
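Since the models are pickled, loading one back takes a few lines. The file name below is only a placeholder; check the models folder for the actual naming scheme:

```python
import pickle

# Hypothetical file name; the real files are named per environment, algorithm, and seed.
with open("models/example_model.pkl", "rb") as f:
    model = pickle.load(f)
print(type(model))
```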
To run the experiment with trained models, you can use the following command:
python run_after_train.py
Data for this experiment can be found in the results-after-train folder.
To test the k bins, you can use the following command:
python test_k_bins.py
Data for this experiment can be found in k_bins_result-<bin>.