This project implements the Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms from scratch, along with ablation studies that isolate the contribution of each TD3 component. It is intended for education and research, so it does not rely on off-the-shelf RL implementations.
- Python 3.11 or higher
- uv package manager (see the uv installation instructions)
- CUDA-compatible GPU (optional, but recommended for faster training)
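A quick way to confirm the requirements above is a small check script. This is a sketch, not part of the repository: the `check_environment` name and its report keys are hypothetical, and PyTorch is imported only if it is installed, so the script also runs before `uv sync`.

```python
import importlib.util
import sys


def check_environment(min_python: tuple[int, int] = (3, 11)) -> dict[str, bool]:
    """Report the Python version check and whether a CUDA GPU is visible to PyTorch."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": False,
        "cuda_available": False,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily: torch may not be installed yet

        report["torch_installed"] = True
        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A `cuda_available: False` result is not an error; training falls back to the CPU, only more slowly.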
- Create and activate the Python environment:
$ cd rl-project
$ uv sync # add --all-groups for installing notebook dependencies
$ source .venv/bin/activate
To reproduce all ablation study results:
$ python config/ablations.py
This runs all five variants (TD3, TD3-CDQ, TD3-DP, TD3-TPS, DDPG) sequentially and writes their TensorBoard logs to:
runs/ablations/
├── td3/ # Full TD3 results
├── td3_cdq/ # TD3 without clipped double Q-learning
├── td3_dp/ # TD3 without delayed policy updates
├── td3_tps/ # TD3 without target policy smoothing
└── ddpg/ # Baseline DDPG results
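The five runs can be thought of as on/off switches for the three TD3 components. The sketch below is hypothetical: the `Variant` class, its field names, and `actor_update_steps` are illustrative, it assumes DDPG corresponds to TD3 with all three components disabled, and `policy_delay=2` is the value used in the TD3 paper. The helper shows what the delayed-policy-update switch changes: how often the actor (and the target networks) are updated relative to the critic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Which TD3 components a run enables (hypothetical field names)."""

    clipped_double_q: bool
    delayed_policy_updates: bool
    target_policy_smoothing: bool


# One entry per results directory under runs/ablations/.
VARIANTS: dict[str, Variant] = {
    "td3": Variant(True, True, True),       # full TD3
    "td3_cdq": Variant(False, True, True),  # without clipped double Q-learning
    "td3_dp": Variant(True, False, True),   # without delayed policy updates
    "td3_tps": Variant(True, True, False),  # without target policy smoothing
    "ddpg": Variant(False, False, False),   # baseline
}


def actor_update_steps(variant: Variant, critic_steps: int, policy_delay: int = 2) -> list[int]:
    """Return the critic-update indices (1-based) after which the actor and targets are updated."""
    delay = policy_delay if variant.delayed_policy_updates else 1
    return [step for step in range(1, critic_steps + 1) if step % delay == 0]
```

For example, over six critic updates full TD3 updates the actor after steps 2, 4 and 6, while `td3_dp` and `ddpg` update it after every step.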
After running experiments, view results with TensorBoard:
$ tensorboard --logdir=runs/ablations
The repository is organized as follows:
.
├── src/rl_project/
│ ├── algos/
│ │ ├── ddpg.py # Main DDPG/TD3 implementation
│ │ └── evaluate.py # Evaluation utilities
│ ├── models/
│ │ └── mlp.py # Actor and Critic network architectures
│ ├── config/
│ │ └── schemas.py # Configuration schemas
│ └── utils/
│ ├── envs.py # Environment utilities
│ └── video/ # Video recording utilities
├── config/
│ ├── *.yaml # Algorithm configurations
│ └── ablations.py # Script to run all experiments
├── checkpoints/ # Policy network checkpoints
└── runs/ # TensorBoard logs
- PyTorch (torch==2.7.1): Neural network implementation and automatic differentiation
- Gymnasium (gymnasium==1.2.0): Environment interface and vectorization
- NumPy (numpy==2.3.2): Array operations and mathematical computations
- TensorBoard (tensorboard==2.20.0): Experiment logging and visualization
- Pydantic (pydantic==2.11.7): Configuration validation and type checking
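As an illustration of the kind of validation Pydantic provides, here is a hypothetical schema in the spirit of config/schemas.py. The `TrainConfig` class, its field names, defaults, and bounds are assumptions, not the project's actual schema. Out-of-range values, unknown algorithm names, and misspelled keys are all rejected with a `ValidationError`.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class TrainConfig(BaseModel):
    """Hypothetical training config; field names and defaults are illustrative."""

    model_config = ConfigDict(extra="forbid")  # reject unknown (e.g. misspelled) keys

    env_id: str  # required: Gymnasium environment id
    algo: Literal["ddpg", "td3"] = "td3"
    gamma: float = Field(default=0.99, gt=0.0, le=1.0)  # discount factor
    tau: float = Field(default=0.005, gt=0.0, le=1.0)  # soft target-update rate
    batch_size: int = Field(default=256, gt=0)
    policy_delay: int = Field(default=2, ge=1)  # critic updates per actor update


if __name__ == "__main__":
    print(TrainConfig.model_validate({"env_id": "Pendulum-v1"}))
    try:
        TrainConfig.model_validate({"env_id": "Pendulum-v1", "gamma": 1.5})
    except ValidationError as err:
        print(err)  # explains which field failed and why
```

In practice the dictionary would come from one of the config/*.yaml files, for example via a YAML loader such as PyYAML's `safe_load` (an assumption; the loader is not listed above).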
To render the demo videos, run these commands from the repository root (where src/, config/, and checkpoints/ live):
# Make the package importable
pip install -e .
# Build a montage: random → TD3 @10ep → TD3 @150ep → TD3 best → DDPG best
python -m rl_project.utils.video.demo \
--td3-config config/td3.yaml \
--td3-best checkpoints/td3/actor_best.pth \
--td3-mid checkpoints/td3/actor_episode_10.pth \
--td3-mid checkpoints/td3/actor_episode_150.pth \
--ddpg-config config/ddpg.yaml \
--ddpg-best checkpoints/ddpg/actor_best.pth \
--videos-dir videos
The report is a LaTeX project that uses biblatex + biber. You'll need a TeX distribution, latexmk, and biber.
Ubuntu/Debian:
sudo apt-get update
sudo apt-get install -y texlive-full latexmk biber
macOS:
brew install --cask mactex
sudo tlmgr update --self --all
Windows:
- Install MiKTeX and enable on-the-fly package installs (MiKTeX Console → Settings → General → Install missing packages on-the-fly: Yes).
- Ensure biber is installed via MiKTeX Console (Packages → search "biber").
From the repo root:
cd report
latexmk -C
pdflatex -interaction=nonstopmode -halt-on-error -shell-escape report.tex
biber report
pdflatex -interaction=nonstopmode -halt-on-error -shell-escape report.tex
pdflatex -interaction=nonstopmode -halt-on-error -shell-escape report.tex
- latexmk -C cleans auxiliary files.
- The two extra pdflatex runs after biber resolve cross-references.
- The output is written to report/report.pdf.
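If you rebuild the report often, the same sequence can be scripted. This is a hypothetical helper, not part of the repository: `build_report` runs the documented commands in order inside report/ and stops at the first failing step; `dry_run=True` only returns the command list without executing anything.

```python
import subprocess
from pathlib import Path

PDFLATEX = ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", "-shell-escape", "report.tex"]

# Same order as the manual steps: clean, pdflatex, biber, then two more pdflatex passes.
BUILD_STEPS: list[list[str]] = [
    ["latexmk", "-C"],
    PDFLATEX,
    ["biber", "report"],
    PDFLATEX,
    PDFLATEX,
]


def build_report(report_dir: Path = Path("report"), dry_run: bool = False) -> list[list[str]]:
    """Run each build step inside report_dir; return the commands that were (or would be) run."""
    for cmd in BUILD_STEPS:
        if not dry_run:
            # check=True raises CalledProcessError as soon as a step exits non-zero
            subprocess.run(cmd, cwd=report_dir, check=True)
    return [list(cmd) for cmd in BUILD_STEPS]


if __name__ == "__main__":
    build_report()
```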