RL Project

This project implements Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms from scratch, along with ablation studies to understand the contribution of each TD3 component. The implementation is designed for educational purposes and research, avoiding off-the-shelf solutions.
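
For orientation, the three TD3 components examined in the ablations can be sketched in a few lines. This is a minimal scalar illustration in plain Python, not the code in src/rl_project/algos/ddpg.py: the function names, the callables passed in, and the default hyperparameters (noise_std=0.2, noise_clip=0.5, policy_delay=2, which are commonly used TD3 values) are assumptions made for the example.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               *, gamma=0.99, noise_std=0.2, noise_clip=0.5,
               max_action=1.0, rng=None):
    """Bootstrapped critic target for one transition (hypothetical helper)."""
    rng = rng or random.Random()
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, -max_action, max_action)
    # Clipped double Q-learning: bootstrap from the smaller of two target critics.
    q_next = min(target_q1(next_state, next_action),
                 target_q2(next_state, next_action))
    # No bootstrapping past a terminal transition (done == 1).
    return reward + gamma * (1.0 - done) * q_next


def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: refresh the actor every policy_delay critic steps."""
    return step % policy_delay == 0
```

Each ablation variant would correspond to switching off one of these pieces: using a single critic instead of the minimum, setting the noise to zero, or updating the actor on every step.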

Prerequisites

Python 3.11 or higher
uv package manager (Installation instructions)
CUDA-compatible GPU (optional, but recommended for faster training)

Installation

Create and activate the Python environment:

$ cd rl-project
$ uv sync  # add --all-groups to also install the notebook dependencies
$ source .venv/bin/activate

Reproducing Results

To reproduce all ablation study results:

$ python config/ablations.py

This runs all five variants (TD3, TD3-CDQ, TD3-DP, TD3-TPS, DDPG) sequentially and writes the results as TensorBoard logs under runs/ablations/.

Results Directory Structure

runs/ablations/
├── td3/                   # Full TD3 results
├── td3_cdq/               # TD3 without clipped double Q-learning
├── td3_dp/                # TD3 without delayed policy updates
├── td3_tps/               # TD3 without target policy smoothing
└── ddpg/                  # Baseline DDPG results
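
One way to read the directory names above is as a set of on/off switches, one per TD3 component. The sketch below is a hypothetical illustration using stdlib dataclasses; the class name, field names, and helper are assumptions and do not reflect the actual schemas in src/rl_project/config/schemas.py or the YAML files in config/. It also treats DDPG as "all three components off", which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationFlags:
    """Which TD3 components are enabled for a run (hypothetical schema)."""
    clipped_double_q: bool = True
    delayed_policy_updates: bool = True
    target_policy_smoothing: bool = True


# Keys mirror the sub-directories of runs/ablations/.
ABLATIONS = {
    "td3": AblationFlags(),
    "td3_cdq": AblationFlags(clipped_double_q=False),
    "td3_dp": AblationFlags(delayed_policy_updates=False),
    "td3_tps": AblationFlags(target_policy_smoothing=False),
    "ddpg": AblationFlags(False, False, False),
}


def log_dir(name, root="runs/ablations"):
    """Return the TensorBoard log directory for a named variant."""
    return f"{root}/{name}"


if __name__ == "__main__":
    for name, flags in ABLATIONS.items():
        print(log_dir(name), flags)
```

Keeping exactly one flag off per variant is what lets a difference from the full TD3 curve be attributed to a single component.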

After running experiments, view results with TensorBoard:

$ tensorboard --logdir=runs/ablations

Project Structure

.
├── src/rl_project/
│   ├── algos/
│   │   ├── ddpg.py             # Main DDPG/TD3 implementation
│   │   └── evaluate.py         # Evaluation utilities
│   ├── models/
│   │   └── mlp.py              # Actor and Critic network architectures
│   ├── config/
│   │   └── schemas.py          # Configuration schemas
│   └── utils/
│       ├── envs.py             # Environment utilities
│       └── video/              # Video recording utilities
├── config/
│   ├── *.yaml                  # Algorithm configurations
│   └── ablations.py            # Script to run all experiments
├── checkpoints/                # Policy network checkpoints
└── runs/                       # TensorBoard logs

Core Packages Used

PyTorch (torch==2.7.1): Neural network implementation and automatic differentiation
Gymnasium (gymnasium==1.2.0): Environment interface and vectorization
NumPy (numpy==2.3.2): Array operations and mathematical computations
TensorBoard (tensorboard==2.20.0): Experiment logging and visualization
Pydantic (pydantic==2.11.7): Configuration validation and type checking

Create the agent learning video

Run these from the repository root (where src/, config/, checkpoints/ live):

# Make the package importable
pip install -e .

# Build a montage: random → TD3 @10ep → TD3 @150ep → TD3 best → DDPG best
python -m rl_project.utils.video.demo \
  --td3-config config/td3.yaml \
  --td3-best   checkpoints/td3/actor_best.pth \
  --td3-mid    checkpoints/td3/actor_episode_10.pth \
  --td3-mid    checkpoints/td3/actor_episode_150.pth \
  --ddpg-config config/ddpg.yaml \
  --ddpg-best   checkpoints/ddpg/actor_best.pth \
  --videos-dir  videos

Build the PDF Report

The report is a LaTeX project that uses biblatex + biber. You'll need a TeX distribution, latexmk, and biber.

Prerequisites

Linux (TeX Live)

sudo apt-get update
sudo apt-get install -y texlive-full latexmk biber

macOS (MacTeX)

brew install --cask mactex
sudo tlmgr update --self --all

Windows (MiKTeX)

Install MiKTeX and enable on-the-fly package installs (MiKTeX Console → Settings → General → Install missing packages on-the-fly: Yes).
Ensure biber is installed via MiKTeX Console (Packages → search "biber").

Compile

From the repo root:

cd report
latexmk -C
pdflatex -interaction=nonstopmode -halt-on-error -shell-escape report.tex
biber report
pdflatex -interaction=nonstopmode -halt-on-error -shell-escape report.tex
pdflatex -interaction=nonstopmode -halt-on-error -shell-escape report.tex

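latexmk -C removes auxiliary files and previously generated output, so the build starts clean.
The first pdflatex run writes the citation data that biber processes; the two pdflatex runs after biber resolve citations and cross-references.
The output is written to report/report.pdf.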
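
If the sequence is run often, it can be wrapped in a small script. The following is a sketch only, assuming the report lives in report/ with report.tex as the main file, as described above; the function names build_commands and build are hypothetical, and the script is not part of the repository.

```python
import subprocess
from pathlib import Path


def build_commands(job="report"):
    """Return the compile steps listed above, in order, as argument lists."""
    pdflatex = ["pdflatex", "-interaction=nonstopmode", "-halt-on-error",
                "-shell-escape", f"{job}.tex"]
    return [
        ["latexmk", "-C"],   # start from a clean state
        pdflatex,            # first pass: writes citation data for biber
        ["biber", job],      # process the bibliography
        pdflatex,            # second pass: pull in citations
        pdflatex,            # third pass: settle cross-references
    ]


def build(report_dir="report", job="report"):
    """Run each step inside report_dir and stop at the first failure."""
    for cmd in build_commands(job):
        subprocess.run(cmd, cwd=report_dir, check=True)
    return Path(report_dir) / f"{job}.pdf"


if __name__ == "__main__":
    print(build())
```

Because check=True raises on a non-zero exit code, a LaTeX error halts the build instead of being hidden by the later passes.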
