
ftrifoglio/rl-project


RL Project

This project implements Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3) from scratch, together with ablation studies that measure the contribution of each TD3 component. It is intended for education and research, so it does not rely on off-the-shelf RL implementations.
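Both DDPG and TD3 keep target copies of the actor and critic that slowly track the online networks. The following is a minimal sketch of that soft (Polyak) target update. It operates on plain Python lists instead of PyTorch parameters so it stays self-contained. The function name `soft_update` and the `tau` value are illustrative assumptions, not the settings used in `ddpg.py` or the YAML configs.

```python
def soft_update(target, online, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target, element-wise.

    `target` and `online` are flat lists of weights standing in for network parameters.
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError(f"tau must be in (0, 1], got {tau}")
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


# With a small tau the target network lags the online network.
target, online = [0.0, 0.0], [1.0, -1.0]
for _ in range(1000):
    target = soft_update(target, online, tau=0.005)
print(target)  # approximately [0.993, -0.993]
```

In PyTorch the same update is usually applied in place to each parameter tensor under `torch.no_grad()`, which avoids allocating new tensors on every step.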

Prerequisites

  • Python 3.11 or higher
  • uv package manager (see the uv installation instructions)
  • CUDA-compatible GPU (optional, but recommended for faster training)

Installation

  1. Create and activate the Python environment:
$ cd rl-project
$ uv sync  # add --all-groups to also install the notebook dependencies
$ source .venv/bin/activate

Reproducing Results

To reproduce all ablation study results:

$ python config/ablations.py

This runs all five configurations (TD3, TD3-CDQ, TD3-DP, TD3-TPS, and DDPG) one after another and writes their results as TensorBoard logs under runs/ablations/.
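Each ablation switches off exactly one TD3 component, and the DDPG baseline switches off all three. The sketch below encodes that as a table of flags. The run names mirror the directories under `runs/ablations/`, but the `TD3Components` class and its field names are hypothetical. Treating DDPG as "TD3 with every component disabled" is also an assumption. The original DDPG may differ in further details, such as exploration noise.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TD3Components:
    clipped_double_q: bool = True         # critic target uses the min of two target critics
    delayed_policy_updates: bool = True   # actor is updated less often than the critics
    target_policy_smoothing: bool = True  # clipped noise is added to target actions


# Hypothetical mapping from run name (directory under runs/ablations/) to enabled components.
VARIANTS = {
    "td3": TD3Components(),
    "td3_cdq": TD3Components(clipped_double_q=False),
    "td3_dp": TD3Components(delayed_policy_updates=False),
    "td3_tps": TD3Components(target_policy_smoothing=False),
    "ddpg": TD3Components(False, False, False),
}


def disabled(components):
    """Names of the TD3 components that a run switches off."""
    return [f.name for f in fields(components) if not getattr(components, f.name)]


for run_name, components in VARIANTS.items():
    print(f"{run_name:8s} disables: {disabled(components) or 'nothing'}")
```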

Results Directory Structure

runs/ablations/
├── td3/                   # Full TD3 results
├── td3_cdq/               # TD3 without clipped double Q-learning
├── td3_dp/                # TD3 without delayed policy updates
├── td3_tps/               # TD3 without target policy smoothing
└── ddpg/                  # Baseline DDPG results
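The sketch below shows where each ablated component acts. It covers a single transition using plain floats and stand-in critics, so it runs without PyTorch. Clipped double Q-learning and target policy smoothing change the critic target y = r + γ(1 − done)·Q′(s′, a′). Delayed policy updates only change how often the actor and target networks are stepped. The function names and signatures are hypothetical. The defaults (γ = 0.99, noise σ = 0.2, noise clip 0.5, policy delay 2) are the values suggested in the TD3 paper and may differ from `config/td3.yaml`. A symmetric action range [−max_action, max_action] is assumed.

```python
import random


def smoothed_target_action(next_action, noise_std=0.2, noise_clip=0.5,
                           max_action=1.0, rng=random):
    """Target policy smoothing: mu'(s') + clip(eps, -c, c), then clip to the action bounds."""
    smoothed = []
    for a in next_action:
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        smoothed.append(max(-max_action, min(max_action, a + eps)))
    return smoothed


def critic_target(reward, done, next_action, target_q1, target_q2, gamma=0.99,
                  clipped_double_q=True, target_policy_smoothing=True, rng=random):
    """y = r + gamma * (1 - done) * Q'(s', a') for one transition.

    next_action is the target actor's output mu'(s'); target_q1 and target_q2
    stand in for the two target critics (callables: action list -> float).
    """
    action = (smoothed_target_action(next_action, rng=rng)
              if target_policy_smoothing else list(next_action))
    q_next = target_q1(action)
    if clipped_double_q:
        # Use the smaller of the two target critics to curb overestimation.
        q_next = min(q_next, target_q2(action))
    return reward + gamma * (1.0 - float(done)) * q_next


def should_update_actor(critic_step, policy_delay=2, delayed_policy_updates=True):
    """Delayed policy updates: step the actor and targets every `policy_delay` critic steps."""
    return critic_step % policy_delay == 0 if delayed_policy_updates else True


# Usage with toy critics: the clipped target picks the smaller estimate.
q1 = lambda a: 10.0 + sum(a)
q2 = lambda a: 8.0 + sum(a)
y_td3 = critic_target(1.0, False, [0.0], q1, q2, target_policy_smoothing=False)
y_single = critic_target(1.0, False, [0.0], q1, q2,
                         clipped_double_q=False, target_policy_smoothing=False)
print(y_td3, y_single)
```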

After running experiments, view results with TensorBoard:

$ tensorboard --logdir=runs/ablations

Project Structure

.
├── src/rl_project/
│   ├── algos/
│   │   ├── ddpg.py             # Main DDPG/TD3 implementation
│   │   └── evaluate.py         # Evaluation utilities
│   ├── models/
│   │   └── mlp.py              # Actor and Critic network architectures
│   ├── config/
│   │   └── schemas.py          # Configuration schemas
│   └── utils/
│       ├── envs.py             # Environment utilities
│       └── video/              # Video recording utilities
├── config/
│   ├── *.yaml                  # Algorithm configurations
│   └── ablations.py            # Script to run all experiments
├── checkpoints/                # Policy network checkpoints
└── runs/                       # TensorBoard logs

Core Packages Used

  • PyTorch (torch==2.7.1): Neural network implementation and automatic differentiation
  • Gymnasium (gymnasium==1.2.0): Environment interface and vectorization
  • NumPy (numpy==2.3.2): Array operations and mathematical computations
  • TensorBoard (tensorboard==2.20.0): Experiment logging and visualization
  • Pydantic (pydantic==2.11.7): Configuration validation and type checking
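The configuration validation that Pydantic provides here can be illustrated with a stdlib stand-in. The sketch below uses a frozen dataclass that rejects out-of-range hyperparameters at construction time. It only approximates what the Pydantic schemas in `src/rl_project/config/schemas.py` might check; the field names and bounds are hypothetical. Unlike Pydantic, it does not coerce types (a YAML string such as "0.99" would not be converted to a float).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlgoConfig:
    """Hypothetical hyperparameter block; fields are illustrative, not the project's schema."""
    gamma: float = 0.99      # discount factor
    tau: float = 0.005       # soft target update rate
    policy_delay: int = 2    # critic steps per actor step

    def __post_init__(self):
        # Fail fast on values that would make training meaningless.
        if not 0.0 < self.gamma < 1.0:
            raise ValueError(f"gamma must be in (0, 1), got {self.gamma}")
        if not 0.0 < self.tau <= 1.0:
            raise ValueError(f"tau must be in (0, 1], got {self.tau}")
        if not isinstance(self.policy_delay, int) or self.policy_delay < 1:
            raise ValueError(f"policy_delay must be a positive integer, got {self.policy_delay}")


# A dict such as one loaded from a YAML file can be unpacked into the config.
cfg = AlgoConfig(**{"gamma": 0.98, "tau": 0.01})
print(cfg)
```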

Create the Agent Learning Video

Run these commands from the repository root (where src/, config/, and checkpoints/ live):

# Make the package importable in the active uv environment (uv venvs do not include pip by default)
uv pip install -e .

# Build a montage: random → TD3 @10ep → TD3 @150ep → TD3 best → DDPG best
python -m rl_project.utils.video.demo \
  --td3-config config/td3.yaml \
  --td3-best   checkpoints/td3/actor_best.pth \
  --td3-mid    checkpoints/td3/actor_episode_10.pth \
  --td3-mid    checkpoints/td3/actor_episode_150.pth \
  --ddpg-config config/ddpg.yaml \
  --ddpg-best   checkpoints/ddpg/actor_best.pth \
  --videos-dir  videos
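The demo takes one `--td3-mid` flag per intermediate checkpoint. The sketch below shows how such a CLI could be parsed with `argparse`. It uses `action="append"` so that repeated flags accumulate in the order given, then maps them onto the montage order from the comment above. This is a hypothetical illustration, not the actual parser in `rl_project.utils.video.demo`. It assumes the mid-training segments appear in the order their flags are passed.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the demo command."""
    p = argparse.ArgumentParser(description="Sketch of a montage CLI")
    p.add_argument("--td3-config", required=True)
    p.add_argument("--td3-best", required=True)
    # Repeatable: each occurrence appends one intermediate checkpoint path.
    p.add_argument("--td3-mid", action="append", default=None)
    p.add_argument("--ddpg-config", required=True)
    p.add_argument("--ddpg-best", required=True)
    p.add_argument("--videos-dir", default="videos")
    return p


def montage_order(args):
    """Segments in playback order: random policy, TD3 mid-training, TD3 best, DDPG best."""
    segments = [("random", None)]
    segments += [("td3_mid", path) for path in (args.td3_mid or [])]
    segments += [("td3_best", args.td3_best), ("ddpg_best", args.ddpg_best)]
    return segments


args = build_parser().parse_args([
    "--td3-config", "config/td3.yaml",
    "--td3-best", "checkpoints/td3/actor_best.pth",
    "--td3-mid", "checkpoints/td3/actor_episode_10.pth",
    "--td3-mid", "checkpoints/td3/actor_episode_150.pth",
    "--ddpg-config", "config/ddpg.yaml",
    "--ddpg-best", "checkpoints/ddpg/actor_best.pth",
    "--videos-dir", "videos",
])
for label, path in montage_order(args):
    print(label, path)
```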

Build the PDF Report

The report is a LaTeX project that uses biblatex + biber. You'll need a TeX distribution, latexmk, and biber.

Prerequisites

Linux (TeX Live)

sudo apt-get update
sudo apt-get install -y texlive-full latexmk biber

macOS (MacTeX)

brew install --cask mactex
sudo tlmgr update --self --all

Windows (MiKTeX)

  • Install MiKTeX and enable on-the-fly package installs (MiKTeX Console → Settings → General → Install missing packages on-the-fly: Yes).
  • Ensure biber is installed via MiKTeX Console (Packages → search "biber").

Compile

From the repo root:

cd report
latexmk -C
pdflatex -interaction=nonstopmode -halt-on-error -shell-escape report.tex
biber report
pdflatex -interaction=nonstopmode -halt-on-error -shell-escape report.tex
pdflatex -interaction=nonstopmode -halt-on-error -shell-escape report.tex

  • latexmk -C removes the auxiliary files and any previously built PDF.
  • The two pdflatex runs after biber pull in the bibliography and resolve cross-references.
  • The output is written to report/report.pdf.
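To script the same build, for example in CI, the following minimal Python sketch runs the commands above in order. It uses `subprocess.run(..., check=True)`, so the build stops at the first failing step. The helper names `build_commands` and `build_report` are hypothetical. The sketch assumes latexmk, pdflatex, and biber are on PATH.

```python
import subprocess
from pathlib import Path

# Same flags as the manual pdflatex invocations above.
PDFLATEX = ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", "-shell-escape"]


def build_commands(main="report"):
    """Clean, first LaTeX pass, biber, then two passes to settle references."""
    tex = f"{main}.tex"
    return [
        ["latexmk", "-C"],
        PDFLATEX + [tex],
        ["biber", main],
        PDFLATEX + [tex],
        PDFLATEX + [tex],
    ]


def build_report(report_dir="report", main="report"):
    """Run each step inside report_dir; raises CalledProcessError on the first failure."""
    for cmd in build_commands(main):
        subprocess.run(cmd, cwd=report_dir, check=True)
    return Path(report_dir) / f"{main}.pdf"
```

Call `build_report()` from the repository root. It returns the path where the PDF should appear.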
