
Dopamax

Dopamax is a library containing pure JAX implementations of common reinforcement learning algorithms. Everything is implemented in JAX, including the environments. This allows for extremely fast training and evaluation of agents, because the entire loop of environment simulation, agent interaction, and policy updates can be compiled as a single XLA program and executed on CPUs, GPUs, or TPUs. More specifically, the implementations in Dopamax follow the Anakin Podracer architecture; see the Podracer paper for more details.
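
To make this concrete, here is a minimal, self-contained sketch of the idea in plain JAX. Nothing below is Dopamax's actual API; the environment, policy, and update rule are toy stand-ins chosen only to show that simulation, action selection, and the parameter update can all live inside one jit-compiled function:

import jax
import jax.numpy as jnp

# Toy stand-ins (not Dopamax code): a purely functional environment and a
# linear policy, so the whole rollout-and-update step can be traced by JAX.
def env_step(state, action):
    next_state = 0.99 * state + 0.01 * action      # functional transition
    reward = -jnp.sum(next_state ** 2)             # reward: stay near the origin
    return next_state, reward

def policy(params, obs):
    return jnp.tanh(params @ obs)                  # scalar action in [-1, 1]

def rollout_return(params, state, length=64):
    def step(s, _):
        return env_step(s, policy(params, s))
    _, rewards = jax.lax.scan(step, state, None, length=length)
    return jnp.sum(rewards)

@jax.jit  # the entire simulate-act-update step compiles to one XLA program
def train_step(params, state):
    ret, grads = jax.value_and_grad(rollout_return)(params, state)
    new_state, _ = env_step(state, policy(params, state))
    return params + 1e-2 * grads, new_state, ret

params = jnp.zeros(4)
state = jax.random.normal(jax.random.PRNGKey(0), (4,))
for _ in range(5):
    params, state, ret = train_step(params, state)
    print(float(ret))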

Supported Algorithms

Dopamax implements several state-of-the-art reinforcement learning algorithms, covering both on-policy and off-policy methods for discrete and continuous control.

Installation

Dopamax can be installed with:

pip install dopamax

This will install the dopamax Python package, as well as a command-line interface (CLI) for training and evaluation. Note that only the CPU version of JAX is installed by default. If you would like to use a GPU or TPU, you will need to install the appropriate version of JAX. See the JAX installation instructions.
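
After installing the appropriate JAX build, you can quickly confirm which backend and devices JAX can see (this is plain JAX, not a Dopamax command):

import jax

print(jax.default_backend())  # e.g. "cpu", "gpu", or "tpu"
print(jax.devices())          # list of visible devices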

Note

The above command will install the latest "release" of Dopamax, which may not necessarily align with the latest commit in the main branch. To install the version found in the main branch of this repository, you can use:

pip install git+https://github.com/rystrauss/dopamax.git

Quick Start

After installation, you can use the Dopamax CLI to train and evaluate agents:

# See all available commands
dopamax --help

# List available agents
dopamax list-agents

# List available environments
dopamax list-environments

# View default config for an agent
dopamax agent-config PPO

Dopamax uses Weights and Biases (W&B) for logging and artifact management. Before using the CLI for training and evaluation, you must first make sure you have a W&B account (it's free) and have authenticated with wandb login.
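
If you prefer, you can also authenticate from Python using the standard W&B client API rather than the shell command:

import wandb

wandb.login()  # prompts for (or reads) your W&B API key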

Training

Agents can be trained with the dopamax train command, which requires a configuration file. The configuration file is a YAML file that specifies the agent, environment, and training hyperparameters. You can find examples in the examples directory. For example, to train a PPO agent on the CartPole environment, you would run:

dopamax train --config examples/ppo-cartpole/config.yaml

Note that all of the example config files have a random seed specified, so you will get the same result every time you run the command. The seeds provided in the examples are known to result in a successful run (with the given hyperparameters). To get different results on each run, you can remove the seed from the config file.
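
The reason a fixed seed gives exactly reproducible runs is JAX's explicit, functional pseudo-random number generation: the same integer seed always produces the same stream of keys and samples. A quick illustration in plain JAX (not Dopamax-specific code):

import jax

key = jax.random.PRNGKey(42)
key, subkey = jax.random.split(key)
print(jax.random.uniform(subkey, (3,)))  # identical output on every run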

Evaluation

Once you have trained some agents, you can evaluate them using the dopamax evaluate command. This will allow you to specify a W&B agent artifact that you'd like to evaluate (these artifacts are produced by the training runs and contain the agent hyperparameters and weights from the end of training). For example, to evaluate a PPO agent trained on CartPole, you might use a command like:

dopamax evaluate --agent_artifact CartPole-PPO-agent:v0 --num_episodes 100

where --num_episodes 100 signals that you would like to roll out the agent's policy for 100 episodes. The minimum, mean, and maximum episode reward will be logged back to W&B. If you would additionally like to render the episodes and have them logged back to W&B, you can provide the --render flag. Note, however, that this will usually slow down evaluation significantly, since environment rendering is not a pure JAX function and requires callbacks to the host. You should usually only use the --render flag with a small number of episodes.
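
The slowdown comes from the fact that rendering has to escape the compiled program via a host callback. A rough illustration of that pattern in plain JAX (with a stand-in render function, not Dopamax's actual rendering code):

import jax
import jax.numpy as jnp

def render_on_host(obs):
    print("rendering frame for obs:", obs)    # stand-in for real rendering

@jax.jit
def step_and_render(obs):
    obs = obs + 1.0                           # stays on-device
    jax.debug.callback(render_on_host, obs)   # round-trips to the host
    return obs

step_and_render(jnp.zeros(3))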

Features

  • Pure JAX Implementation: Everything is implemented in JAX, enabling fast compilation and execution on CPUs, GPUs, and TPUs
  • Anakin Podracer Architecture: Follows the Anakin architecture for efficient parallelization across devices
  • Comprehensive Algorithm Suite: Includes both on-policy and off-policy algorithms for discrete and continuous control
  • Flexible Configuration: YAML-based configuration system for easy experiment management
  • Production Ready: Includes logging, checkpointing, and evaluation tools

Architecture

Dopamax follows the Anakin Podracer architecture (sketched below), which enables:

  • Efficient vectorization across batches within each device
  • Parallelization across multiple devices (multi-GPU/TPU support)
  • Single XLA compilation of the entire training loop
  • Minimal host-device communication overhead
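
Below is the minimal sketch referenced above, written in plain JAX with a toy environment (not Dopamax's actual code). It shows how vmap vectorizes across a batch of environments, lax.scan unrolls the rollout in time, and pmap replicates the computation across all available devices:

import jax
import jax.numpy as jnp

def env_step(state):
    next_state = 0.99 * state                 # toy, purely functional dynamics
    reward = -jnp.sum(next_state ** 2)
    return next_state, reward

def rollout(states):                          # states: [batch, obs_dim]
    def step(s, _):
        return jax.vmap(env_step)(s)          # vectorize across the batch
    states, rewards = jax.lax.scan(step, states, None, length=32)
    return states, rewards.mean()             # rewards: [time, batch]

# Replicate across devices; each device simulates its own batch of environments.
p_rollout = jax.pmap(rollout)

n_dev = jax.local_device_count()
states = jax.random.normal(jax.random.PRNGKey(0), (n_dev, 8, 4))  # [devices, batch, obs_dim]
final_states, mean_reward = p_rollout(states)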

See Also

Some of the JAX-native packages that Dopamax relies on: