🌐 Project Page • 📖 Overview • 📦 Installation • 💻 Usage • 📝 Citation
This repository contains the official implementation of D3: Divide, Discover, Deploy, presented at CoRL 2025. D3 is a framework for learning diverse and reusable robotic skills through factorized unsupervised skill discovery with symmetry and style priors.
- 🔀 Factorized USD Algorithms: Modular implementation supporting DIAYN, METRA, and extensible to custom algorithms
- 🤖 IsaacLab Integration: High-performance simulation environments for quadrupedal robots
- 📊 Hierarchical Skill Learning: Support for both low-level and high-level skill discovery
- 🎯 Downstream Task Evaluation: Pre-configured environments for goal tracking, pedipulation, and velocity tracking
- 📖 Overview
- 📦 Installation
- 🐳 Docker Installation (Alternative)
- 🏗️ Repository Structure
- 🌍 Environments
- 💻 Usage
- 🧠 Algorithm Details
- 📊 Performance & Training Tips
- 🐛 Troubleshooting
- 📝 Citation
- ⚖️ License
📋 Prerequisites
| Requirement | Version/Details |
|---|---|
| Operating System | Linux (tested on Ubuntu 20.04+) |
| Python | 3.10+ |
| CUDA | 11.8+ (for GPU acceleration) |
| GPU Memory | 16GB+ VRAM recommended |
| Disk Space | ~50GB for Isaac Sim + dependencies |
| Isaac Sim | 4.5.0+ (included with Isaac Lab 2.2) |
Follow the official Isaac Lab installation guide to install Isaac Lab 2.2.
From your Isaac Lab installation directory:
```bash
./isaaclab.sh --conda d3_env
conda activate d3_env
./isaaclab.sh --install
```

Clone this repository and install:

```bash
git clone https://github.com/leggedrobotics/d3-skill-discovery.git
cd d3-skill-discovery
./install.sh
```

What does install.sh do?
The installation script will:
- ✅ Install the `d3_rsl_rl` package with USD algorithms
- ✅ Register the `d3_skill_discovery` extension with Isaac Lab
- ✅ Set up all Python dependencies
- ✅ Verify the installation
Using Docker for Easy Setup
Docker provides an isolated environment with all dependencies pre-installed, making it easier to get started without manual setup.
| Requirement | Installation Guide |
|---|---|
| Docker (20.10+) | Install Docker |
| Docker Compose (2.0+) | Install Docker Compose |
| NVIDIA Container Toolkit | Install NVIDIA Docker |
| GPU Memory | 16GB+ VRAM recommended |
💡 Tip: Verify GPU access with:
```bash
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```
1. Clone the repository:

```bash
git clone https://github.com/leggedrobotics/d3-skill-discovery.git
cd d3-skill-discovery
```

2. Build the Docker image (this may take 15-20 minutes):

```bash
docker compose -f docker/docker-compose.yaml build
```

3. Start the container:

```bash
# Start the container with an interactive bash shell
docker compose -f docker/docker-compose.yaml run d3-skill-discovery

# You'll be inside the container at /workspace/d3-skill-discovery
# All packages are already installed and ready to use
```

Inside the container, you can run training commands directly:
```bash
# Train in headless mode (recommended for Docker)
python scripts/d3_rsl_rl/train.py \
    --task Isaac-USD-Anymal-D-v0 \
    --num_envs 2048 \
    --headless \
    --logger tensorboard

# For WandB logging, set your API key first
export WANDB_API_KEY=your_api_key_here
python scripts/d3_rsl_rl/train.py \
    --task Isaac-USD-Anymal-D-v0 \
    --logger wandb
```

Persistent Data:
- The workspace is mounted from your host machine, so all changes persist
- Logs and checkpoints saved in the container are accessible on your host at `d3-skill-discovery/logs/`

GPU Usage:

- Verify GPU access with `nvidia-smi` inside the container
- If you see GPU errors, ensure nvidia-docker2 is properly installed

Multiple Terminals:

- To open additional terminals in the same container:

```bash
docker exec -it d3-skill-discovery bash
```

Stopping the Container:

- Type `exit` or press `Ctrl+D` to exit the container
- Container state is not saved; restart with `docker compose -f docker/docker-compose.yaml run d3-skill-discovery`

Cleaning Up:

- Remove the container: `docker compose -f docker/docker-compose.yaml down`
- Remove the image: `docker rmi d3-skill-discovery`
- Full cleanup: `docker system prune -a`
Click to expand directory tree
d3-skill-discovery/
├── source/d3_skill_discovery/ # IsaacLab extension with environments
│ └── d3_skill_discovery/
│ ├── tasks/ # Environment implementations
│ │ ├── unsupervised_skill_discovery/ # USD environments
│ │ └── downstream/ # Evaluation tasks
│ └── d3_rsl_rl/ # Configuration utilities
├── source/d3_rsl_rl/ # Reinforcement learning algorithms
│ └── d3_rsl_rl/
│ ├── algorithms/ # PPO implementation
│ ├── intrinsic_motivation/ # USD algorithms (DIAYN, METRA, etc.)
│ ├── modules/ # Neural network architectures
│ ├── runners/ # Training orchestration
│ └── storage/ # Rollout buffer management
└── scripts/ # Training and evaluation scripts
└── d3_rsl_rl/
├── train.py # Main training script
├── play.py # Policy visualization
└── skill_gui.py # Interactive skill control GUI
The tasks are implemented inside the `source/d3_skill_discovery/d3_skill_discovery/tasks` directory.
ANYmal-D environments for learning diverse skills without task-specific rewards:
| Environment | Description | Task ID | Config File |
|---|---|---|---|
| 🦿 Low-Level USD | Basic skill learning on rough terrain (as described in the paper) | `Isaac-USD-Anymal-D-v0` | `anymal_usd_env_cfg.py` |
| 🎯 High-Level USD | Hierarchical skill learning (requires pretrained low-level policy) | `Isaac-HL-USD-Anymal-D-v0` | `anymal_hl_usd_env_cfg.py` |
| 📦 USD with Box | Skill learning with interactive movable box for manipulation | `Isaac-HL-USD-Box-Anymal-D-v0` | `anymal_hl_usd_box_env_cfg.py` |
💡 Which environment should I start with?
For reproducing paper results, start with Low-Level USD (Isaac-USD-Anymal-D-v0). Once you have a trained low-level policy, you can proceed to high-level skill learning.
Evaluation environments for testing learned skills on goal-directed tasks:
| Task Category | Description | Directory |
|---|---|---|
| 🎯 Goal Tracking | Goal-reaching navigation on rough terrain | goal_tracking/ |
| 🦾 Pedipulation | Precise foot positioning and object manipulation | pedipulation/ |
| 🏃 Velocity Tracking | Velocity tracking and locomotion control | velocity_tracking/ |
Train a low-level skill discovery model on ANYmal-D using scripts/d3_rsl_rl/train.py:
```bash
python scripts/d3_rsl_rl/train.py \
    --task Isaac-USD-Anymal-D-v0 \
    --num_envs 2048 \
    --headless \
    --logger wandb \
    --run_name my_experiment
```

⚙️ Command Line Arguments
| Argument | Description | Default |
|---|---|---|
| `--task` | Environment task ID (see Environments) | Required |
| `--num_envs` | Number of parallel simulation environments | 2048 |
| `--headless` | Run without GUI for faster training | False |
| `--logger` | Logging backend: `wandb` or `tensorboard` | `tensorboard` |
| `--run_name` | Experiment name for logging | Auto-generated |
| `--max_iterations` | Maximum training iterations | 10000 |
| `--device` | Compute device: `cuda` or `cpu` | `cuda` |
See all available arguments: `scripts/d3_rsl_rl/cli_args.py`

💡 Tip: For fastest training, use `--headless` mode and increase `--num_envs` based on your GPU memory (e.g., 4096 for 24GB+ VRAM).
For hierarchical skill learning, first train a low-level policy, then:
```bash
python scripts/d3_rsl_rl/train.py \
    --task Isaac-HL-USD-Anymal-D-v0 \
    --num_envs 2048 \
    --headless \
    --logger wandb \
    --load_run path/to/low_level/checkpoint
```

⚠️ Important: High-level training requires a pretrained low-level policy. Use `--load_run` to specify the checkpoint directory.
Evaluate learned skills on downstream tasks using scripts/d3_rsl_rl/play.py:
```bash
python scripts/d3_rsl_rl/play.py \
    --task Isaac-Goal-Tracking-Anymal-D-v0 \
    --num_envs 64 \
    --load_run path/to/trained/checkpoint
```

🎮 Available Evaluation Tasks

```bash
# Goal tracking on rough terrain
python scripts/d3_rsl_rl/play.py --task Isaac-Goal-Tracking-Anymal-D-v0 --load_run <checkpoint>

# Foot positioning for manipulation
python scripts/d3_rsl_rl/play.py --task Isaac-Foot-Tracking-Anymal-D-v0 --load_run <checkpoint>

# Velocity tracking locomotion
python scripts/d3_rsl_rl/play.py --task Isaac-Velocity-Tracking-Anymal-D-v0 --load_run <checkpoint>
```

Launch the skill GUI to manually control and visualize learned skills using `scripts/d3_rsl_rl/skill_gui.py`:
```bash
python scripts/d3_rsl_rl/skill_gui.py \
    --checkpoint path/to/trained/model
```

Run hyperparameter optimization sweeps using WandB:
Running WandB Sweeps
Edit scripts/d3_rsl_rl/sweep/sweep.yaml to define parameters to optimize:
```yaml
program: scripts/d3_rsl_rl/train.py
method: bayes
metric:
  name: train/episode_reward
  goal: maximize
parameters:
  learning_rate:
    min: 1e-4
    max: 1e-2
  num_envs:
    values: [1024, 2048, 4096]
```

Run `scripts/d3_rsl_rl/sweep/initialize_sweep.py`:

```bash
python scripts/d3_rsl_rl/sweep/initialize_sweep.py --project_name my_sweep
```

This writes the sweep ID to `scripts/d3_rsl_rl/sweep/sweep_ids.json`.
Run scripts/d3_rsl_rl/sweep/sweep.py:
```bash
# Run on a single machine
python scripts/d3_rsl_rl/sweep/sweep.py --project_name my_sweep

# Run on multiple machines (same sweep_id)
python scripts/d3_rsl_rl/sweep/sweep.py --project_name my_sweep
```

💡 Tip: You can run multiple agents in parallel on different machines to speed up the sweep.
To run sweeps on a cluster with Isaac Sim, you need to configure the sweep before initializing it. Update your sweep.yaml to use the Isaac Sim Python interpreter:
```yaml
command:
  - /isaac-sim/python.sh
  - ${program}
  # rest
```

⚠️ Important: This configuration must be set before running `initialize_sweep.py`. Once initialized, you cannot run the same sweep on both cluster and local machines due to different Python interpreters.
Track your training progress using built-in logging:
WandB (Recommended)
```bash
# Login to WandB
wandb login

# Train with WandB logging
python scripts/d3_rsl_rl/train.py \
    --task Isaac-USD-Anymal-D-v0 \
    --logger wandb \
    --wandb_project d3-skill-discovery
```

Logged Metrics:
- Episode rewards and lengths
- Policy/value loss
- Discriminator accuracy (USD algorithms)
- Skill diversity metrics
- Learning rates
TensorBoard
```bash
# Train with TensorBoard logging (default)
python scripts/d3_rsl_rl/train.py \
    --task Isaac-USD-Anymal-D-v0 \
    --logger tensorboard

# View logs
tensorboard --logdir logs/
```

The framework builds upon rsl_rl (v2.2.0) and uses the `OnPolicyRunnerUSD` for training. Currently implemented USD algorithms:
| Algorithm | Description | Implementation |
|---|---|---|
| DIAYN | Diversity is All You Need | diayn.py |
| METRA | Metric-Aware Abstraction | metra.py |
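For orientation, DIAYN's intrinsic reward is the discriminator's log-probability of the currently active skill minus the log of the (uniform) skill prior. The following is a minimal sketch of that idea, not the repository's `diayn.py`; the network architecture, dimensions, and names here are invented for illustration:

```python
import torch
import torch.nn as nn

# Illustrative dimensions, not taken from the repository's configs.
obs_dim, num_skills = 48, 8

# Toy discriminator q(z | s): maps an observation to skill logits.
discriminator = nn.Sequential(
    nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, num_skills)
)

def diayn_reward(obs: torch.Tensor, skill_idx: torch.Tensor) -> torch.Tensor:
    """r(s, z) = log q(z | s) - log p(z), with a uniform prior over skills."""
    log_q = torch.log_softmax(discriminator(obs), dim=-1)        # (N, num_skills)
    log_q_z = log_q.gather(-1, skill_idx.unsqueeze(-1)).squeeze(-1)  # pick active skill
    log_p_z = -torch.log(torch.tensor(float(num_skills)))        # uniform prior
    return log_q_z - log_p_z                                     # shape (N,)
```

Training the discriminator to predict the active skill, while the policy maximizes this reward, pushes skills toward visiting distinguishable states.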
Base RL Algorithm:
- PPO: Proximal Policy Optimization - `ppo.py`

Neural Network Modules:

- Actor-Critic: Standard policy network - `actor_critic.py`
- Recurrent AC: LSTM-based policy - `actor_critic_recurrent.py`
- More architectures available in `d3_rsl_rl/d3_rsl_rl/modules/`
The FACTOR_USD class manages multiple USD algorithms simultaneously, enabling factorized skill discovery across different observation spaces.
🔬 Research Note: Factorized USD allows decomposing skill learning into multiple factors (e.g., gait style, navigation behavior), each learned by a separate USD algorithm.
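The pattern can be sketched as follows. This is an illustrative toy, not the `FACTOR_USD` implementation: the per-factor functions, the observation slicing, and the plain summation are all assumptions. In the repository each factor is backed by a full USD algorithm (DIAYN, METRA, ...), not a bare function:

```python
import torch

# Toy per-factor intrinsic rewards over each factor's observation slice.
def style_reward(obs: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    return (obs * z).sum(dim=-1)

def nav_reward(obs: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    return (obs * z).sum(dim=-1)

factors = {"style": style_reward, "navigation": nav_reward}

def factorized_reward(obs_slices: dict, skills: dict) -> torch.Tensor:
    """Combine per-factor intrinsic rewards into one training signal."""
    return sum(fn(obs_slices[name], skills[name]) for name, fn in factors.items())
```

Each factor samples its own skill vector and scores only its own observation slice, so behaviors like gait style and navigation can be discovered independently and recombined at deployment.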
🛠️ Adding Custom USD Algorithms
To integrate a new unsupervised skill discovery algorithm, follow these steps:
Subclass `BaseSkillDiscovery` in `d3_rsl_rl/intrinsic_motivation/`:

```python
import torch

from d3_rsl_rl.intrinsic_motivation.base_skill_discovery import BaseSkillDiscovery


class MyUSDAlgorithm(BaseSkillDiscovery):
    def reward(self, usd_observations, skill: torch.Tensor, **kwargs) -> torch.Tensor:
        """Calculate the intrinsic reward for the underlying RL algorithm."""
        # Your reward computation logic
        pass

    def sample_skill(self, envs_to_sample: torch.Tensor, **kwargs) -> torch.Tensor:
        """Sample a skill z."""
        # Your skill sampling logic
        pass

    def update(self, observation_batch, **kwargs) -> dict:
        """Update the intrinsic motivation algorithm (e.g., train discriminator)."""
        # Your update logic (e.g., discriminator training)
        return {"loss": loss_value}

    def get_save_dict(self) -> dict:
        """Return state dict for saving."""
        return {"model_state": self.model.state_dict()}

    def load(self, state_dict: dict, **kwargs) -> None:
        """Load the algorithm state."""
        self.model.load_state_dict(state_dict["model_state"])

    @property
    def performance_metric(self) -> float:
        """Return performance metric between 0 and 1."""
        # Your performance metric (e.g., discriminator accuracy)
        return 0.5
```

Add a config class to `source/d3_skill_discovery/d3_skill_discovery/d3_rsl_rl/rl_cfg.py`:
```python
@configclass
class MyUSDAlgorithmCfg:
    """Configuration for MyUSDAlgorithm."""

    learning_rate: float = 3e-4
    # Your config parameters
```

Extend the factory class in `d3_rsl_rl/intrinsic_motivation/factoized_unsupervised_skill_discovery.py` to initialize your algorithm.
Update the environment's USD configuration:
```python
# In your environment config file
factors: dict[str, tuple[list[str], Literal["metra", "diayn", "my_algorithm"]]]
skill_dims: dict[str, int]
resampling_intervals: dict[str, int]
usd_alg_extra_cfg: dict[str, dict]
```

Expected Training Performance
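As a concrete illustration of those fields, a hypothetical two-factor setup might look like the following. The factor names, observation-term names, and values are invented for this sketch, not taken from the repository's configs:

```python
from typing import Literal

# Hypothetical two-factor USD configuration: a "style" factor handled by a
# custom algorithm over joint-space terms, and a "navigation" factor handled
# by METRA over base-frame terms.
factors: dict[str, tuple[list[str], Literal["metra", "diayn", "my_algorithm"]]] = {
    "style": (["joint_pos", "joint_vel"], "my_algorithm"),
    "navigation": (["base_lin_vel", "base_ang_vel"], "metra"),
}
skill_dims: dict[str, int] = {"style": 8, "navigation": 2}
resampling_intervals: dict[str, int] = {"style": 1000, "navigation": 500}
usd_alg_extra_cfg: dict[str, dict] = {
    "style": {"learning_rate": 3e-4},  # forwarded to the algorithm's config
    "navigation": {},
}
```

Each dictionary is keyed by factor name, so every factor gets its own observation slice, skill dimension, and resampling schedule.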
| Configuration | Recommended Specs |
|---|---|
| GPU | NVIDIA RTX 3090 / 4090 or better |
| VRAM | 16GB+ (24GB for 4096 envs) |
| CPU | Modern multi-core processor |
| RAM | 32GB+ |
| Task | Envs | Iterations | Time (RTX 4090) |
|---|---|---|---|
| Low-Level USD | 2048 | 10,000 | ~3-5 hours |
| High-Level USD | 2048 | 5,000 | ~2-3 hours |
| Downstream Tasks | 2048 | 5,000 | ~2-3 hours |
🎯 Hyperparameter Tuning Tips
Most hyperparameters can be adjusted in the configuration files:
Environment Configs:
Agent Configs:
- USD Algorithm: `rsl_rl_usd_cfg.py`
- PPO Hyperparameters: `rsl_rl_ppo_cfg.py`
Common adjustments:
```python
# PPO hyperparameters
num_learning_epochs: int = 5       # Number of epochs per iteration
num_mini_batches: int = 4          # Mini-batches per epoch
learning_rate: float = 1e-3        # Actor-critic learning rate

# USD-specific
skill_dim: int = 8                 # Dimension of skill space
resampling_interval: int = 1000    # Steps before resampling skills
```

- Start with default parameters from the paper
- Increase `num_envs` if you have more GPU memory
- Adjust `learning_rate` if training is unstable
- Monitor discriminator accuracy in WandB/TensorBoard
- Use curriculum learning for complex terrains
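To make `resampling_interval` concrete: each environment holds its sampled skill for that many steps before drawing a new one. Below is a rough sketch of this bookkeeping, assumed behavior for illustration only (see the runner and `FACTOR_USD` code for the actual logic; the one-hot sampling is DIAYN-style, whereas METRA samples continuous skill vectors):

```python
import torch

class SkillResampler:
    """Illustrative periodic skill resampling across parallel envs (assumed behavior)."""

    def __init__(self, num_envs: int, skill_dim: int, resampling_interval: int):
        self.skill_dim = skill_dim
        self.interval = resampling_interval
        self.steps = torch.zeros(num_envs, dtype=torch.long)  # steps since last resample
        self.skills = self._sample(num_envs)

    def _sample(self, n: int) -> torch.Tensor:
        # Uniform one-hot skills (DIAYN-style toy choice).
        idx = torch.randint(0, self.skill_dim, (n,))
        return torch.nn.functional.one_hot(idx, self.skill_dim).float()

    def step(self) -> torch.Tensor:
        # Resample each env's skill once it has been held for `interval` steps.
        self.steps += 1
        due = self.steps >= self.interval
        if due.any():
            self.skills[due] = self._sample(int(due.sum().item()))
            self.steps[due] = 0
        return self.skills
```

A longer interval gives the policy more time to express each skill within an episode; a shorter one exposes it to more skill switches per rollout.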
Common Issues and Solutions
Problem: `ImportError: No module named 'isaaclab'`

Solution: Ensure Isaac Lab is properly installed and the conda environment is activated:

```bash
conda activate d3_env
python -c "import isaaclab; print(isaaclab.__version__)"
```

Problem: CUDA out of memory during training

Solution: Reduce the number of parallel environments:

```bash
python scripts/d3_rsl_rl/train.py --task Isaac-USD-Anymal-D-v0 --num_envs 1024  # or lower
```

Problem: Training is very slow on my GPU

Solution:

- Use `--headless` mode to disable rendering
- Ensure you're using CUDA: check that `nvidia-smi` shows GPU usage
- Close other GPU-intensive applications

Problem: WandB login required

Solution: Either login to WandB or use a different logger:

```bash
wandb login  # Enter your API key

# OR use tensorboard instead
python scripts/d3_rsl_rl/train.py --logger tensorboard
```

If you use this code in your research, please cite our paper:
```bibtex
@inproceedings{cathomen2025d3,
  author    = {Cathomen, Rafael and Mittal, Mayank and Vlastelica, Marin and Hutter, Marco},
  title     = {Divide, Discover, Deploy: Factorized Skill Learning with Symmetry and Style Priors},
  booktitle = {Conference on Robot Learning (CoRL)},
  year      = {2025},
}
```

This project is licensed under the BSD-3-Clause License. See LICENSE for details.
Third-Party Licenses
This project incorporates code from the following open-source projects:
| Project | License | Details |
|---|---|---|
| Isaac Lab | BSD-3-Clause | View License |
| rsl_rl | BSD-3-Clause | View License |
