Accompanying repository of Master's thesis at TU Berlin / Aalborg University. MIGRATED TO LATEST MUJOCO - Now compatible with Apple Silicon (M1/M2/M3/M4) MacBooks!
Deep Reinforcement Learning for robotic pick and place applications using purely visual observations
Orginal Author: Paul Daniel (paudan22@gmail.com)
Migration: Updated from mujoco-py to DeepMind MuJoCo + Gymnasium for modern compatibility
Traits of this environment: Very large and multi-discrete actionspace, very high sample-cost, visual observations, binary reward.
This repository has been completely migrated from the deprecated mujoco-py to the latest DeepMind MuJoCo library, making it fully compatible with:
- โ Apple Silicon MacBooks (M1, M2, M3, M4)
- โ Latest Python versions (3.8+)
- โ Modern Gymnasium API (replacing deprecated gym)
- โ Cross-platform compatibility (Windows, macOS, Linux)
- MuJoCo:
mujoco-pyโmujoco(DeepMind's official library) - Gym:
gymโgymnasium(modern Gym API) - Viewer: Updated to use MuJoCo's native viewer
- API: All MuJoCo API calls updated to latest syntax
- Dependencies: Updated all packages to latest versions
| Trained agent in action | Example of predicted grasp chances |
|---|---|
![]() |
![]() |
| Setup iteration | Relevant changes |
|---|---|
| IT5 | - Many more objects, randomly piled - Actionspace now multi-discrete, with second dimension being a rotation action |
| IT4 | - Z-coordinate for grasping now calculated using depth data - Objects now vary in size |
| IT3 | - New two-finger gripper implemented |
| IT2 | - Grasp success check now after moving to drop location (1000 steps) |
| IT1 (Baseline) | - Grasp success check after moving straight up (500 steps of trying to close the gripper) - Fixed z-coordinate for grasping - Objects of equal size |
This repository provides several python classes for control of robotic arms in MuJoCo:
-
MJ_Controller: This class can be used as a standalone class for basic robot control in MuJoCo. This can be useful for trying out models and their grasping capabilities. Alternatively, its methods can also be used by any other class (like a Gym environment) to provide some more functionality. One example of this might be to move the robot back into a certain position after every episode of training, which might be preferable compared to just resetting all the joint angles and velocities. The controller currently also holds the methods for image transformations, which might be put into another separate class at some point.
-
GraspEnv: A Gym environment for training reinforcement learning agents. The task to master is a pick & place task. The difference to most other MuJoCo Gym environments is that the observation returned is a camera image instead of a state vector of the simulation. This is meant to resemble a real world setup more closely.
The robot configuration used in this setup (Universal Robots UR5 + Robotiq S Model 3 Finger Gripper) is based on this resource. It has since been heavily modified. Most current XML-file: UR5gripper_2_finger.xml
Updated Dependencies:
- MuJoCo: Now uses DeepMind MuJoCo (official library)
- Gymnasium: Modern replacement for OpenAI Gym
- PID controllers: Based on simple_pid
- Inverse kinematics: Using ikpy
- Image processing: OpenCV for camera observations
- Visualization: Matplotlib for plotting
# Clone the repository
git clone https://github.com/PaulDanielML/MuJoCo_RL_UR5.git
cd MuJoCo_RL_UR5/
# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Install the package in development mode
pip install -e .# Clone the repository
git clone https://github.com/qdo1010/MuJoCo_RL_UR5.git
cd MuJoCo_RL_UR5/
# Install dependencies
pip install -r requirements.txt
# Install the package
pip install -e .- Python: 3.8 or higher
- Operating System: Windows, macOS (Intel & Apple Silicon), Linux
- Memory: At least 4GB RAM recommended
- Graphics: OpenGL support for 3D rendering
The following packages are automatically installed via requirements.txt:
mujoco- DeepMind's official MuJoCo librarygymnasium- Modern Gym APInumpy- Numerical computingopencv-python- Computer visionmatplotlib- Plotting and visualizationsimple-pid- PID controllersikpy- Inverse kinematicspyquaternion- Quaternion operationstermcolor- Colored terminal output
On macOS, you need to use mjpython instead of python to run scripts with the 3D viewer:
# For scripts with 3D viewer
mjpython example_3d_only.py
# For scripts without 3D viewer
python example_agent.pyTest your installation:
# Test basic functionality
python example_agent.py
# Test 3D viewer (macOS)
mjpython example_3d_only.py
# Test camera observations
python example_camera_only.pyThe repository now includes multiple example scripts to demonstrate different features:
example_agent.py- Basic random agent with camera observationsexample.py- Simple controller demonstrationsimple_3d_viewer.py- Minimal 3D viewer example
example_3d_only.py- 3D MuJoCo viewer only (recommended for macOS)example_3d_gymnasium.py- Gymnasium environment with 3D viewerexample_3d_with_saved_camera.py- 3D viewer + camera images saved to filesexample_3d_with_matplotlib.py- 3D viewer + matplotlib camera displayexample_3d_with_threaded_camera.py- 3D viewer + threaded OpenCV camera display
example_camera_only.py- Camera observations only (no 3D viewer)example_agent_with_3d.py- Agent with both 3D viewer and camera
Grasping_Agent_multidiscrete.py- Multi-discrete action space agentnormalize.py- Image normalization utility
Offline RL/generate_data.py- Generate training dataOffline RL/train.py- Train offline RL agentsOffline RL/grasping_dataset.py- Dataset utilities
Gymnasium environment for training agents to use RGB-D data for predicting pixel-wise grasp success chances.
The file example_agent.py demonstrates the use of a random agent for this environment.
The file Grasping_Agent_multidiscrete.py gives an example of training a shortsighted DQN-agent in the environment to predict pixel-wise grasping success (PyTorch).
The created environment has an associated controller object, which provides all the functionality of the MJ_Controller - class to it.
- Action space: Pixel space, can be specified by setting height and width. Current defaults: 200x200. This means there are 40.000 possible actions. This resolution translates to a picking accuracy of ~ 4mm.
- State space: The states / observations provided are dictionaries containing two arrays: An RGB-image and a depth-image, both of the same resolution as the action space
- Reward function: The environment has been updated to a binary reward structure:
- 0 for choosing a pixel that does not lead to a successful grasp.
- +1 for choosing a pixel that leads to a successful grasp.
The user gets a summary of each step performed in the console. It is recommended to train agents without rendering, as this will speed up training significantly.
The rgb part of the last captured observation will be shown and updated in an extra window.
-
"Unknown C++ exception from OpenCV code" on macOS
- Solution: Use separate scripts for 3D viewer and camera observations
- 3D Viewer:
mjpython example_3d_only.py - Camera Only:
python example_camera_only.py - Both (saved):
mjpython example_3d_with_saved_camera.py
-
"launch_passive requires mjpython on macOS"
- Solution: Use
mjpythoninstead ofpythonfor scripts with 3D viewer - Example:
mjpython example_3d_only.py
- Solution: Use
-
"ModuleNotFoundError: No module named 'mujoco'"
- Solution: Install dependencies:
pip install -r requirements.txt
- Solution: Install dependencies:
-
"ModuleNotFoundError: No module named 'gymnasium'"
- Solution: Install gymnasium:
pip install gymnasium
- Solution: Install gymnasium:
-
Performance Issues
- Solution: Disable rendering during training:
render=False - Solution: Use
show_obs=Falseto disable camera windows
- Solution: Disable rendering during training:
- macOS: Use
mjpythonfor 3D viewer,pythonfor camera-only scripts - Windows/Linux: Use
pythonfor all scripts - Apple Silicon: Fully supported with the new MuJoCo library
Example usage of some of the class methods is demonstrated in the file example.py.
The class MJ_Controller offers high and low level methods for controlling the robot in MuJoCo.
- move_ee : High level, moves the endeffector of the arm to the desired XYZ position (in world coordinates). This is done using very simple inverse kinematics, just obeying the joint limits. Currently there is not collision avoidance implemented. Since this whole repo is created with grasping in mind, the delivered pose will always be so that the gripper is oriented in a vertical way (for top down grasps).
- actuate_joint_group : Low level, lets the user specify motor activations for a specified group
- grasp : Uses the specified gripper group to attempt a grasp. A simple check is done to determine weather the grasp was successful or not and the result will be output blinking in the console.
mujoco-pyโmujoco: Complete migration to DeepMind's official MuJoCo librarygymโgymnasium: Updated to modern Gym API with proper environment wrapping- API Updates: All MuJoCo API calls updated to latest syntax
gym_grasper/controller/MujocoController.py: Complete refactor for new MuJoCo APIgym_grasper/envs/GraspingEnv.py: Migrated fromgym.envs.mujocotogymnasium.Envrequirements.txt: Updated all dependencies to latest versionssetup.py: Updated package configuration for modern Python
- Multiple 3D Viewer Examples: Various approaches to show 3D simulation
- Camera Observation Examples: Separate scripts for camera-only viewing
- Cross-Platform Compatibility: Works on Windows, macOS (Intel & Apple Silicon), Linux
- Modern Environment API: Proper Gymnasium environment with
reset()andstep()methods
- Environment Creation: Now uses
gymnasium.make()instead ofgym.make() - Reset Method: Returns
(observation, info)tuple instead of just observation - Step Method: Returns 5 values
(obs, reward, terminated, truncated, info)instead of 4 - Viewer: Uses MuJoCo's native viewer instead of mujoco-py's viewer
๐ Major Migration (2024): Complete migration from mujoco-py to DeepMind MuJoCo for Apple Silicon compatibility
Trials for Offline RL: The folder Offline RL contains scripts for generating and learning from a dataset of (state, action, reward)-transitions. generate_data.py can be used to generate as many files as required, each file containing 12 transitions.
New gripper model available: A new, less bulky, 2-finger gripper was implemented in the model in training setup iteration 3.
Image normalization: Added script normalize.py, which samples 100 images from the environment and writes the mean values and standard deviations of all channels to a file.
Reset shuffle: Calling the environments step method now rearranges all the pickable objects to random positions on the table.
Record grasps: The step method of the GraspingEnv now has the optional parameter record_grasps. If set to True, it will capture a side camera image every time a grasp is made that is deemed successful by the environment. This allows for "quality control" of the grasps, without having to watch all the failed attempts. The captured images can also be useful for fine tuning grasping parameters.
Point clouds: The controller class was provided with new methods for image transformations.
- depth_2_meters: Converts the normalized depth values returned by MuJoCo into meters.
- create_camera_data: Constructs a camera matrix, focal length and sets the camera's position and rotation based on a provided camera name and desired image width and depth.
- world_2_pixel: Accepts a XYZ world position and returns the corresponding x-y pixel coordinates
- pixel_2_world: Accepts x-y pixel coordinates and a depth value, returns the XYZ world position. This method can be used to construct point clouds out of the data returned by the controllers get_image_data method.
Joint plots: All methods that move joints now have an optional plot parameter. If set to True, a .png-file will be created in the local directory. It will show plots for each joint involved in the trajectory, containing the joint angles over time, as well as the target values. This can be used to determine which joints overshoot, oscillate etc. and adjust the controller gains based on that.
The tolerance used for the trajectory are plotted in red, so it can easily be determined how many steps each of the joints needs to reach a value within tolerance.
This repository was migrated to support modern hardware and software. If you encounter issues or have improvements:
- Check the troubleshooting section above
- Test with the provided example scripts
- Report issues with your system specifications
- Contribute fixes for cross-platform compatibility
This project maintains the original license. See license.txt for details.
- Original Author: Paul Daniel (paudan22@gmail.com)
- Migration: Updated for modern MuJoCo and Apple Silicon compatibility
- Robot Model: Based on Universal Robots UR5 + Robotiq gripper from MuJoCo forums
- Dependencies: Thanks to all the open-source libraries that make this possible









