A modular Python implementation of Visual SLAM (Simultaneous Localization and Mapping) focused on monocular visual odometry. The project processes TUM RGB-D benchmark datasets and estimates the camera trajectory using robust feature tracking and scale estimation.
- Feature-based Visual Odometry: Uses ORB features and robust matching to track camera motion
- Automatic Scale Initialization: Computes scale factor from initial frames for proper trajectory scaling
- Comprehensive Visualization: 2D/3D trajectory plots and feature matching visualizations
- TUM Dataset Integration: Full support for TUM RGB-D benchmark datasets
- Robust Error Handling: Graceful handling of insufficient features and matching failures
- Configurable via YAML: Easily customize parameters through configuration files
Example of estimated trajectory (blue) compared with ground truth (red)
.
├── config/
│ └── config.yaml # Configuration file with system parameters
├── src/
│ ├── slam/ # Main SLAM package
│ │ ├── core/ # Core SLAM components (VO, camera, frame)
│ │ ├── utils/ # Utility functions (config, data handling)
│ │ └── visualization/ # Visualization tools
│ └── main.py # Main script for running the system
├── data/
│ ├── raw/ # Raw input data (TUM RGB-D datasets)
│ └── processed/ # Processed output data (trajectories, visualizations)
├── scripts/ # Utility scripts (evaluation, dataset preparation)
├── docs/ # Documentation
├── requirements.txt # Project dependencies
└── README.md # This file
- Clone the repository:

```bash
git clone https://github.com/yourusername/visual-slam-tum.git
cd visual-slam-tum
```

- Create and activate a virtual environment (optional but recommended):

```bash
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate     # Windows
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

This system is designed to work with the TUM RGB-D benchmark datasets. To prepare the data:
- Download one of the TUM RGB-D datasets (e.g., `rgbd_dataset_freiburg2_pioneer_360`)
- Extract the dataset to `data/raw/tum_rgbd/`
- Update the dataset path in `config/config.yaml` if needed
Example directory structure:
data/raw/tum_rgbd/rgbd_dataset_freiburg2_pioneer_360/
├── rgb/ # RGB images
├── depth/ # Depth images (not used in monocular mode)
├── groundtruth.txt # Ground truth trajectory
├── rgb.txt # RGB image timestamps
└── depth.txt # Depth image timestamps
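
The `rgb.txt` and `depth.txt` index files list one `timestamp filename` pair per line, preceded by `#` comment headers. A minimal parser sketch (the project's own loader in `src/slam/utils/` may differ):

```python
def load_tum_image_list(path):
    """Parse a TUM rgb.txt/depth.txt file into (timestamp, filename) pairs."""
    entries = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comment headers
            timestamp, filename = line.split()[:2]
            entries.append((float(timestamp), filename))
    return entries

# Example:
# rgb_list = load_tum_image_list(
#     "data/raw/tum_rgbd/rgbd_dataset_freiburg2_pioneer_360/rgb.txt")
```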
```bash
python src/main.py
```

This will:
- Load the dataset specified in the configuration
- Process frames sequentially to estimate camera motion
- Generate visualizations and trajectory files in the output directory
```bash
python evaluate_slam.py \
  data/processed/DATASET_NAME/trajectories/txt/estimated_trajectory.txt \
  data/processed/DATASET_NAME/trajectories/txt/groundtruth.txt \
  --gt-has-timestamps \
  --output-dir data/processed/DATASET_NAME/evaluation
```

This will:
- Align and compare the estimated trajectory with ground truth
- Calculate Absolute Pose Error (APE) and Relative Pose Error (RPE)
- Generate visualizations of trajectory comparison and error metrics
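
The alignment step is commonly implemented as a least-squares similarity transform (Umeyama alignment). The following is a sketch under that assumption, not necessarily what `evaluate_slam.py` does internally:

```python
import numpy as np

def align_umeyama(est, gt, with_scale=True):
    """Least-squares similarity alignment of `est` onto `gt` (Umeyama 1991).

    Both inputs are (N, 3) position arrays; returns (s, R, t) such that
    gt_i ≈ s * R @ est_i + t for each row. Sketch only.
    """
    est = np.asarray(est, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g

    # Cross-covariance between the centred point sets
    cov = G.T @ E / est.shape[0]
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # handle the reflection case

    R = U @ S @ Vt
    var_e = (E ** 2).sum() / est.shape[0]  # variance of the estimated points
    s = float((D * np.diag(S)).sum() / var_e) if with_scale else 1.0
    t = mu_g - s * R @ mu_e
    return s, R, t

# Apply the transform: aligned = (s * (R @ est.T)).T + t
```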
The system is configured through `config/config.yaml`. Key parameters include:

```yaml
camera:
  # Camera intrinsics
  fx: 520.9   # Focal length x (for TUM Freiburg2)
  fy: 521.0   # Focal length y
  cx: 325.1   # Principal point x
  cy: 249.7   # Principal point y
  width: 640
  height: 480

dataset:
  path: "data/raw/tum_rgbd/rgbd_dataset_freiburg2_pioneer_360"
  max_frames: 500   # Maximum number of frames to process

feature_detection:
  n_features: 1000
  scale_factor: 1.2
  n_levels: 8

matching:
  distance_threshold: 0.75
  min_matches: 8
  matcher: "flann"  # Options: flann, bf

motion_estimation:
  min_inliers: 15
  ransac_threshold: 2.0

visualization:
  save_matches_video: true
  save_trajectory_2d: true
  save_trajectory_3d: true
```

The system generates several outputs in `data/processed/DATASET_NAME/`:
- Trajectories:
  - `trajectories/txt/estimated_trajectory.txt`: Camera trajectory in TUM format (format shown below)
  - `trajectories/txt/groundtruth.txt`: Ground truth trajectory
  - `trajectories/ply/pointcloud.ply`: Sparse point cloud of camera positions
- Visualizations:
  - `visualizations/trajectory_comparison.png`: Comparison of estimated and ground truth trajectories
  - `visualizations/trajectory_2d.png`: 2D top-down view of the trajectory
  - `visualizations/feature_matches.avi`: Video showing feature matches between frames
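
Trajectory files use the TUM format: one pose per line as `timestamp tx ty tz qx qy qz qw` (position in meters, orientation as a unit quaternion). A minimal writer sketch, assuming poses are stored as timestamp/position/quaternion tuples (hypothetical helper, not necessarily the project's API):

```python
def save_tum_trajectory(path, poses):
    """Write a trajectory in TUM format: timestamp tx ty tz qx qy qz qw.

    `poses` is an iterable of (timestamp, (tx, ty, tz), (qx, qy, qz, qw)) tuples;
    this helper and its signature are assumptions for illustration.
    """
    with open(path, "w") as f:
        for timestamp, (tx, ty, tz), (qx, qy, qz, qw) in poses:
            f.write(f"{timestamp:.6f} {tx:.6f} {ty:.6f} {tz:.6f} "
                    f"{qx:.6f} {qy:.6f} {qz:.6f} {qw:.6f}\n")
```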
- Feature Detection: ORB features are detected in each frame
- Feature Matching: Features are matched between consecutive frames
- Pose Estimation: The essential matrix is computed and decomposed to recover camera motion
- Scale Initialization: Scale is initialized from the first few frames
- Trajectory Building: Camera poses are accumulated to build the complete trajectory
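
The sketch below illustrates steps 1–3 for a single pair of frames with OpenCV, using the parameter values from `config/config.yaml`. It is a simplified stand-in for the classes in `src/slam/core/`, not their actual API:

```python
import cv2
import numpy as np

# Camera intrinsics for TUM Freiburg2 (matching config/config.yaml)
K = np.array([[520.9, 0.0, 325.1],
              [0.0, 521.0, 249.7],
              [0.0, 0.0, 1.0]])

orb = cv2.ORB_create(nfeatures=1000, scaleFactor=1.2, nlevels=8)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)

def estimate_relative_pose(img_prev, img_curr, ratio=0.75, min_matches=8):
    """Return (R, t) of the current frame w.r.t. the previous one, or None on failure.

    t has unit norm; the monocular scale is resolved separately (see the next section).
    """
    # 1. Feature detection
    kp1, des1 = orb.detectAndCompute(img_prev, None)
    kp2, des2 = orb.detectAndCompute(img_curr, None)
    if des1 is None or des2 is None:
        return None

    # 2. Feature matching with Lowe's ratio test
    good = []
    for pair in matcher.knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    if len(good) < min_matches:
        return None  # not enough matches to estimate motion

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # 3. Essential matrix with RANSAC, decomposed to recover camera motion
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=2.0)
    if E is None:
        return None
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```

Each relative pose is then scaled by the initialized scale factor and chained onto the previous absolute pose to build the full trajectory (steps 4–5).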
Monocular visual odometry inherently suffers from scale ambiguity. To address this:
- The system initializes the absolute scale using ground truth in the first few frames
- This scale factor is then applied to all subsequent motion estimations
- The median scale is computed to be robust against outliers
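
A minimal sketch of the median-ratio initialization described above, assuming the first few estimated and ground-truth camera positions are available as (N, 3) arrays (hypothetical helper name):

```python
import numpy as np

def initialize_scale(est_positions, gt_positions):
    """Median ratio of ground-truth to estimated frame-to-frame translation norms."""
    est = np.asarray(est_positions, dtype=float)
    gt = np.asarray(gt_positions, dtype=float)
    est_step = np.linalg.norm(np.diff(est, axis=0), axis=1)
    gt_step = np.linalg.norm(np.diff(gt, axis=0), axis=1)
    valid = est_step > 1e-6  # ignore near-zero estimated steps
    if not np.any(valid):
        return 1.0  # fall back to unit scale
    return float(np.median(gt_step[valid] / est_step[valid]))
```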
The system uses the following metrics for evaluation:
- Absolute Pose Error (APE): Measures the global error between estimated poses and ground truth
- Relative Pose Error (RPE): Measures local accuracy and drift
- Scale Factor: The estimated scale compared to ground truth
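
Simplified, translation-only versions of these metrics are sketched below; the full evaluation also considers rotation and operates on SE(3) poses. Both helpers assume time-associated, already-aligned (N, 3) position arrays:

```python
import numpy as np

def ape_rmse(est_xyz, gt_xyz):
    """Absolute Pose Error (translation RMSE) between aligned trajectories."""
    err = np.linalg.norm(np.asarray(est_xyz) - np.asarray(gt_xyz), axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

def rpe_rmse(est_xyz, gt_xyz, delta=1):
    """Relative Pose Error: drift over a fixed frame offset `delta` (translation only)."""
    est, gt = np.asarray(est_xyz), np.asarray(gt_xyz)
    d_est = est[delta:] - est[:-delta]
    d_gt = gt[delta:] - gt[:-delta]
    err = np.linalg.norm(d_est - d_gt, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))
```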
Contributions are welcome! To contribute:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
