
[NeurIPS 2023] 3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection


3D Copy-Paste

Overview

This repository provides a pipeline for 3D Copy-Paste data augmentation by inserting realistic 3D objects into the SUN RGB-D dataset. The method generates synthetic training data for 2D object detection by compositing 3D models into real indoor scenes with physically plausible placement and lighting.

Prerequisites

Repository Dependencies

Please fork the following repositories under our workspace:

Environment Setup

Software Used:

Python 3.10.12, MATLAB, and C++ are required for this project.

Packages

Please install the following packages:

pip install numpy scipy matplotlib pillow

pip install torch torchvision
pip install tensorflow keras
pip install jax jaxlib

pip install opencv-python
pip install scikit-learn pandas h5py

pip install jupyter ipython
pip install flake8

pip install trimesh  # For advanced 3D mesh operations
pip install open3d   # Alternative 3D processing library

Quick Start

Note: For detailed explanations and instructions for modifying the original 3D Copy-Paste implementation, please refer to the original README.md.

1. Download SUN RGB-D Dataset

Go to the SUN RGB-D dataset page and download SUNRGBD.zip, SUNRGBDMeta2DBB_v2.mat, and SUNRGBDMeta3DBB_v2.mat.
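Before running the pipeline, it can be worth confirming all three downloads are in place under your data root. A minimal sketch (the `missing_downloads` helper is ours, not part of the repository):

```python
from pathlib import Path

REQUIRED = ["SUNRGBD.zip", "SUNRGBDMeta2DBB_v2.mat", "SUNRGBDMeta3DBB_v2.mat"]

def missing_downloads(root):
    """Return the required SUN RGB-D files not yet present in `root`."""
    root = Path(root)
    return [name for name in REQUIRED if not (root / name).exists()]
```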

2. Prepare 3D Models

Ensure your 3D models are in the supported OBJ format (with .mtl material and .png texture files) and organized in the following layout:

3d_models/
├── class_name_1
│   ├── instance1
│   │   ├── instance1.obj
│   │   ├── instance1.mtl
│   │   └── instance1.png
│   ├── instance2
│   └── ...
├── class_name_2
├── class_name_3
└── ...
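A quick way to catch incomplete model folders before a long run is to walk the tree and flag instances missing any of the three expected files. A sketch (the `check_model_library` helper is hypothetical, not part of the repository):

```python
from pathlib import Path

REQUIRED_EXTS = {".obj", ".mtl", ".png"}

def check_model_library(root):
    """Yield (class_name, instance_name, missing_exts) for incomplete instances."""
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        for inst_dir in sorted(class_dir.iterdir()):
            if not inst_dir.is_dir():
                continue
            present = {p.suffix.lower() for p in inst_dir.iterdir()}
            missing = REQUIRED_EXTS - present
            if missing:
                yield class_dir.name, inst_dir.name, sorted(missing)
```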

3. Pre-Data Generation Preparation

bash Preparation.sh

This step organizes the necessary information from the background images and the 3D models.

4. Generate Synthetic Data

python 3d_copy_paste.py \
  --root_path "/path/to/your/sunrgbd/data" \
  --obj_root_path "/path/to/your/3d/models" \
  --max_iter 5 \
  --num_objects_per_scene 6 \
  --random_seed 12345 \
  --ilog 3 \
  --istrength 4

Arguments:

  • --root_path: directory containing the SUN RGB-D dataset (background images, camera calibration, plane data)
  • --obj_root_path: directory containing the 3D object models (.obj files) to insert into scenes
  • --max_iter: number of scene variations to generate per background (here, 5)
  • --num_objects_per_scene: number of objects to place in each generated scene (here, 6)
  • --random_seed: random seed for reproducible results across runs
  • --ilog: environment map log parameter (affects lighting calculations)
  • --istrength: environment lighting intensity multiplier (higher values give brighter lighting)

Differences from Original 3D Copy-Paste

Enhanced Object Placement

  • 3D collision detection replaces 2D checks, eliminating floating objects
  • Multi-object scenes support simultaneous placement vs. single-object limitation
  • Six-sided rotation allows any face orientation vs. Z-axis only
  • Multi-surface placement extends beyond floor to tables, desks, and furniture
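The 3D collision check described above amounts to an axis-aligned bounding-box (AABB) intersection test: two boxes collide only if their extents overlap on all three axes. A minimal sketch of the idea (function name is ours, not the repository's):

```python
def aabb_overlap(box_a, box_b):
    """True if two axis-aligned 3D boxes, each given as (min_xyz, max_xyz),
    intersect. Boxes that merely touch on a face do not count as overlapping."""
    (amin, amax), (bmin, bmax) = box_a, box_b
    return all(amin[i] < bmax[i] and bmin[i] < amax[i] for i in range(3))
```

Checking a candidate object against every previously placed object with this test is what prevents interpenetrating or floating placements.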

Improved Visibility Control

  • Automatic scaling ensures consistent object size across models
  • Real-time size measurement maintains minimum screen coverage
  • Camera projection mathematics prevents oversized or undersized objects
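The camera-projection sizing above follows from the pinhole model: an object of physical height H at depth Z projects to roughly f·H/Z pixels for focal length f (in pixels). A sketch of how a minimum-coverage constraint could be enforced (function names and the exact policy are illustrative, not the repository's code):

```python
def projected_height_px(object_height_m, depth_m, focal_px):
    """Approximate on-screen height in pixels under a pinhole camera model."""
    return focal_px * object_height_m / depth_m

def scale_for_min_coverage(object_height_m, depth_m, focal_px, image_h_px, min_frac):
    """Scale factor so the object covers at least `min_frac` of the image height.
    Never shrinks the object (returns at least 1.0)."""
    current = projected_height_px(object_height_m, depth_m, focal_px)
    target = min_frac * image_h_px
    return max(1.0, target / current)
```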

Physical Realism

  • Perfect ground contact through precise bottom-point calculation
  • Sequential obstacle detection for collision-free multi-object placement
  • Surface boundary compliance with up to 2000 placement attempts per object
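Bounded-attempt placement with obstacle avoidance can be sketched as rejection sampling over the support surface. The version below works on 2D footprints for brevity (names and the 2D simplification are ours; the repository reasons about full 3D boxes):

```python
import random

def try_place(surface_bounds, half_extent, obstacles, max_attempts=2000, rng=random):
    """Sample (x, y) on a surface until the object's square footprint avoids
    all obstacles, or give up after `max_attempts` tries.

    surface_bounds: (xmin, xmax, ymin, ymax) of the support surface.
    half_extent: half the side length of the object's footprint.
    obstacles: list of (xmin, xmax, ymin, ymax) rectangles already occupied.
    Returns a position tuple, or None if no free spot was found.
    """
    xmin, xmax, ymin, ymax = surface_bounds
    for _ in range(max_attempts):
        # Keep the whole footprint inside the surface boundary.
        x = rng.uniform(xmin + half_extent, xmax - half_extent)
        y = rng.uniform(ymin + half_extent, ymax - half_extent)
        foot = (x - half_extent, x + half_extent, y - half_extent, y + half_extent)
        if not any(foot[0] < o[1] and o[0] < foot[1] and
                   foot[2] < o[3] and o[2] < foot[3] for o in obstacles):
            return (x, y)
    return None
```

Placing objects one at a time and appending each accepted footprint to `obstacles` gives the sequential, collision-free multi-object behavior described above.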

The enhanced framework transforms single-object, floor-restricted placement into a comprehensive multi-surface, multi-object system with realistic physics and diverse orientations.

Extra Tools

The repository includes several utility tools for 3D model processing:

  • Resize 3D models: resize_3d_models.py
  • Format conversion:
    • GLB to OBJ: export_glb_2_obj.py
    • PLY to OBJ: export_ply_2_obj.py

Output Structure

Output directory structure

insertion_ilog2_istren2_context_[timestamp]/
├── annotated_images/        # Visual debugging images with bounding boxes
├── compositional_image/     # Final composite images for training
├── envmap/                  # Environment lighting maps
├── insert_object_log/       # Detailed insertion metadata (JSON)
├── inserted_foreground/     # Rendered 3D objects with transparency
├── label/                   # 3D object detection labels (JSON)
└── yolo_annotations/        # 2D YOLO format annotations (TXT)
Directory             Description                                            Format
compositional_image/  Primary training images with inserted 3D objects       PNG
label/                3D bounding box annotations                            JSON
yolo_annotations/     2D bounding box annotations                            TXT
                      (class_id center_x center_y width height)
insert_object_log/    Detailed metadata about object placement and scaling   JSON
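Each line of a yolo_annotations/ file stores coordinates normalized to [0, 1]; converting one back to pixel corners looks like this (the helper name is ours):

```python
def yolo_to_pixel_box(line, image_w, image_h):
    """Convert one YOLO label line ('class_id cx cy w h', normalized)
    to (class_id, x0, y0, x1, y1) in pixel coordinates."""
    cls, cx, cy, w, h = line.split()
    cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
    x0 = (cx - w / 2) * image_w
    y0 = (cy - h / 2) * image_h
    x1 = (cx + w / 2) * image_w
    y1 = (cy + h / 2) * image_h
    return int(cls), x0, y0, x1, y1
```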

Citation

@inproceedings{3dcopy2023,
  title={3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection},
  author={[Authors]},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2023}
}

@misc{mmdet3d2020,
    title={{MMDetection3D: OpenMMLab} next-generation platform for general {3D} object detection},
    author={MMDetection3D Contributors},
    howpublished = {\url{https://github.com/open-mmlab/mmdetection3d}},
    year={2020}
}

@misc{lzqsd,
   title={{Inverse Rendering of Indoor Scene: RIPS 2025 Analog Devices Project}},
   author={zhengqinli},
   howpublished = {\url{https://github.com/lzqsd/InverseRenderingOfIndoorScene}},
   year={2025}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
