Ctrl-Room: Controllable Text-to-3D Room Meshes Generation with Layout Constraints
**TL;DR**: Ctrl-Room is a generative framework that synthesizes 3D indoor scenes from text prompts. It has two stages: the first creates a 3D scene layout from a simple textual description of the room, and the second generates an RGB panorama that is well aligned with the 3D scene layout.
Feel free to contact me (cfangac@connect.ust.hk) or open an issue if you have any questions or suggestions.
- 2025-09-23: Inference instructions are provided.
- 2025-09-18: The source code and pretrained models are released.
- Release source code and pretrained models.
- Release the dataset we use - Structured3D with accurate bounding box annotation.
- Provide detailed inference instructions for panorama generation.
- Provide detailed inference instructions for panorama-reconstruction.
- Provide training instructions.
Tested with the following environment:
- Python 3.10
- PyTorch 2.3.1
- CUDA Version 11.8
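As a quick sanity check against the tested versions listed above, a minimal sketch like the following compares the local interpreter and PyTorch build with them. The helper names `detect_environment` and `report_mismatches` are hypothetical and not part of this repository; other versions may well work, so a mismatch is only a hint.

```python
import importlib
import sys

# Versions listed in this README as the tested environment.
TESTED = {"python": "3.10", "torch": "2.3.1", "cuda": "11.8"}


def detect_environment():
    """Return the locally detected versions; entries are None if unavailable."""
    found = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch": None,
        "cuda": None,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return found  # PyTorch is not installed in this interpreter
    found["torch"] = torch.__version__.split("+")[0]  # drop tags like "+cu118"
    found["cuda"] = torch.version.cuda  # None for CPU-only builds
    return found


def report_mismatches(found, tested=TESTED):
    """List human-readable differences between detected and tested versions."""
    return [
        f"{name}: found {found.get(name)}, tested with {want}"
        for name, want in tested.items()
        if found.get(name) != want
    ]


if __name__ == "__main__":
    for line in report_mismatches(detect_environment()):
        print("warning:", line)
```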
git clone https://github.com/fangchuan/Ctrl-Room.git
cd Ctrl-Room
To avoid the complex C++ library dependencies among the panorama_reconstruction modules, we strongly recommend using settings/Dockerfile to set up the environment. You can build the Docker image with the following command:
docker build -t ctrlroom:latest -f settings/Dockerfile .
# run the docker image
docker run -it --gpus all --name ctrlroom-test -v /path/to/your/data_and_code:/path/to/your/data_and_code ctrlroom:latest /bin/bash
# [deprecated] conda env setup on your local machine
conda create -n ctrlroom python=3.10 -y
conda activate ctrlroom
pip install -r settings/requirements.txt
- This project proposes the CtrlRoom Dataset, which provides accurate 3D layout annotations for 12,615 rooms. The dataset includes 5,064 bedrooms, 3,064 living rooms, 2,289 kitchens, 698 studies, and 1,500 bathrooms. In total, it contains nearly 150,000 accurately oriented 3D bounding boxes across 25 object categories, with annotations completed by a team of three annotators over 1,200 hours.
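The figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this README; since the box count is given as "nearly 150,000", the per-room average is an approximation (just under 12 boxes per room).

```python
# Per-room-type counts quoted in the dataset description.
ROOM_COUNTS = {
    "bedroom": 5064,
    "living room": 3064,
    "kitchen": 2289,
    "study": 698,
    "bathroom": 1500,
}
TOTAL_ROOMS = 12615
APPROX_BOXES = 150_000  # "nearly 150,000", so treat this as an upper estimate

total = sum(ROOM_COUNTS.values())
shares = {room: count / total for room, count in ROOM_COUNTS.items()}
boxes_per_room = APPROX_BOXES / total

print("counts add up to the stated total:", total == TOTAL_ROOMS)
for room, share in shares.items():
    print(f"{room:12s} {share:6.1%}")
print(f"about {boxes_per_room:.1f} boxes per room")
```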
All pretrained models are available at HuggingFace🤗.
| Model Name | Fine-tuned From | #Param. | Link | Note |
|---|---|---|---|---|
| bedroom_layout_gen | From scratch | 63M | st3d_layout_bedroom | Text-to-Bedroom-Layout |
| study_layout_gen | From scratch | 63M | st3d_layout_study | Text-to-Study-Layout |
| livingroom_layout_gen | From scratch | 63M | st3d_layout_livingroom | Text-to-Livingroom-Layout |
| kitchen_layout_gen | From scratch | 63M | st3d_layout_kitchen | Text-to-Kitchen-Layout |
| bedroom_panorama_gen | From ControlNet-SD1.5 | 1220M | st3d_panorama_bedroom | 3D Layout-to-Panorama |
| study_panorama_gen | From ControlNet-SD1.5 | 1220M | st3d_panorama_study | 3D Layout-to-Panorama |
| livingroom_panorama_gen | From ControlNet-SD1.5 | 1220M | st3d_panorama_livingroom | 3D Layout-to-Panorama |
| kitchen_panorama_gen | From ControlNet-SD1.5 | 1220M | st3d_panorama_kitchen | 3D Layout-to-Panorama |
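The table lists eight checkpoints, one layout model and one panorama model per room type. A minimal sketch for checking that they are present locally before running inference is shown below. It assumes each checkpoint sits directly under a `ckpts/` folder as a file or folder whose name starts with the name in the Link column; that layout and the helper names are assumptions, not something the repository guarantees.

```python
from pathlib import Path

ROOM_TYPES = ("bedroom", "study", "livingroom", "kitchen")
STAGES = ("layout", "panorama")


def expected_checkpoints(room_types=ROOM_TYPES):
    """Names from the Link column of the table, e.g. 'st3d_layout_bedroom'."""
    return [f"st3d_{stage}_{room}" for stage in STAGES for room in room_types]


def missing_checkpoints(ckpt_dir="ckpts", room_types=ROOM_TYPES):
    """Return expected names with no matching entry directly under ckpt_dir.

    Assumption: each checkpoint is a file or folder whose name starts with the
    table name; adjust the matching rule to the actual on-disk layout.
    """
    root = Path(ckpt_dir)
    present = [p.name for p in root.iterdir()] if root.is_dir() else []
    return [
        name
        for name in expected_checkpoints(room_types)
        if not any(entry.startswith(name) for entry in present)
    ]


if __name__ == "__main__":
    for name in missing_checkpoints():
        print("missing checkpoint:", name)
```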
# download above pretrained weights into the ckpts/ folder
# layout sampling of bedroom, living room, study ...
bash scripts/run_st3d_room_layout_sample.sh /path/to/your/ctrlroom_dataset /output_layout_samples
# Text-to-Layout-to-3D room meshes generation
bash scripts/run_text2bedroom_pipeline.sh /path/to/your/ctrlroom_dataset /output_samples
bash scripts/run_text2livingroom_pipeline.sh /path/to/your/ctrlroom_dataset /output_samples
bash scripts/run_text2study_pipeline.sh /path/to/your/ctrlroom_dataset /output_samples
bash scripts/run_text2kitchen_pipeline.sh /path/to/your/ctrlroom_dataset /output_samples
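The four pipeline scripts above share the same argument pattern (dataset path, then output path), so they can be driven from one small wrapper. The sketch below only assembles and optionally runs those same commands; the function names and the `dry_run` flag are hypothetical, and it assumes the working directory is the repository root.

```python
import subprocess

ROOM_TYPES = ("bedroom", "livingroom", "study", "kitchen")


def pipeline_command(room_type, dataset_dir, output_dir):
    """Build the shell command for one room type, mirroring the README."""
    if room_type not in ROOM_TYPES:
        raise ValueError(f"unsupported room type: {room_type!r}")
    script = f"scripts/run_text2{room_type}_pipeline.sh"
    return ["bash", script, dataset_dir, output_dir]


def run_all(dataset_dir, output_dir, room_types=ROOM_TYPES, dry_run=True):
    """Print (dry run) or execute the pipeline for each requested room type."""
    commands = [pipeline_command(r, dataset_dir, output_dir) for r in room_types]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing script
    return commands


if __name__ == "__main__":
    run_all("/path/to/your/ctrlroom_dataset", "/output_samples")
```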
Each pipeline script writes its 3D room generations in the following directory structure:
bedroom:
- text_prompt.txt: the text prompt used for layout generation
- samples_1x23x31.npz: sampled Scene Code (3D layout) in numpy format
- 0/: the first bedroom sample
  - scene_xxxx_xxxx_sem.png: the layout_semantic panorama image
  - scene_xxxx_xxxx_pano.png: the generated RGB panorama image aligned to the layout
  - scene_xxxx_xxxx.ply: the reconstructed 3D layout mesh in ply format
  - model.obj: the textured 3D room mesh in obj format
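To inspect the sampled Scene Code, the `.npz` file can be opened with NumPy. The sketch below lists every array stored in the archive with its shape and dtype. The file name suggests a 1 x 23 x 31 array, but that reading and the array names inside the archive are assumptions, so the helper (a hypothetical name) simply reports whatever is there.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array in the archive."""
    summary = {}
    # np.load returns a lazy NpzFile for .npz archives; the context manager
    # closes the underlying file once all arrays have been read.
    with np.load(path) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        for name, (shape, dtype) in summarize_npz(sys.argv[1]).items():
            print(f"{name}: shape={shape}, dtype={dtype}")
```

Run it with the path to the archive as the only argument, for example the `samples_1x23x31.npz` file inside a generated `bedroom` folder.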
# Optional: if the training script hangs for a very long time after launch, you may need to uninstall mpi4py and install libopenmpi-dev
pip uninstall mpi4py
sudo apt install libopenmpi-dev -y
bash scripts/run_st3d_room_layout_train.sh /path/to/your/ctrlroom_dataset /log_layout_training
We would like to thank the authors of DiffuScene, ATISS, and Sceneformer for their great work and for generously providing source code, which inspired our work and helped us a lot in the implementation.
If you find our work helpful, please consider citing:
@inproceedings{Ctrl-Room,
title = {Ctrl-room: Controllable text-to-3d room meshes generation with layout constraints},
author = {Chuan Fang and Yuan Dong and Kunming Luo and Xiaotao Hu and Rakesh Shrestha and Ping Tan},
booktitle = {2025 International Conference on 3D Vision (3DV)},
pages = {692--701},
year = {2025},
organization = {IEEE}
}