panowan.mp4
Official repository for "PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms"
We use uv to manage the Python environment.
# First, install uv
./scripts/install-uv.sh
# Then, create virtual environment for PanoWan
uv sync
Note that you may need to change the wheel URL for flash-attn to match your platform.
The LoRA checkpoint is released on Hugging Face. Use the following commands to download the Wan2.1 and PanoWan models.
# Download Wan2.1-T2V-1.3B
./scripts/download-wan.sh ./models/Wan-AI/Wan2.1-T2V-1.3B
# Download PanoWan
./scripts/download-panowan.sh ./models/PanoWan
Use the following command for inference:
uv run panowan-test \
  --wan-model-path ./models/Wan-AI/Wan2.1-T2V-1.3B \
  --lora-checkpoint-path ./models/PanoWan/latest-lora.ckpt \
  --output-path ./outputs/video.mp4
Detailed usage can be found via:
uv run panowan-test --help
We make our PanoVid dataset publicly available on Hugging Face, providing comprehensive metadata and captions to facilitate future research.
The first rows of metadata-train-val.csv correspond to the YouTube subset, whose video files can be downloaded directly from YouTube.
The file names contain the YouTube video ID and the start/end timestamps used for video clipping.
For the remaining rows, please download the video files from 3601M, 360+x, Imagine360, WEB360, Panonut360, Miraikan 360-degree Video Dataset, etc.
We will release more detailed instructions for these subsets later.
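As a rough illustration of how the YouTube-subset file names could be turned back into a video ID and clip range, here is a minimal Python sketch. The naming pattern `<video_id>_<start>_<end>.<ext>`, the unit of the timestamps (seconds), and the helper names `parse_clip_name` and `youtube_url` are all assumptions made for this example; check the actual entries in metadata-train-val.csv and adjust the parsing accordingly.

```python
from pathlib import PurePosixPath


def parse_clip_name(file_name: str) -> tuple[str, float, float]:
    """Split a clip file name into (video_id, start, end).

    ASSUMPTION: names look like `<video_id>_<start>_<end>.<ext>`; this
    pattern is hypothetical and may differ from the real metadata.
    """
    stem = PurePosixPath(file_name).stem
    # Split from the right, because YouTube IDs may contain "_" or "-".
    video_id, start, end = stem.rsplit("_", 2)
    start_s, end_s = float(start), float(end)
    if end_s <= start_s:
        raise ValueError(f"end must be after start in {file_name!r}")
    return video_id, start_s, end_s


def youtube_url(video_id: str) -> str:
    """Build the standard watch URL for a YouTube video ID."""
    return f"https://www.youtube.com/watch?v={video_id}"


if __name__ == "__main__":
    vid, start, end = parse_clip_name("dQw4w9WgXcQ_12_34.mp4")
    print(youtube_url(vid), start, end)
```

Splitting from the right keeps IDs that contain underscores intact; a name with fewer than three underscore-separated parts raises `ValueError`, which makes mismatches with the assumed pattern easy to spot.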
Generate panoramic videos from text prompts:
Canyon.mp4
concert.festival.mp4
cyberpunk.mp4
desert.mp4
hot.pot.restaurant.mp4
lake.mp4
ski.resort.mp4
volcano.mp4
Generate extended panoramic videos using temporal windowing and seamless blending:
beach.mp4
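This README does not spell out how the temporal windowing and blending are performed. Purely to illustrate the general idea, the sketch below stitches overlapping windows with a linear crossfade. It is a generic technique and not necessarily PanoWan's procedure; the function name `blend_windows` is hypothetical, and scalar values stand in for frames (or latents) to keep the example dependency-free.

```python
def blend_windows(windows: list[list[float]], overlap: int) -> list[float]:
    """Concatenate windows, crossfading the `overlap` frames shared by neighbors.

    ASSUMPTION: each window's first `overlap` frames cover the same time
    span as the previous window's last `overlap` frames.
    """
    if not windows:
        return []
    if overlap < 0 or any(len(w) < overlap for w in windows):
        raise ValueError("overlap must be >= 0 and no longer than any window")

    out = list(windows[0])
    for win in windows[1:]:
        for i in range(overlap):
            # Weight of the new window ramps up linearly across the overlap.
            w = (i + 1) / (overlap + 1)
            j = len(out) - overlap + i
            out[j] = (1.0 - w) * out[j] + w * win[i]
        # Frames after the overlap come from the new window only.
        out.extend(win[overlap:])
    return out


if __name__ == "__main__":
    # Two 4-frame windows sharing 2 frames yield 6 blended frames.
    print(blend_windows([[0.0] * 4, [1.0] * 4], overlap=2))
```

With real video, each scalar would be replaced by a frame or latent tensor and the same weights applied element-wise.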
Enhance low-resolution panoramic videos to 2x resolution:
artisan.bakery.mp4 (Low Resolution)
artisan.bakery.2x.pano.mp4 (High Resolution)
Edit panoramic videos with text-guided modifications:
inpainting_original.mp4 (Original)
inpainted.mp4 (Edited)
Transform conventional videos to panoramic format:
outpainting.mov
- Support training.
- Support inference.
- Release pretrained model.
- Release dataset.
@inproceedings{xia2025panowan,
title = {PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms},
author = {Xia, Yifei and Weng, Shuchen and Yang, Siqi and Liu, Jingqi and Zhu, Chengxuan and Teng, Minggui and Jia, Zijian and Jiang, Han and Shi, Boxin},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025}
}