Light Forcing

Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention

Chengtao Lv, Yumeng Shi, Yushi Huang, Ruihao Gong, Shen Ren, Wenya Wang

NTU, HKUST, SenseTime (LightX2V Group)

intro.mp4
(Results on Self Forcing 1.3B. Left: Dense Attention. Right: 1.3× acceleration using Light Forcing.)

💡 Why Light Forcing

  • 🥇 Pioneering work: The first to explore sparse attention acceleration for autoregressive video generation.
  • 🏆 Superior performance: Achieves a VBench total score of 84.5, delivering high-quality results.
  • 🚀 Significant acceleration: Provides over 3× attention speedup and a 1.2–1.3× end-to-end speedup, rising to 2.3× end-to-end acceleration when combined with FP8 quantization and LightVAE (19.7 FPS on a single RTX 5090 GPU).

🧾 Introduction

Advanced autoregressive (AR) video generation models have improved visual fidelity and interactivity, but the quadratic complexity of attention remains a primary bottleneck for efficient deployment. While existing sparse attention solutions have shown promise on bidirectional models, we identify that applying them to AR models leads to considerable performance degradation for two reasons: isolated consideration of chunk generation and insufficient utilization of past informative context. Motivated by these observations, we propose Light Forcing, the first sparse attention solution tailored for AR video generation models. It incorporates a Chunk-Aware Growth mechanism that quantitatively estimates the contribution of each chunk to determine its sparsity allocation; this progressive sparsity increase enables the current chunk to inherit prior knowledge from earlier chunks during generation. Additionally, we introduce Hierarchical Sparse Attention to capture informative historical and local context in a coarse-to-fine manner. This two-level mask selection strategy (i.e., frame and block level) adaptively handles diverse attention patterns. Extensive experiments demonstrate that our method outperforms existing sparse attention methods in both quality (e.g., 84.5 on VBench) and efficiency (e.g., $1.2{\sim}1.3\times$ end-to-end speedup). Combined with FP8 quantization and LightVAE, Light Forcing further achieves a $2.3\times$ speedup and 19.7 FPS on an RTX 5090 GPU.
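
Since the code has not been released yet, below is a minimal, illustrative PyTorch sketch of how the two ideas could fit together. Everything here is an assumption for exposition: the function names, the mean-pooling used for coarse scoring, and the budget parameters (`frame_keep`, `block_keep`, `s_min`, `s_max`) are hypothetical stand-ins, not the paper's actual implementation.

```python
import torch

def chunk_sparsity_schedule(chunk_scores, s_min=0.3, s_max=0.9):
    """Toy progressive sparsity allocation (hypothetical, not the paper's rule):
    chunks estimated to contribute more keep more keys (lower sparsity)."""
    w = torch.tensor(chunk_scores, dtype=torch.float32)
    w = (w - w.min()) / (w.max() - w.min() + 1e-8)  # normalize contributions to [0, 1]
    return (s_max - w * (s_max - s_min)).tolist()   # high contribution -> low sparsity

def hierarchical_sparse_mask(q, k, frame_len=16, block_len=4,
                             frame_keep=4, block_keep=2):
    """Toy two-level (frame -> block) mask selection for one attention head.

    q: [q_len, dim] queries of the current chunk; k: [kv_len, dim] cached keys.
    Assumes kv_len is a multiple of frame_len, and frame_len of block_len.
    """
    Lq, D = q.shape
    Lk = k.shape[0]
    q_pool = q.mean(dim=0)                                     # [D] pooled query

    # Coarse level: score whole history frames against the pooled query.
    k_frames = k.view(Lk // frame_len, frame_len, D).mean(1)   # [n_frames, D]
    frame_scores = k_frames @ q_pool                           # [n_frames]
    top_frames = frame_scores.topk(min(frame_keep, k_frames.shape[0])).indices

    # Fine level: inside each kept frame, keep only the top-scoring key blocks.
    mask = torch.zeros(Lq, Lk, dtype=torch.bool)
    for f in top_frames.tolist():
        start = f * frame_len
        blocks = k[start:start + frame_len].view(-1, block_len, D).mean(1)
        top_blocks = (blocks @ q_pool).topk(min(block_keep, blocks.shape[0])).indices
        for b in top_blocks.tolist():
            s = start + b * block_len
            mask[:, s:s + block_len] = True
    return mask  # True where attention is computed, False where it is skipped
```

In practice, such a boolean mask would not be materialized densely: to realize the reported attention speedup, a real implementation would pass the selected frame/block indices directly into a block-sparse attention kernel.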

🧾 Results

🤝 Acknowledgments

We developed our code with reference to the following repositories:

✏️ Citation

If you find our toolkit or research paper useful or relevant to your research, please kindly cite our work. We are currently organizing the code, and it will be open-sourced once the paper is accepted.
