Chengtao Lv, Yumeng Shi, Yushi Huang, Ruihao Gong, Shen Ren, Wenya Wang
NTU, HKUST, Sensetime (LightX2V Group)
intro.mp4
- 🥇 Pioneering work: The first to explore sparse attention acceleration for autoregressive video generation.
- 🏆 Superior performance: Achieves a VBench total score of 84.5, delivering high-quality generation results.
- 🚀 Significant acceleration: Delivers over 3× attention speedup and 1.2–1.3× end-to-end speedup, rising to 2.3× end-to-end acceleration when combined with FP8 and LightVAE (19.7 FPS on a single RTX 5090 GPU).
Advanced autoregressive (AR) video generation models have improved visual fidelity and interactivity, but the quadratic complexity of attention remains a primary bottleneck for efficient deployment. While existing sparse attention solutions have shown promise on bidirectional models, we find that applying them directly to AR models leads to considerable performance degradation, for two reasons: chunk generation is treated in isolation, and informative past context is underutilized. Motivated by these observations, we propose Light Forcing, the first sparse attention solution tailored for AR video generation models. It incorporates a Chunk-Aware Growth mechanism that quantitatively estimates the contribution of each chunk to determine its sparsity allocation; this progressive sparsity-increase strategy enables the current chunk to inherit knowledge from earlier chunks during generation. In addition, we introduce a Hierarchical Sparse Attention that captures informative historical and local context in a coarse-to-fine manner. This two-level mask selection strategy (i.e., at the frame and block levels) adaptively handles diverse attention patterns. Extensive experiments demonstrate that our method outperforms existing sparse attention methods in both quality (e.g., 84.5 on VBench) and efficiency (e.g., over 3× attention speedup and up to 2.3× end-to-end acceleration).
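Since the code is still being organized for release, the snippet below is only a minimal conceptual sketch of these two components under simplified assumptions. The function names (`chunk_sparsity_allocation`, `hierarchical_mask`), the mean-pooled scoring heuristic, and the tensor shapes are illustrative and may differ from the released implementation.

```python
import torch


def chunk_sparsity_allocation(chunk_scores: torch.Tensor, budget: float) -> torch.Tensor:
    """Allocate a per-chunk keep ratio in proportion to each past chunk's estimated
    contribution (placeholder scores here), so that the average keep ratio matches
    the overall attention budget."""
    weights = chunk_scores / chunk_scores.sum()
    keep_ratio = (weights * budget * chunk_scores.numel()).clamp(max=1.0)
    return keep_ratio


def hierarchical_mask(q: torch.Tensor, k: torch.Tensor,
                      frame_len: int, block_len: int,
                      frame_keep: int, block_keep: int) -> torch.Tensor:
    """Coarse-to-fine mask selection: first pick the most relevant past frames,
    then keep only the top-scoring blocks inside those frames.
    Assumes frame_len is divisible by block_len. q: [Lq, d], k: [Lk, d]."""
    Lq, d = q.shape
    Lk = k.shape[0]
    q_pool = q.mean(0, keepdim=True)                                   # [1, d]

    # --- frame level (coarse): score each past frame with mean-pooled keys ---
    num_frames = Lk // frame_len
    k_frames = k[: num_frames * frame_len].reshape(num_frames, frame_len, d).mean(1)
    frame_scores = (q_pool @ k_frames.T).squeeze(0)                    # [num_frames]
    top_frames = frame_scores.topk(min(frame_keep, num_frames)).indices

    # --- block level (fine): within selected frames, keep the top blocks ---
    mask = torch.zeros(Lq, Lk, dtype=torch.bool)
    for f in top_frames.tolist():
        start, end = f * frame_len, (f + 1) * frame_len
        k_blocks = k[start:end].reshape(-1, block_len, d).mean(1)      # [nb, d]
        block_scores = (q_pool @ k_blocks.T).squeeze(0)
        top_blocks = block_scores.topk(min(block_keep, k_blocks.shape[0])).indices
        for b in top_blocks.tolist():
            mask[:, start + b * block_len: start + (b + 1) * block_len] = True
    return mask


if __name__ == "__main__":
    q = torch.randn(256, 64)        # queries of the current chunk
    k = torch.randn(4 * 256, 64)    # keys of four past frames (256 tokens each)
    mask = hierarchical_mask(q, k, frame_len=256, block_len=64,
                             frame_keep=2, block_keep=2)
    print("kept fraction:", mask.float().mean().item())
```

In practice the frame- and block-level scores would be derived from richer statistics than simple mean pooling; the sketch only conveys the coarse-to-fine, two-level top-k selection structure described above.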
Our code is developed with reference to the following repositories:
If you find our toolkit or research paper useful or relevant to your research, please kindly cite our work. We are currently organizing the code, and it will be open-sourced once the paper is accepted.

