Chengtao Lv, Yumeng Shi, Yushi Huang, Ruihao Gong, Shen Ren, Wenya Wang
NTU, HKUST, Sensetime (LightX2V Group)
intro.mp4
- 🥇 Pioneering work: The first to explore sparse attention acceleration for autoregressive video generation.
- 🏆 Superior performance: Achieves a VBench total score of 84.5, delivering high-quality generation results.
- 🚀 Significant acceleration: Delivers over 3× attention speedup and 1.2–1.3× end-to-end speedup, rising to 2.3× end-to-end acceleration when combined with FP8 and LightVAE (19.7 FPS on a single RTX 5090 GPU).
Advanced autoregressive (AR) video generation models have improved visual fidelity and interactivity, but the quadratic complexity of attention remains a primary bottleneck for efficient deployment. While existing sparse attention solutions have shown promise on bidirectional models, we find that applying them directly to AR models leads to considerable performance degradation, for two reasons: chunk generation is treated in isolation, and informative past context is underutilized. Motivated by these observations, we propose Light Forcing, the first sparse attention solution tailored for AR video generation models. It incorporates a Chunk-Aware Growth mechanism that quantitatively estimates the contribution of each chunk to determine its sparsity allocation; this progressive sparsity-increase strategy enables the current chunk to inherit knowledge from earlier chunks during generation. In addition, we introduce a Hierarchical Sparse Attention that captures informative historical and local context in a coarse-to-fine manner. This two-level mask selection strategy (i.e., at the frame and block levels) adaptively handles diverse attention patterns. Extensive experiments demonstrate that our method outperforms existing sparse attention methods in both quality (e.g., 84.5 on VBench) and efficiency (e.g., over 3× attention speedup and up to 2.3× end-to-end acceleration).
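Since the code is still being organized for release, the snippet below is only a minimal conceptual sketch of these two components under simplified assumptions. The function names (`chunk_sparsity_allocation`, `hierarchical_mask`), the mean-pooled scoring heuristic, and the tensor shapes are illustrative and may differ from the released implementation.

```python
import torch


def chunk_sparsity_allocation(chunk_scores: torch.Tensor, budget: float) -> torch.Tensor:
    """Allocate a per-chunk keep ratio in proportion to each past chunk's estimated
    contribution (placeholder scores here), so that the average keep ratio matches
    the overall attention budget."""
    weights = chunk_scores / chunk_scores.sum()
    keep_ratio = (weights * budget * chunk_scores.numel()).clamp(max=1.0)
    return keep_ratio


def hierarchical_mask(q: torch.Tensor, k: torch.Tensor,
                      frame_len: int, block_len: int,
                      frame_keep: int, block_keep: int) -> torch.Tensor:
    """Coarse-to-fine mask selection: first pick the most relevant past frames,
    then keep only the top-scoring blocks inside those frames.
    Assumes frame_len is divisible by block_len. q: [Lq, d], k: [Lk, d]."""
    Lq, d = q.shape
    Lk = k.shape[0]
    q_pool = q.mean(0, keepdim=True)                                   # [1, d]

    # --- frame level (coarse): score each past frame with mean-pooled keys ---
    num_frames = Lk // frame_len
    k_frames = k[: num_frames * frame_len].reshape(num_frames, frame_len, d).mean(1)
    frame_scores = (q_pool @ k_frames.T).squeeze(0)                    # [num_frames]
    top_frames = frame_scores.topk(min(frame_keep, num_frames)).indices

    # --- block level (fine): within selected frames, keep the top blocks ---
    mask = torch.zeros(Lq, Lk, dtype=torch.bool)
    for f in top_frames.tolist():
        start, end = f * frame_len, (f + 1) * frame_len
        k_blocks = k[start:end].reshape(-1, block_len, d).mean(1)      # [nb, d]
        block_scores = (q_pool @ k_blocks.T).squeeze(0)
        top_blocks = block_scores.topk(min(block_keep, k_blocks.shape[0])).indices
        for b in top_blocks.tolist():
            mask[:, start + b * block_len: start + (b + 1) * block_len] = True
    return mask


if __name__ == "__main__":
    q = torch.randn(256, 64)        # queries of the current chunk
    k = torch.randn(4 * 256, 64)    # keys of four past frames (256 tokens each)
    mask = hierarchical_mask(q, k, frame_len=256, block_len=64,
                             frame_keep=2, block_keep=2)
    print("kept fraction:", mask.float().mean().item())
```

In practice the frame- and block-level scores would be derived from richer statistics than simple mean pooling; the sketch only conveys the coarse-to-fine, two-level top-k selection structure described above.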
Our code is developed with reference to the following repositories:
If you find our toolkit or research paper useful or relevant to your research, please kindly cite our work. We are currently organizing the code, and it will be open-sourced once the paper is accepted.

