Lihui Gu1*‡, Jingbin He2‡, Lianghao Su2, Kang He2, Wenxiao Wang1†, Yuliang Liu2†
(*Work done during an internship at Kling AI Infra, Kuaishou Technology. ‡Contributed equally to this work. †Corresponding author.)
1Zhejiang University, 2Kuaishou Technology
- 2024/09/30 🚀🚀 We release ScalingCache for Wan2.1, HunyuanVideo, and FLUX.
- 2024/09/20 🤗🤗 We release the ScalingCache project page.
| Methods | Speedup ↑ | PSNR ↑ | SSIM ↑ | LPIPS ↓ | VBench (%) ↑ |
|---|---|---|---|---|---|
| Wan2.1 1.3B (T = 50) | 1× | - | - | - | 83.31 |
| + 40% steps | 2.5× | 14.50 | 0.523 | 0.437 | 80.30 |
| + TeaCache₀.₀₈ | 2.0× | 22.57 | 0.806 | 0.128 | 81.04 |
| + TaylorSeer | 1.9× | 13.52 | 0.510 | 0.447 | 81.97 |
| + EasyCache | 2.5× | 25.24 | 0.834 | 0.095 | 82.48 |
| + Ours₁₀ (ours) | 2.5× | 26.61 | 0.890 | 0.071 | 82.92 |
| Methods | Speedup ↑ | PSNR ↑ | SSIM ↑ | LPIPS ↓ | VBench (%) ↑ |
|---|---|---|---|---|---|
| Wan2.1 14B (T = 50) | 1× | - | - | - | 84.05 |
| + 50% steps | 2.0× | 15.82 | 0.696 | 0.336 | 79.36 |
| + TeaCache₀.₁₄ | 1.5× | 18.60 | 0.688 | 0.244 | 83.95 |
| + MixCache | 1.8× | 23.45 | 0.814 | 0.124 | 83.97 |
| + Ours₁₀ (ours) | 2.5× | 25.63 | 0.861 | 0.083 | 83.87 |
| Methods | Speedup ↑ | PSNR ↑ | SSIM ↑ | LPIPS ↓ | VBench (%) ↑ |
|---|---|---|---|---|---|
| HunyuanVideo (T = 50) | 1× | - | - | - | 81.40 |
| + 50% steps | 2.0× | 17.57 | 0.734 | 0.247 | 78.78 |
| + TeaCache₀.₁ | 1.5× | 23.85 | 0.819 | 0.173 | 80.87 |
| + MixCache | 1.8× | 26.86 | 0.906 | 0.060 | 80.98 |
| + TaylorSeer | 2.8× | 26.57 | 0.860 | 0.135 | 80.74 |
| + EasyCache | 2.2× | 29.20 | 0.904 | 0.063 | 80.69 |
| + Ours₁₂ (ours) | 2.2× | 30.80 | 0.930 | 0.049 | 81.13 |
| Methods | Speedup ↑ | PSNR ↑ | SSIM ↑ | LPIPS ↓ | CLIP Score (%) ↑ |
|---|---|---|---|---|---|
| FLUX.1-dev (T = 50) | 1× | - | - | - | 80.17 |
| + 50% steps | 2.0× | 29.36 | 0.683 | 0.318 | 78.88 |
| + TeaCache₀.₆ | 2.0× | 28.08 | 0.400 | 0.690 | 81.79 |
| + TaylorSeer₃ | 2.8× | 30.76 | 0.780 | 0.230 | 80.17 |
| + Ours₁₀ (ours) | 3.0× | 32.28 | 0.819 | 0.131 | 80.25 |
ScalingCache operates in two main stages:
- offline computation of scaling coefficients
- online inference
We provide precomputed coefficient dictionaries under assets/alpha_dict/ for Wan2.1, HunyuanVideo, and FLUX, so you can skip stage 1 and proceed directly to model inference.
Detailed instructions for each supported model are provided in their respective directories.
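The two stages above can be sketched in miniature. This is a hypothetical illustration of the online-inference idea (reuse a cached model output, rescaled by a precomputed per-timestep coefficient, instead of running a full forward pass at every step); the names `alpha_dict`, `model_fn`, and the fixed `refresh_every` schedule are assumptions for the sketch, not the actual ScalingCache API.

```python
def cached_denoise(model_fn, x, timesteps, alpha_dict, refresh_every=5):
    """Toy denoising loop: recompute the model every `refresh_every`
    steps; in between, reuse the cached output scaled by alpha[t]."""
    cached = None
    outputs = []
    for i, t in enumerate(timesteps):
        if cached is None or i % refresh_every == 0:
            cached = model_fn(x, t)            # full forward pass
        out = alpha_dict.get(t, 1.0) * cached  # scale the cached output
        outputs.append(out)
        x = x - 0.1 * out                      # toy update step
    return outputs

# Toy usage with a linear stand-in "model" so the sketch runs end to end.
model_calls = []
def toy_model(x, t):
    model_calls.append(t)
    return x * 0.5

steps = list(range(10))
alphas = {t: 1.0 - 0.01 * t for t in steps}
outs = cached_denoise(toy_model, 1.0, steps, alphas, refresh_every=5)
print(len(model_calls))  # → 2 full forward passes instead of 10
```

With a fixed schedule, only 2 of the 10 steps pay for a forward pass; the real method chooses when to refresh and how to scale from the offline-fitted coefficients.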
- Thanks to TaylorSeer for proposing the use of Taylor expansion for feature prediction and caching.
- Thanks to EasyCache for inspiring our dynamic caching strategy.
- Thanks to DiT for their great work and codebase, upon which we build ScalingCache-DiT.
- Thanks to FLUX for their great work and codebase, upon which we build ScalingCache-FLUX.
- Thanks to HunyuanVideo for their great work and codebase, upon which we build ScalingCache-HunyuanVideo.
- Thanks to Wan2.1 for their great work and codebase, upon which we build ScalingCache-Wan2.1.
- Thanks to VBench for text-to-video quality evaluation.
- Thanks to DrawBench for providing the text-to-image prompt dataset.
If you have any questions, please email glh9803@outlook.com.