Lihui Gu1*‡, Jingbin He2‡, Lianghao Su2, Kang He2, Wenxiao Wang1†, Yuliang Liu2†
(*Work done during an internship at Kling AI Infra, Kuaishou Technology. ‡Contributed equally to this work. †Corresponding author.)
1Zhejiang University, 2Kuaishou Technology
- 2024/09/30 🚀🚀 We release ScalingCache for Wan2.1, HunyuanVideo, and FLUX.
- 2024/09/20 🤗🤗 We release the ScalingCache project page.
| Methods | Speedup ↑ | PSNR ↑ | SSIM ↑ | LPIPS ↓ | VBench (%) ↑ |
|---|---|---|---|---|---|
| Wan2.1 1.3B (T = 50) | 1× | - | - | - | 83.31 |
| + 40% steps | 2.5× | 14.50 | 0.523 | 0.437 | 80.30 |
| + TeaCache₀.₀₈ | 2.0× | 22.57 | 0.806 | 0.128 | 81.04 |
| + TaylorSeer | 1.9× | 13.52 | 0.510 | 0.447 | 81.97 |
| + EasyCache | 2.5× | 25.24 | 0.834 | 0.095 | 82.48 |
| + Ours₁₀ (ours) | 2.5× | 26.61 | 0.890 | 0.071 | 82.92 |
| Methods | Speedup ↑ | PSNR ↑ | SSIM ↑ | LPIPS ↓ | VBench (%) ↑ |
|---|---|---|---|---|---|
| Wan2.1 14B (T = 50) | 1× | - | - | - | 84.05 |
| + 50% steps | 2.0× | 15.82 | 0.696 | 0.336 | 79.36 |
| + TeaCache₀.₁₄ | 1.5× | 18.60 | 0.688 | 0.244 | 83.95 |
| + MixCache | 1.8× | 23.45 | 0.814 | 0.124 | 83.97 |
| + Ours₁₀ (ours) | 2.5× | 25.63 | 0.861 | 0.083 | 83.87 |
| Methods | Speedup ↑ | PSNR ↑ | SSIM ↑ | LPIPS ↓ | VBench (%) ↑ |
|---|---|---|---|---|---|
| HunyuanVideo (T = 50) | 1× | - | - | - | 81.40 |
| + 50% steps | 2.0× | 17.57 | 0.734 | 0.247 | 78.78 |
| + TeaCache₀.₁ | 1.5× | 23.85 | 0.819 | 0.173 | 80.87 |
| + MixCache | 1.8× | 26.86 | 0.906 | 0.060 | 80.98 |
| + TaylorSeer | 2.8× | 26.57 | 0.860 | 0.135 | 80.74 |
| + EasyCache | 2.2× | 29.20 | 0.904 | 0.063 | 80.69 |
| + Ours₁₂ (ours) | 2.2× | 30.80 | 0.930 | 0.049 | 81.13 |
| Methods | Speedup ↑ | PSNR ↑ | SSIM ↑ | LPIPS ↓ | CLIP Score (%) ↑ |
|---|---|---|---|---|---|
| FLUX.1-dev (T = 50) | 1× | - | - | - | 80.17 |
| + 50% steps | 2.0× | 29.36 | 0.683 | 0.318 | 78.88 |
| + TeaCache₀.₆ | 2.0× | 28.08 | 0.400 | 0.690 | 81.79 |
| + TaylorSeer₃ | 2.8× | 30.76 | 0.780 | 0.230 | 80.17 |
| + Ours₁₀ (ours) | 3.0× | 32.28 | 0.819 | 0.131 | 80.25 |
ScalingCache operates in two main stages:
- offline computation of scaling coefficients
- online inference
We provide precomputed coefficient dictionaries under assets/alpha_dict/ for Wan2.1, HunyuanVideo, and FLUX, so you can skip stage 1 and proceed directly to model inference.
Detailed instructions for each supported model are provided in their respective directories.
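The two stages above can be sketched in miniature. This is a hypothetical illustration of the online-inference idea (reuse a cached model output, rescaled by a precomputed per-timestep coefficient, instead of running a full forward pass at every step); the names `alpha_dict`, `model_fn`, and the fixed `refresh_every` schedule are assumptions for the sketch, not the actual ScalingCache API.

```python
def cached_denoise(model_fn, x, timesteps, alpha_dict, refresh_every=5):
    """Toy denoising loop: recompute the model every `refresh_every`
    steps; in between, reuse the cached output scaled by alpha[t]."""
    cached = None
    outputs = []
    for i, t in enumerate(timesteps):
        if cached is None or i % refresh_every == 0:
            cached = model_fn(x, t)            # full forward pass
        out = alpha_dict.get(t, 1.0) * cached  # scale the cached output
        outputs.append(out)
        x = x - 0.1 * out                      # toy update step
    return outputs

# Toy usage with a linear stand-in "model" so the sketch runs end to end.
model_calls = []
def toy_model(x, t):
    model_calls.append(t)
    return x * 0.5

steps = list(range(10))
alphas = {t: 1.0 - 0.01 * t for t in steps}
outs = cached_denoise(toy_model, 1.0, steps, alphas, refresh_every=5)
print(len(model_calls))  # → 2 full forward passes instead of 10
```

With a fixed schedule, only 2 of the 10 steps pay for a forward pass; the real method chooses when to refresh and how to scale from the offline-fitted coefficients.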
- Thanks to TaylorSeer for proposing the use of Taylor expansion for feature prediction and caching.
- Thanks to EasyCache for inspiring our dynamic caching strategy.
- Thanks to DiT for their great work and codebase, upon which we build ScalingCache-DiT.
- Thanks to FLUX for their great work and codebase, upon which we build ScalingCache-FLUX.
- Thanks to HunyuanVideo for their great work and codebase, upon which we build ScalingCache-HunyuanVideo.
- Thanks to Wan2.1 for their great work and codebase, upon which we build ScalingCache-Wan2.1.
- Thanks to VBench for text-to-video quality evaluation.
- Thanks to DrawBench for providing the text-to-image prompt dataset.
If you have any questions, please email glh9803@outlook.com.