YODA: Yet Another One-step Diffusion-based Video Compressor


πŸ“’ Introduction

YODA is a neural video codec designed to deliver high perceptual reconstruction quality at practical inference speed.

While one-step diffusion models have excelled in image compression, applying them to video remains a challenge. YODA overcomes the limitations of traditional methods and existing deep learning baselines by introducing a One-step Diffusion Transformer and a Temporal-Awareness mechanism.

Core Highlights:

  • Perceptual Quality: YODA consistently outperforms H.266/VVC and state-of-the-art neural codecs (such as DCVC-RT and PLVC) on perceptual metrics including LPIPS, DISTS, FID, and KID.
  • One-Step Denoising: Utilizing a lightweight linear DiT model, YODA performs denoising in a single step, significantly reducing the inference latency associated with diffusion models.
  • Temporal-Aware Design: Unlike prior efforts that rely on frozen 2D autoencoders, YODA employs a trainable Temporal-Aware AutoEncoder (TA-AE) to fully exploit inter-frame correlations.
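The one-step denoising idea in the highlights above can be illustrated with a small numerical sketch. This is a generic epsilon-prediction formulation, not YODA's actual implementation; the noise schedule values, latent shapes, and function names are all assumptions for illustration.

```python
import numpy as np

def one_step_denoise(z_t, eps_hat, alpha, sigma):
    """Recover a clean-latent estimate in a single step.

    Given a noisy latent z_t = alpha * z0 + sigma * eps and the
    network's noise prediction eps_hat, the clean latent is estimated
    directly -- no iterative sampling loop, unlike multi-step diffusion.
    """
    return (z_t - sigma * eps_hat) / alpha

# Toy demonstration with a "perfect" noise predictor.
rng = np.random.default_rng(0)
z0 = rng.standard_normal((4, 8, 8))   # clean latent (hypothetical shape)
eps = rng.standard_normal(z0.shape)   # Gaussian noise
alpha, sigma = 0.8, 0.6               # hypothetical schedule values
z_t = alpha * z0 + sigma * eps        # forward diffusion (one noise level)

z0_hat = one_step_denoise(z_t, eps, alpha, sigma)
assert np.allclose(z0_hat, z0)        # exact recovery with ideal eps_hat
```

In practice the predictor only approximates the noise, so the single step trades a small amount of fidelity for a large reduction in latency compared with running tens of sampling steps.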

πŸš€ Framework

Figure: Overview of the YODA framework.

YODA proposes an end-to-end unified design consisting of three key components:

  • Temporal-Aware AutoEncoder (TA-AE): Extracts multiscale features from reference frames to generate a compact latent representation.
  • Conditional Latent Coder (CLC): Implicitly models motion within the feature space to perform efficient entropy coding.
  • Linear DiT Model: Adopts a linear DiT for efficient one-step denoising.
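How the three components above fit together can be sketched with simple numeric stand-ins. Every function here is a placeholder for a neural module, and all names, shapes, and the quantization scheme are assumptions, not YODA's real interfaces.

```python
import numpy as np

def ta_ae_encode(frame, reference_feats):
    # Temporal-Aware AutoEncoder (encode): fuse the current frame with
    # reference-frame features into a compact latent (toy: residual).
    return frame - reference_feats

def clc_code(latent):
    # Conditional Latent Coder: quantize the latent for entropy coding
    # (a real codec would arithmetic-code these symbols).
    return np.round(latent)

def linear_dit_denoise(latent_hat):
    # Linear DiT: one-step denoising of the decoded latent
    # (identity placeholder here).
    return latent_hat

def ta_ae_decode(latent, reference_feats):
    # TA-AE (decode): reconstruct the frame from latent + references.
    return latent + reference_feats

rng = np.random.default_rng(1)
ref = rng.standard_normal((8, 8))            # reference features (toy)
frame = ref + 0.1 * rng.standard_normal((8, 8))

latent = ta_ae_encode(frame, ref)
symbols = clc_code(latent * 16) / 16         # step-1/16 quantization
recon = ta_ae_decode(linear_dit_denoise(symbols), ref)
assert np.max(np.abs(recon - frame)) <= 1 / 32  # within half a step
```

The point of the sketch is only the dataflow: encode against references, entropy-code the latent, denoise in one step, then decode back through the temporal-aware decoder.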

πŸ† Performance

YODA demonstrates superior performance across multiple datasets (UVG, HEVC-B, MCL-JCV), surpassing both traditional standards (VTM) and recent neural video codecs (DCVC-RT, DiffVC, PLVC).

Figure: Perceptual quality RD curves (LPIPS, DISTS, FID, KID) on the UVG, HEVC-B, and MCL-JCV datasets. Lower is better.



πŸ‘οΈ Visual Comparison

We provide an interactive video comparison (with sliding view) on our project page to demonstrate the visual reconstruction quality of YODA against the Ground Truth.


πŸ“‚ Data Preparation

We utilized the Vimeo-90K dataset for training and evaluated our model on the UVG, MCL-JCV, and HEVC Class B datasets.


🀝 Acknowledgment

We thank the authors of the following projects for their pioneering contributions and open-source efforts:

  • DCVC-RT: Towards Practical Real-time Neural Video Compression.
  • SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers.
  • DC-AE: Deep Compression Autoencoder.
