YODA is a novel neural video codec designed to achieve high perceptual quality at practical inference speed.
While one-step diffusion models have excelled at image compression, extending them to video remains challenging. YODA overcomes the limitations of traditional methods and existing deep learning baselines by introducing a One-step Diffusion Transformer and a Temporal-Awareness mechanism.
Core Highlights:
- Perceptual Quality: YODA consistently outperforms H.266/VVC and SOTA neural codecs (such as DCVC-RT and PLVC) on perceptual metrics including LPIPS, DISTS, FID, and KID.
- One-Step Denoising: Utilizing a lightweight linear DiT model, YODA performs denoising in a single step, significantly reducing the inference latency associated with diffusion models.
- Temporal-Aware Design: Unlike prior efforts that rely on frozen 2D autoencoders, YODA employs a trainable Temporal-Aware AutoEncoder (TA-AE) to fully exploit inter-frame correlations.
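The latency benefit of one-step denoising can be illustrated with a toy sketch. This is not the released implementation: `denoiser` is a numpy stand-in for the linear DiT, and the call counter simply shows that one-step decoding costs a single forward pass per frame where a conventional diffusion sampler costs many.

```python
import numpy as np

rng = np.random.default_rng(0)
calls = 0  # count denoiser forward passes to compare decoding cost

def denoiser(z_t, cond):
    # Stand-in for the network: any map from (noisy latent, conditioning)
    # to a clean-latent estimate. In YODA this role is played by a linear DiT.
    global calls
    calls += 1
    return 0.5 * (z_t + cond)

def multi_step_denoise(z_T, cond, steps=50):
    # Conventional diffusion decoding: `steps` sequential forward passes.
    z = z_T
    for _ in range(steps):
        z = denoiser(z, cond)
    return z

def one_step_denoise(z_T, cond):
    # One-step decoding: a single forward pass per frame.
    return denoiser(z_T, cond)

z_T = rng.standard_normal((4, 4))   # noisy latent sampled at decode time
cond = rng.standard_normal((4, 4))  # conditioning from decoded features

one_step_denoise(z_T, cond)
after_one = calls                   # forward passes for one-step decoding
multi_step_denoise(z_T, cond)
after_multi = calls - after_one     # forward passes for 50-step decoding
```

Since decoding latency scales with the number of network evaluations, collapsing the sampler to one step removes the dominant cost of diffusion-based reconstruction.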
YODA proposes an end-to-end unified design consisting of three key components:
- Temporal-Aware AutoEncoder (TA-AE): Extracts multiscale features from reference frames to generate a compact latent representation.
- Conditional Latent Coder (CLC): Implicitly models motion within the feature space to perform efficient entropy coding.
- Linear DiT Model: Adopts a linear DiT for efficient one-step denoising.
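The data flow through the three components above can be sketched as follows. All function names, shapes, and operations here are illustrative placeholders (not YODA's actual modules): the point is only the wiring of TA-AE encoding, conditional latent coding, one-step refinement, and TA-AE decoding.

```python
import numpy as np

rng = np.random.default_rng(0)

def ta_ae_encode(frame, ref_feats):
    # TA-AE (toy): fuse the current frame with reference-frame features
    # into a compact, spatially downsampled latent.
    return frame[::2, ::2] + 0.1 * ref_feats

def clc_code(latent, ref_feats):
    # CLC (toy): condition coding on reference features; quantizing the
    # residual mimics a lossy bitstream round-trip.
    return np.round(latent - ref_feats) + ref_feats

def linear_dit_denoise(z_hat):
    # Linear DiT (toy): single-step refinement of the decoded latent;
    # here an identity placeholder.
    return z_hat

def ta_ae_decode(latent):
    # TA-AE decoder (toy): upsample the latent back to frame resolution.
    return np.repeat(np.repeat(latent, 2, axis=0), 2, axis=1)

frame = rng.standard_normal((8, 8))     # current frame
ref_feats = rng.standard_normal((4, 4)) # features from reference frames

z = ta_ae_encode(frame, ref_feats)      # compact latent
z_hat = clc_code(z, ref_feats)          # entropy-coded round-trip
z_ref = linear_dit_denoise(z_hat)       # one-step refinement
recon = ta_ae_decode(z_ref)             # reconstructed frame
```

The end-to-end design means all three stages are trained jointly rather than around a frozen 2D autoencoder.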
YODA demonstrates superior performance across multiple datasets (UVG, HEVC-B, MCL-JCV), surpassing both traditional standards (VTM) and recent neural video codecs (DCVC-RT, DiffVC, PLVC).
Figure: Perceptual quality performance comparisons on UVG, HEVC-B, and MCL-JCV datasets. Lower is better.
We provide an interactive video comparison (with a slider view) on our project page to demonstrate the visual reconstruction quality of YODA against the ground truth.
We utilized the Vimeo-90K dataset for training and evaluated our model on the UVG, MCL-JCV, and HEVC Class B datasets.
We thank the authors of the following projects for their pioneering contributions and open-source efforts:
