LINA: Learning INterventions Adaptively for Physical Alignment and Generalization in Diffusion Models

We introduce LINA (Learning INterventions Adaptively), a novel framework that improves physical alignment and out-of-distribution (OOD) instruction following in image and video Diffusion Models (DMs).

Diffusion models have achieved remarkable success but still struggle with physical alignment (e.g., correct reflections, gravity) and OOD generalization. We argue that these issues stem from the models' failure to learn causal directions and to disentangle causal factors. LINA addresses this by learning to predict prompt-specific interventions without altering pre-trained weights.

Our project page is at https://opencausalab.github.io/LINA.

Overall Demo

Failures in DMs and LINA's improvement. (a) Baseline models often generate reflections extending beyond surfaces or produce texture errors. (b) Baseline models fail to capture precise spatial prepositions (e.g., "close to" vs "in"). By calibrating the sampling dynamics, LINA successfully aligns the generation with the intended causal graph while preserving original textures.

Key Contributions

  1. Causal Scene Graph (CSG): We introduce a representation that unifies causal dependencies and spatial layouts, providing a basis for diagnostic interventions (a rough data-structure sketch follows this list).
  2. Physical Alignment Probe (PAP): We construct a dataset consisting of structured prompts, SOTA-generated images, and fine-grained masks to quantify DMs' physical and OOD failures.
  3. Diagnostic Analysis: We perform CSG-guided masked inpainting, providing the first quantitative evaluation of DMs' multi-hop reasoning failures through bidirectional probing of edges in the CSG.
  4. LINA Framework: We propose a framework that learns to predict and apply prompt-specific guidance, achieving SOTA alignment on image and video DMs without MLLM inference or retraining.
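As a rough illustration of what a CSG might look like in code, the sketch below represents a scene as entities connected by causal and spatial edges. The class and field names are hypothetical and are not taken from the LINA codebase; the paper's formal definition is authoritative.

# Hypothetical Causal Scene Graph (CSG) sketch: nodes are scene entities,
# edges carry either a causal dependency (e.g., an object causing its
# reflection) or a spatial relation (e.g., "close to", "in").
from dataclasses import dataclass, field

@dataclass
class CSGEdge:
    source: str    # causing / reference entity
    target: str    # affected / placed entity
    kind: str      # "causal" or "spatial"
    relation: str  # e.g. "reflects", "close to", "in"

@dataclass
class CausalSceneGraph:
    nodes: set[str] = field(default_factory=set)
    edges: list[CSGEdge] = field(default_factory=list)

    def add_edge(self, source: str, target: str, kind: str, relation: str) -> None:
        self.nodes.update({source, target})
        self.edges.append(CSGEdge(source, target, kind, relation))

# Example: a cat standing close to a mirror, plus the reflection it causes.
csg = CausalSceneGraph()
csg.add_edge("cat", "mirror", kind="spatial", relation="close to")
csg.add_edge("cat", "cat_reflection", kind="causal", relation="reflects")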

Architecture

LINA operates in two phases to calibrate the mapping from prompt to image (a simplified sketch of the online phase follows the diagram):

  • Phase 1 (Offline): We train an Adaptive Intervention Module (AIM) using a dataset of "hard cases" where baseline models fail. An MLLM evaluator identifies optimal intervention strengths.
  • Phase 2 (Online): For new prompts, the pre-trained AIM predicts intervention parameters ($\gamma_1, \gamma_2$). LINA then applies token-level and latent-level interventions during a reallocated computation schedule to enforce causal structure.
LINA Architecture
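As a simplified, hypothetical sketch of the online phase: the module below maps a pooled prompt embedding to the two scalar strengths ($\gamma_1, \gamma_2$), which then scale a token-level and a latent-level correction during denoising. The network design, the correction terms, and all names are our assumptions for illustration, not the actual LINA implementation.

import torch
import torch.nn as nn

class AdaptiveInterventionModule(nn.Module):
    # Hypothetical AIM: maps a pooled prompt embedding to (gamma_1, gamma_2).
    def __init__(self, embed_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, 2),
        )

    def forward(self, prompt_emb: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # Softplus keeps the predicted intervention strengths non-negative.
        gamma_1, gamma_2 = nn.functional.softplus(self.net(prompt_emb)).unbind(-1)
        return gamma_1, gamma_2

def apply_interventions(token_emb, latent, token_corr, latent_corr, gamma_1, gamma_2):
    # Illustrative only: add scaled corrections at the token level (B, T, D)
    # and at the latent level (B, C, H, W) for one denoising step.
    token_emb = token_emb + gamma_1.view(-1, 1, 1) * token_corr
    latent = latent + gamma_2.view(-1, 1, 1, 1) * latent_corr
    return token_emb, latent

In the offline phase, such a module would be trained on the hard-case dataset with the MLLM evaluator's intervention strengths as supervision; online, its predictions would be applied only on the reallocated steps of the sampling schedule.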

Performance

Extensive experiments show that LINA achieves state-of-the-art performance on challenging causal generation tasks. It effectively repairs texture hallucinations and causal failures in both image models (SD-3.5-large, FLUX.1-Krea-dev) and video models (Wan2.2), significantly outperforming existing editing baselines and closed-source solutions.

Installation

1. Environment Setup

We recommend using a fresh Conda environment (Python 3.10) to avoid conflicts.

conda create -n lina python=3.10
conda activate lina

2. Install Dependencies

Install the required packages.

pip install -r requirements.txt

3. Download NLP Model (Critical Step) ⚠️

LINA relies on a lightweight Transformer-based SpaCy pipeline (en_core_web_trf) for robust relation extraction. This model is NOT included in the pip install and must be downloaded manually:

python -m spacy download en_core_web_trf

Note: If you encounter network timeouts, please check your proxy settings or download the wheel file manually from the spacy-models release page.
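To confirm the model is installed and to see the kind of dependency information the relation extraction builds on, a quick check like the following can be run (the example sentence and printout are ours, not LINA's actual extraction code):

import spacy

# Loading en_core_web_trf needs spacy-transformers; the spacy download
# step above should pull it in as a dependency of the model package.
nlp = spacy.load("en_core_web_trf")

doc = nlp("A cat is standing close to the mirror.")
for token in doc:
    # Print each token with its dependency label and syntactic head,
    # the raw material for extracting spatial relations such as "close to".
    print(f"{token.text:10s} {token.dep_:12s} -> {token.head.text}")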

Paper and Citation

If you find our work useful in your research, please cite:

@article{yu2025lina,
  title={LINA: Learning INterventions Adaptively for Physical Alignment and Generalization in Diffusion Models},
  author={Shu Yu and Chaochao Lu},
  year={2025},
  journal={arXiv preprint arXiv:2512.13290},
  url={https://arxiv.org/abs/2512.13290},
}
