LINA: Learning INterventions Adaptively for Physical Alignment and Generalization in Diffusion Models

We introduce LINA (Learning INterventions Adaptively), a novel framework that improves physical alignment and out-of-distribution (OOD) instruction following in image and video Diffusion Models (DMs).

Diffusion models have achieved remarkable success but still struggle with physical alignment (e.g., correct reflections, gravity) and OOD generalization. We argue that these issues stem from the models' failure to learn causal directions and to disentangle causal factors. LINA addresses this by learning to predict prompt-specific interventions without altering pre-trained weights.

Our project page is at https://opencausalab.github.io/LINA.

Overall Demo

Failures in DMs and LINA's improvement. (a) Baseline models often generate reflections extending beyond surfaces or produce texture errors. (b) Baseline models fail to capture precise spatial prepositions (e.g., "close to" vs "in"). By calibrating the sampling dynamics, LINA successfully aligns the generation with the intended causal graph while preserving original textures.

Key Contributions

  1. Causal Scene Graph (CSG): We introduce a representation that unifies causal dependencies and spatial layouts, providing a basis for diagnostic interventions (a rough data-structure sketch follows this list).
  2. Physical Alignment Probe (PAP): We construct a dataset consisting of structured prompts, SOTA-generated images, and fine-grained masks to quantify DMs' physical and OOD failures.
  3. Diagnostic Analysis: We perform CSG-guided masked inpainting, providing the first quantitative evaluation of DMs' multi-hop reasoning failures through bidirectional probing of edges in the CSG.
  4. LINA Framework: We propose a framework that learns to predict and apply prompt-specific guidance, achieving SOTA alignment on image and video DMs without MLLM inference or retraining.
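As a rough illustration of what a CSG might look like in code, the sketch below represents a scene as entities connected by causal and spatial edges. The class and field names are hypothetical and are not taken from the LINA codebase; the paper's formal definition is authoritative.

# Hypothetical Causal Scene Graph (CSG) sketch: nodes are scene entities,
# edges carry either a causal dependency (e.g., an object causing its
# reflection) or a spatial relation (e.g., "close to", "in").
from dataclasses import dataclass, field

@dataclass
class CSGEdge:
    source: str    # causing / reference entity
    target: str    # affected / placed entity
    kind: str      # "causal" or "spatial"
    relation: str  # e.g. "reflects", "close to", "in"

@dataclass
class CausalSceneGraph:
    nodes: set[str] = field(default_factory=set)
    edges: list[CSGEdge] = field(default_factory=list)

    def add_edge(self, source: str, target: str, kind: str, relation: str) -> None:
        self.nodes.update({source, target})
        self.edges.append(CSGEdge(source, target, kind, relation))

# Example: a cat standing close to a mirror, plus the reflection it causes.
csg = CausalSceneGraph()
csg.add_edge("cat", "mirror", kind="spatial", relation="close to")
csg.add_edge("cat", "cat_reflection", kind="causal", relation="reflects")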

Architecture

LINA operates in two phases to calibrate the mapping from prompt to image (a simplified sketch of the online phase follows the diagram):

  • Phase 1 (Offline): We train an Adaptive Intervention Module (AIM) using a dataset of "hard cases" where baseline models fail. An MLLM evaluator identifies optimal intervention strengths.
  • Phase 2 (Online): For new prompts, the pre-trained AIM predicts intervention parameters ($\gamma_1, \gamma_2$). LINA then applies token-level and latent-level interventions during a reallocated computation schedule to enforce causal structure.
LINA Architecture
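As a simplified, hypothetical sketch of the online phase: the module below maps a pooled prompt embedding to the two scalar strengths ($\gamma_1, \gamma_2$), which then scale a token-level and a latent-level correction during denoising. The network design, the correction terms, and all names are our assumptions for illustration, not the actual LINA implementation.

import torch
import torch.nn as nn

class AdaptiveInterventionModule(nn.Module):
    # Hypothetical AIM: maps a pooled prompt embedding to (gamma_1, gamma_2).
    def __init__(self, embed_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, 2),
        )

    def forward(self, prompt_emb: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # Softplus keeps the predicted intervention strengths non-negative.
        gamma_1, gamma_2 = nn.functional.softplus(self.net(prompt_emb)).unbind(-1)
        return gamma_1, gamma_2

def apply_interventions(token_emb, latent, token_corr, latent_corr, gamma_1, gamma_2):
    # Illustrative only: add scaled corrections at the token level (B, T, D)
    # and at the latent level (B, C, H, W) for one denoising step.
    token_emb = token_emb + gamma_1.view(-1, 1, 1) * token_corr
    latent = latent + gamma_2.view(-1, 1, 1, 1) * latent_corr
    return token_emb, latent

In the offline phase, such a module would be trained on the hard-case dataset with the MLLM evaluator's intervention strengths as supervision; online, its predictions would be applied only on the reallocated steps of the sampling schedule.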

Performance

Extensive experiments show that LINA achieves state-of-the-art performance on challenging causal generation tasks. It effectively repairs texture hallucinations and causal failures in both image models (SD-3.5-large, FLUX.1-Krea-dev) and video models (Wan2.2), significantly outperforming existing editing baselines and closed-source solutions.

Installation

1. Environment Setup

We recommend using a fresh Conda environment (Python 3.10) to avoid conflicts.

conda create -n lina python=3.10
conda activate lina

2. Install Dependencies

Install the required packages.

pip install -r requirements.txt

3. Download NLP Model (Critical Step) ⚠️

LINA relies on a lightweight Transformer-based SpaCy pipeline (en_core_web_trf) for robust relation extraction. This model is NOT included in the pip install and must be downloaded manually:

python -m spacy download en_core_web_trf

Note: If you encounter network timeouts, please check your proxy settings or download the wheel file manually from the spacy-models release page.
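To confirm the model is installed and to see the kind of dependency information the relation extraction builds on, a quick check like the following can be run (the example sentence and printout are ours, not LINA's actual extraction code):

import spacy

# Loading en_core_web_trf needs spacy-transformers; the spacy download
# step above should pull it in as a dependency of the model package.
nlp = spacy.load("en_core_web_trf")

doc = nlp("A cat is standing close to the mirror.")
for token in doc:
    # Print each token with its dependency label and syntactic head,
    # the raw material for extracting spatial relations such as "close to".
    print(f"{token.text:10s} {token.dep_:12s} -> {token.head.text}")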

Paper and Citation

If you find our work useful in your research, please cite:

@article{yu2025lina,
  title={LINA: Learning INterventions Adaptively for Physical Alignment and Generalization in Diffusion Models},
  author={Shu Yu and Chaochao Lu},
  year={2025},
  journal={arXiv preprint arXiv:2512.13290},
  url={https://arxiv.org/abs/2512.13290},
}
