
yining610/ctwa


Uncovering Cross-Objective Interference in Multi-Objective Alignment

TL;DR: We analyze cross-objective interference in multi-objective alignment of LLMs and introduce an approach to mitigate it.
Paper on arXiv

Folder Structure

scripts/     // callable scripts for data preprocessing
verl/        // source code of models, algorithms, data structures, metrics, etc. 
examples/    // bash scripts to run jobs
data/        // pre-processed data used in experiments

Environment

We provide a Dockerfile for building the environment. For further setup instructions, please refer to the verl environment setup guide.

We use Weights & Biases (wandb) to log experiments, so please log in before running them.
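A minimal setup fragment, assuming you already have a wandb account (replace the placeholder with your own API key from the wandb website):

```shell
# One-time authentication; training scripts then stream metrics automatically.
# The key can also be supplied non-interactively via the environment:
export WANDB_API_KEY=<your-api-key>   # placeholder: substitute your actual key
wandb login
```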

Experiment

We implement a total of 8 scalarization algorithms from multi-objective optimization (MOO) and multi-task learning (MTL), including our CTWA. They are:

  1. CTWA (ours): verl/trainer/ppo/ray_trainer_covariance.py
  2. Dynamic Reweighting: verl/trainer/ppo/ray_trainer_dynamic.py
  3. GradNorm: verl/trainer/ppo/ray_trainer_gradnorm.py
  4. Lagrangian Primal-Dual Method: verl/trainer/ppo/ray_trainer_lagrangian.py
  5. MGDA: verl/trainer/ppo/ray_trainer_mgda.py
  6. Linear: verl/trainer/ppo/ray_trainer_multiobjective.py
  7. PAMA: verl/trainer/ppo/ray_trainer_pama.py
  8. Tchebycheff Scalarization: verl/trainer/ppo/ray_trainer_tchebycheff.py
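These trainers differ in how per-objective rewards are collapsed into a single training signal. As a rough, self-contained sketch of two of the baselines above (not the repository's implementation; the function names and two-objective setup are illustrative), linear and Tchebycheff scalarization can be contrasted as follows:

```python
import numpy as np

def linear_scalarization(rewards, weights):
    """Weighted sum of per-objective rewards (the 'Linear' baseline)."""
    return float(np.dot(weights, rewards))

def tchebycheff_scalarization(rewards, weights, ideal):
    """Weighted Chebyshev gap to an ideal point: the score is driven by the
    worst weighted objective, which encourages balanced progress."""
    gaps = weights * (ideal - rewards)
    return float(np.max(gaps))

# Toy two-objective example (e.g. a math-correctness and a style reward).
rewards = np.array([0.8, 0.3])
weights = np.array([0.5, 0.5])
ideal = np.array([1.0, 1.0])

print(linear_scalarization(rewards, weights))              # 0.55
print(tchebycheff_scalarization(rewards, weights, ideal))  # 0.35
```

Under linear scalarization a strong objective can mask a weak one, whereas the Tchebycheff score here is dominated by the lagging second objective.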

We provide a bash script for each algorithm in the examples/ directory. For example, to train Qwen2.5-1.5B-Base with CTWA:

bash examples/ctwa_trainer/run_covariance_math.sh

Citation

If you use our code, please cite the following paper:

@misc{lu2026uncoveringcrossobjectiveinterferencemultiobjective,
      title={Uncovering Cross-Objective Interference in Multi-Objective Alignment}, 
      author={Yining Lu and Meng Jiang},
      year={2026},
      eprint={2602.06869},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.06869}, 
}

About

Official implementation of paper "Uncovering Cross-Objective Interference in Multi-Objective Alignment"
