The LION-series models are trained with an empirically optimized pipeline that consists of three stages: SFT, DPO, and online preference learning (online DPO). We find that simple techniques, such as sequence packing and loss masking in SFT, increasing the preference dataset size in DPO, and online DPO training, can significantly improve the performance of language models. Our best models (the LION-series) exceed the performance of the official instruct models tuned with closed-source data and algorithms.
We fine-tune the base Gemma-2b and LLaMA-3-8b models, and we release all models and intermediate checkpoints for reproducibility.
| Model | Method | Size | Arena-Hard | AlpacaEval-2 | MT-Bench | OpenLLM |
|---|---|---|---|---|---|---|
| Gemma-2b | - | 2B | - | - | - | 46.69 |
| Gemma-2b-it | SFT+RLHF | 2B | 3.4 | 5.44 | 5.63 | 42.75 |
| Gemma-2b-zephyr | SFT+DPO | 2B | 0.9 | 2.65 | 4.13 | 46.92 |
| LLaMA-2-7b-chat | SFT | 7B | 4.6 | 5.35 | 6.22 | 53.16 |
| Vicuna-7b-v1.5 | SFT | 7B | 2.5 | 7.62 | 6.57 | 52.06 |
| LION-Gemma-2b-sft-v1.0 (ours) | SFT | 2B | 2.4 | 7.79 | 6.37 | 54.78 |
| LION-Gemma-2b-dpo-v1.0 (ours) | SFT+DPO | 2B | 4.6 | 8.75 | 6.58 | 55.35 |
| LION-Gemma-2b-odpo-v1.0 (ours) | SFT+DPO+ODPO | 2B | 5.0 | 9.57 | 6.75 | 55.98 |
| Model | Method | Size | Arena-Hard | AlpacaEval-2 | MT-Bench | OpenLLM |
|---|---|---|---|---|---|---|
| LLaMA-3-8b | - | 8B | - | - | - | 63.05 |
| LLaMA-3-8b-it | SFT+RS+DPO+PPO | 8B | 20.6 | 22.9 | 8.00 | 68.28 |
| LION-LLaMA-3-8b-sft-v1.0 (ours) | SFT | 8B | 11.3 | 17.9 | 7.58 | 68.71 |
| LION-LLaMA-3-8b-dpo-v1.0 (ours) | SFT+DPO | 8B | 19.1 | 21.8 | 8.12 | 71.28 |
| LION-LLaMA-3-8b-odpo-v1.0 (ours) | SFT+DPO+ODPO | 8B | 22.0 | 26.8 | 8.19 | 71.41 |
We recommend using Python 3.10 as the environment. To install the required dependencies, run:
```bash
pip install -r requirements.txt
```

You will also need to install Flash Attention 2 by running:
```bash
python -m pip install flash-attn --no-build-isolation
```

Training requires (at most) 4x A100 80GB GPUs. Please adjust the batch size and gradient accumulation steps if you have a different system.
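For instance, the effective batch size is the per-device batch size times the gradient accumulation steps times the number of GPUs, so halving the GPU count can be offset by doubling the accumulation steps. A minimal sanity-check sketch (the specific numbers below are illustrative, not the values used in our configs):

```python
# Illustrative only: the per-device batch size and accumulation steps below are
# example numbers, not the values from this repo's training configs.
def effective_batch_size(per_device_batch_size: int, grad_accum_steps: int, num_gpus: int) -> int:
    return per_device_batch_size * grad_accum_steps * num_gpus

# Reference setup: 4x A100 80GB.
assert effective_batch_size(4, 8, 4) == 128
# On 2 GPUs, doubling gradient accumulation keeps the effective batch size unchanged.
assert effective_batch_size(4, 16, 2) == 128
```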
For SFT training, refer to the following script for an example of training LION-LLaMA-3-8b-sft-v1.0:

```bash
./examples/llama-3/sft.sh
```
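As a rough illustration of the loss masking mentioned above (a minimal sketch of the general technique, not this repo's preprocessing code; `build_sft_example` is a hypothetical helper): prompt tokens are labeled with `-100` so the cross-entropy loss is computed only on the response tokens.

```python
import torch

IGNORE_INDEX = -100  # labels with this value are ignored by PyTorch's cross-entropy loss

def build_sft_example(prompt_ids: list[int], response_ids: list[int]) -> dict:
    """Hypothetical helper: supervise only the response tokens by masking the prompt."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return {
        "input_ids": torch.tensor(input_ids),
        "labels": torch.tensor(labels),
    }

# The four prompt tokens contribute nothing to the loss; only the response tokens do.
example = build_sft_example(prompt_ids=[1, 15, 27, 9], response_ids=[42, 7, 2])
print(example["labels"])  # tensor([-100, -100, -100, -100,   42,    7,    2])
```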
To speed up DPO training, we 1) pre-compute the reference logits and 2) run the DPO trainer without initializing any reference model. We find this reduces both training time and memory usage!
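As a rough sketch of why this works (our own minimal illustration, not the repo's trainer code): the reference model is run once offline to store per-sequence log-probabilities for the chosen and rejected responses, and the DPO loss then reads those cached values, so no reference model has to be loaded during training.

```python
import torch
import torch.nn.functional as F

def dpo_loss_precomputed_ref(
    policy_chosen_logps: torch.Tensor,    # log pi_theta(y_chosen | x), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log pi_theta(y_rejected | x), shape (batch,)
    ref_chosen_logps: torch.Tensor,       # cached log pi_ref(y_chosen | x), computed offline
    ref_rejected_logps: torch.Tensor,     # cached log pi_ref(y_rejected | x), computed offline
    beta: float = 0.1,
) -> torch.Tensor:
    """Standard DPO objective; the reference terms come from a one-time offline pass
    over the dataset instead of a live reference-model forward pass."""
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratios - rejected_logratios)).mean()
```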
As an example of training LION-LLaMA-3-8b-dpo-v1.0, you can use the following script:
```bash
./examples/llama-3/dpo.sh
```

Online DPO is split into two parts: 1) generate online preference data and 2) train the model with DPO using the online data.
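A minimal sketch of the first part (illustrative only; `policy_generate` and `score` are hypothetical stand-ins for the actual generation and ranking code): sample several responses from the current policy for each prompt, score them, and keep the best and worst as the chosen/rejected pair.

```python
def build_online_preference_pair(prompt, policy_generate, score, num_samples: int = 8) -> dict:
    """Sample candidates from the current policy and turn them into a DPO preference pair.

    `policy_generate(prompt)` and `score(prompt, response)` are hypothetical callables;
    the scoring could come from a reward model or an LLM judge.
    """
    candidates = [policy_generate(prompt) for _ in range(num_samples)]
    ranked = sorted(candidates, key=lambda response: score(prompt, response))
    return {"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]}
```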
As an example, you can train LION-LLaMA-3-8b-odpo-v1.0 with the following script:

```bash
./examples/llama-3/odpo.sh
```

We provide an easy all-in-one evaluation pipeline in this directory for popular benchmarks, including:
- MT-Bench
- Alpaca Eval 2.0
- Arena Hard Auto
- Huggingface OpenLLM 1.0
- Huggingface OpenLLM 2.0
For each benchmark, we additionally include the following features:
- automatically hosting your model on an `sglang` server for faster evaluation (except for the OpenLLM evals)
- support for multiple runs to compute the average performance (`--num_runs`)
- logging performance to `wandb` (`--to_wandb`)
For setup and run details, please refer to the `test` directory.
If you find our repo useful, please consider citing it as follows:
```bibtex
@misc{yu2024lionsempiricallyoptimizedapproach,
      title={LIONs: An Empirically Optimized Approach to Align Language Models},
      author={Xiao Yu and Qingyang Wu and Yu Li and Zhou Yu},
      year={2024},
      eprint={2407.06542},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.06542},
}
```