Disentangling the Origins of Cognitive Biases in Language Models
This repository contains the code for our paper *Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs* (CoLM 2025).
We investigate the origins of cognitive biases in large language models (LLMs). Prior work showed that these biases emerge, and even intensify, after instruction tuning, but it remains unclear whether they are caused by the pretraining backbone, the finetuning data, or training randomness.
We propose a two-step causal analysis framework:
- First, we quantify how much random-seed fluctuations alone affect bias scores.
- Second, we perform cross-tuning: swapping instruction datasets between pretrained models to determine whether biases are driven by the pretraining backbone or by the finetuning data.
Our results show that:
- Training randomness introduces some noise in bias scores.
- However, pretraining consistently dominates as the primary source of biases, with instruction tuning playing a secondary role.
This repository integrates and builds on three main sub-repositories:
- 📦 open-instruct: Parameter-efficient LoRA finetuning framework.
- 📊 instructed-to-bias: Evaluation for belief and certainty biases.
- 🧠 cognitive-biases-in-llms: Benchmark suite for 30 cognitive biases.
Refer to those repositories for dataset structures, implementation details, and original evaluation scripts.
All trained models across seeds and the subsampled Flan instruction dataset are hosted on Hugging Face:
🤗 Hugging Face Collection: planted_in_pretraining
We recommend setting up with conda:
```bash
conda create -n bias_origin python=3.10 -y
conda activate bias_origin
pip install -r requirements.txt

# Optional: install submodules in editable mode
git clone https://github.com/allenai/open-instruct.git
pip install -e open-instruct/

git clone https://github.com/itay1itzhak/InstructedToBias.git
pip install -e InstructedToBias/

git clone https://github.com/simonmalberg/cognitive-biases-in-llms.git
pip install -e cognitive-biases-in-llms/
```

What it checks:
This experiment finetunes the same model on the same dataset with several different random seeds to test how much training randomness affects bias scores.
```bash
python run_randomness_analysis.py --granularity-levels model_bias
```

What we found:
Randomness introduces minor fluctuations in individual bias scores, but averaging across seeds recovers stable patterns. This suggests randomness alone is not a primary driver of cognitive bias.
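The seed-averaging step can be illustrated with a minimal sketch. The dictionary layout and all score values below are made up for illustration and do not reflect the repository's actual data format or results:

```python
import statistics

# Illustrative only: per-seed bias scores for one finetuned model.
# Keys are example bias names; each list holds one score per random seed.
scores_by_seed = {
    "belief_bias":    [0.62, 0.58, 0.65],
    "certainty_bias": [0.41, 0.44, 0.39],
}

# Averaging across seeds smooths out per-seed noise; the standard
# deviation indicates how much randomness alone moves each score.
for bias, scores in scores_by_seed.items():
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)
    print(f"{bias}: mean={mean:.3f} std={std:.3f}")
```

In this toy setup, a small standard deviation relative to the mean is what "randomness introduces only minor fluctuations" would look like numerically.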
What it checks:
This analysis swaps instruction datasets between two pretrained models (e.g., Flan vs Tulu) and compares their bias vectors. We cluster models either by pretraining backbone or instruction data.
```bash
python run_similarity_analysis.py \
    --granularity-levels model_bias_scenario \
    --models-to-include T5,OLMo
```

What we found:
Models cluster strongly by pretraining identity. Even after swapping instruction data, bias patterns remain closer to the original backbone than to the new data. This supports our main claim: biases are planted during pretraining.
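The clustering intuition can be sketched with cosine similarity between bias vectors. The vectors below are made-up toy values (one entry per bias), and the comparison logic is a simplification of the actual analysis; only the model/dataset names follow the ones mentioned above:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Illustrative bias vectors (values are invented for this sketch).
vectors = {
    "T5-flan":   [0.60, 0.20, 0.80],
    "T5-tulu":   [0.55, 0.25, 0.75],  # same backbone, swapped instruction data
    "OLMo-flan": [0.10, 0.70, 0.30],  # same instruction data, different backbone
}

same_backbone = cosine(vectors["T5-flan"], vectors["T5-tulu"])
same_data = cosine(vectors["T5-flan"], vectors["OLMo-flan"])
print(f"same backbone: {same_backbone:.3f}, same data: {same_data:.3f}")
```

If biases are planted in pretraining, the same-backbone pair should be more similar than the same-data pair, as in this toy example.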
Example output plots are saved as PDFs to plots/.
To cite our work, use the CoLM BibTeX entry from Google Scholar or:
```bibtex
@misc{itzhak2025plantedpretrainingswayedfinetuning,
  title={Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs},
  author={Itay Itzhak and Yonatan Belinkov and Gabriel Stanovsky},
  year={2025},
  eprint={2507.07186},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2507.07186},
}
```

MIT License, Copyright (c) 2025 Itay Itzhak
For questions or collaborations, please reach out via GitHub Issues or email:
📧 itay1itzhak@gmail.com


