This repository contains reproduction code for the paper "Where Relevance Emerges: A Layer-Wise Study of Internal Attention for Zero-Shot Re-Ranking".
This work investigates how relevance signals are distributed across transformer layers in Large Language Models (LLMs) for zero-shot document re-ranking. The main contributions include:
- Layer-wise Analysis: Discovering a universal "Bell-Curve" distribution of relevance signals across transformer layers (a per-layer scoring sketch follows this list)
- Selective-ICR: A strategy that focuses computation on high-signal layers, reducing inference latency by 30%-50% without compromising effectiveness
- Unified Comparison: Systematic evaluation of three scoring mechanisms (generation, likelihood, internal attention) across Listwise and Setwise ranking frameworks
- Reasoning-Intensive Tasks: Demonstrating that attention-based re-ranking has high potential on reasoning-intensive tasks
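The layer-wise analysis can be illustrated with a minimal sketch: score a (query, document) pair by the attention mass flowing from query tokens to document tokens at each layer, then inspect how that score varies with depth. The prompt layout, head pooling, and aggregation below are illustrative assumptions, not the exact implementation in this repository:

```python
# Minimal sketch of per-layer attention scoring (prompt layout, head pooling,
# and aggregation are illustrative assumptions, not this repo's exact code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B"  # the model used in these experiments
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, attn_implementation="eager"
)

@torch.no_grad()
def layerwise_scores(query: str, doc: str) -> list[float]:
    """Return one relevance score per transformer layer for (query, doc)."""
    # token counts are approximate here (special tokens ignored for brevity)
    n_doc = tok(doc, return_tensors="pt").input_ids.shape[1]
    ids = tok(doc + "\n" + query, return_tensors="pt").input_ids
    out = model(ids, output_attentions=True)
    scores = []
    for attn in out.attentions:          # one (1, heads, seq, seq) tensor per layer
        a = attn[0].float().mean(dim=0)  # average over heads -> (seq, seq)
        # attention mass from query positions onto document positions
        scores.append(a[n_doc:, :n_doc].sum().item())
    return scores
```

Plotting such per-layer scores (or per-layer ranking metrics) against layer depth, aggregated over many query-document pairs, is what surfaces the bell-curve distribution reported in the paper.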
This codebase reproduces two main components:

1. Selective-ICR (`icr/`), built on In-Context Re-ranking (ICR)
   - Original repo: OSU-NLP-Group/In-Context-Reranking
   - Original paper: Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers
   - Features:
     - Supports layer-wise analysis and per-layer evaluation
     - Implements the Selective-ICR strategy (layer selection and aggregation; a sketch follows this block)
     - Evaluates on the TREC DL, BEIR, and BRIGHT datasets
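The core of Selective-ICR can be sketched in a few lines: pick high-signal layers from per-layer scores measured on a calibration set, then aggregate scores over only those layers at inference time. The top-k selection rule and function names here are hypothetical simplifications of the strategy described above:

```python
# Hedged sketch of layer selection and aggregation; the selection rule
# (top-k by mean signal) is an illustrative assumption.
import numpy as np

def select_layers(calib: np.ndarray, keep_frac: float = 0.3) -> list[int]:
    """calib: (n_queries, n_layers) per-layer signal measured on a
    calibration set; keep the top fraction of layers by mean signal."""
    k = max(1, int(round(calib.shape[1] * keep_frac)))
    mean_signal = calib.mean(axis=0)
    return sorted(np.argsort(mean_signal)[-k:].tolist())

def selective_score(per_layer_scores: list[float], layers: list[int]) -> float:
    """Aggregate (here: average) scores over the selected layers only."""
    return sum(per_layer_scores[i] for i in layers) / len(layers)
```

If the selected layers form an early-to-middle band, the forward pass can be truncated at the deepest kept layer, which is one plausible way the reported 30%-50% latency reduction is realized.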
2. Ranking methods and scoring comparison (`llm-rankers/`)
   - Original repo: ielab/llm-rankers
   - Original paper: A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models
   - Features:
     - Supports Listwise and Setwise ranking methods using Llama3.1-8B
     - Supports three scoring mechanisms: generation, likelihood, and attention (using ICR); the generation and likelihood modes are sketched after this block
     - Enables unified comparison of different scoring strategies under consistent conditions
     - Evaluates on BEIR datasets
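For context, the generation and likelihood scoring modes in a Setwise comparison can be sketched as follows; the prompt wording and label handling are illustrative assumptions, and the attention mode instead scores candidates via ICR-style internal attention as sketched earlier:

```python
# Hedged sketch of generation vs. likelihood scoring in a Setwise comparison
# (prompt wording and option labels are illustrative assumptions).
import torch

@torch.no_grad()
def setwise_pick(model, tok, query: str, docs: list[str],
                 mode: str = "likelihood") -> int:
    """Return the index of the passage the LLM judges most relevant."""
    labels = [chr(ord("A") + i) for i in range(len(docs))]
    passages = "\n".join(f"Passage {l}: {d}" for l, d in zip(labels, docs))
    prompt = (f"{passages}\nQuestion: {query}\n"
              "Which passage is most relevant? Answer with a single letter: ")
    ids = tok(prompt, return_tensors="pt").input_ids
    if mode == "generation":
        # decode one token greedily; assumes the model emits one of the labels
        out = model.generate(ids, max_new_tokens=1, do_sample=False)
        return labels.index(tok.decode(out[0, -1:]).strip())
    # likelihood: compare next-token log-probs of the label tokens, no decoding
    logits = model(ids).logits[0, -1]
    label_ids = [tok(l, add_special_tokens=False).input_ids[0] for l in labels]
    return int(torch.argmax(logits[label_ids]))
```

The attention mode replaces this final decision with an internal-attention score, so no decoding step is needed at all, which is what makes the unified comparison of the three mechanisms possible under identical prompts.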
The experiments address three research questions:

- RQ1: How is the relevance signal distributed across transformer layers, and do all layers contribute equally?
- RQ2: How does attention-based scoring compare with generative and likelihood-based methods in Listwise and Setwise frameworks?
- RQ3: Does attention-based ranking remain effective on reasoning-intensive tasks, and is the layer-wise signal distribution universal?
See individual README files for detailed installation and usage:
- `icr/README.md` - Selective-ICR implementation and experiments
- `llm-rankers/README.md` - Ranking methods and scoring comparison
Download from Google Drive: https://drive.google.com/file/d/1Myurx_3DsnBcSq6Em9pb0Y2k_iXX5sV7/view?usp=drive_link
If you use this code, please cite the original papers:
TBD