This repository contains reproduction code for the paper "Where Relevance Emerges: A Layer-Wise Study of Internal Attention for Zero-Shot Re-Ranking".
This work investigates how relevance signals are distributed across transformer layers in Large Language Models (LLMs) for zero-shot document re-ranking. The main contributions include:
- Layer-wise Analysis: Discovering a universal "Bell-Curve" distribution of relevance signals across transformer layers (a per-layer scoring sketch follows this list)
- Selective-ICR: A strategy that focuses computation on high-signal layers, reducing inference latency by 30%-50% without compromising effectiveness
- Unified Comparison: Systematic evaluation of three scoring mechanisms (generation, likelihood, internal attention) across Listwise and Setwise ranking frameworks
- Reasoning-Intensive Tasks: Demonstrating that attention-based re-ranking has high potential on reasoning-intensive tasks
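The layer-wise analysis can be illustrated with a minimal sketch: score a (query, document) pair by the attention mass flowing from query tokens to document tokens at each layer, then inspect how that score varies with depth. The prompt layout, head pooling, and aggregation below are illustrative assumptions, not the exact implementation in this repository:

```python
# Minimal sketch of per-layer attention scoring (prompt layout, head pooling,
# and aggregation are illustrative assumptions, not this repo's exact code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B"  # the model used in these experiments
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, attn_implementation="eager"
)

@torch.no_grad()
def layerwise_scores(query: str, doc: str) -> list[float]:
    """Return one relevance score per transformer layer for (query, doc)."""
    # token counts are approximate here (special tokens ignored for brevity)
    n_doc = tok(doc, return_tensors="pt").input_ids.shape[1]
    ids = tok(doc + "\n" + query, return_tensors="pt").input_ids
    out = model(ids, output_attentions=True)
    scores = []
    for attn in out.attentions:          # one (1, heads, seq, seq) tensor per layer
        a = attn[0].float().mean(dim=0)  # average over heads -> (seq, seq)
        # attention mass from query positions onto document positions
        scores.append(a[n_doc:, :n_doc].sum().item())
    return scores
```

Plotting such per-layer scores (or per-layer ranking metrics) against layer depth, aggregated over many query-document pairs, is what surfaces the bell-curve distribution reported in the paper.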
This codebase reproduces two main components:

1. Selective-ICR (`icr/`), built on In-Context Re-ranking (ICR)
   - Original repo: OSU-NLP-Group/In-Context-Reranking
   - Original paper: Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers
   - Features:
     - Supports layer-wise analysis and per-layer evaluation
     - Implements the Selective-ICR strategy (layer selection and aggregation; a sketch follows this block)
     - Evaluates on the TREC DL, BEIR, and BRIGHT datasets
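The core of Selective-ICR can be sketched in a few lines: pick high-signal layers from per-layer scores measured on a calibration set, then aggregate scores over only those layers at inference time. The top-k selection rule and function names here are hypothetical simplifications of the strategy described above:

```python
# Hedged sketch of layer selection and aggregation; the selection rule
# (top-k by mean signal) is an illustrative assumption.
import numpy as np

def select_layers(calib: np.ndarray, keep_frac: float = 0.3) -> list[int]:
    """calib: (n_queries, n_layers) per-layer signal measured on a
    calibration set; keep the top fraction of layers by mean signal."""
    k = max(1, int(round(calib.shape[1] * keep_frac)))
    mean_signal = calib.mean(axis=0)
    return sorted(np.argsort(mean_signal)[-k:].tolist())

def selective_score(per_layer_scores: list[float], layers: list[int]) -> float:
    """Aggregate (here: average) scores over the selected layers only."""
    return sum(per_layer_scores[i] for i in layers) / len(layers)
```

If the selected layers form an early-to-middle band, the forward pass can be truncated at the deepest kept layer, which is one plausible way the reported 30%-50% latency reduction is realized.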
2. Ranking methods and scoring comparison (`llm-rankers/`)
   - Original repo: ielab/llm-rankers
   - Original paper: A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models
   - Features:
     - Supports Listwise and Setwise ranking methods using Llama3.1-8B
     - Supports three scoring mechanisms: generation, likelihood, and attention (using ICR); the generation and likelihood modes are sketched after this block
     - Enables unified comparison of different scoring strategies under consistent conditions
     - Evaluates on BEIR datasets
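For context, the generation and likelihood scoring modes in a Setwise comparison can be sketched as follows; the prompt wording and label handling are illustrative assumptions, and the attention mode instead scores candidates via ICR-style internal attention as sketched earlier:

```python
# Hedged sketch of generation vs. likelihood scoring in a Setwise comparison
# (prompt wording and option labels are illustrative assumptions).
import torch

@torch.no_grad()
def setwise_pick(model, tok, query: str, docs: list[str],
                 mode: str = "likelihood") -> int:
    """Return the index of the passage the LLM judges most relevant."""
    labels = [chr(ord("A") + i) for i in range(len(docs))]
    passages = "\n".join(f"Passage {l}: {d}" for l, d in zip(labels, docs))
    prompt = (f"{passages}\nQuestion: {query}\n"
              "Which passage is most relevant? Answer with a single letter: ")
    ids = tok(prompt, return_tensors="pt").input_ids
    if mode == "generation":
        # decode one token greedily; assumes the model emits one of the labels
        out = model.generate(ids, max_new_tokens=1, do_sample=False)
        return labels.index(tok.decode(out[0, -1:]).strip())
    # likelihood: compare next-token log-probs of the label tokens, no decoding
    logits = model(ids).logits[0, -1]
    label_ids = [tok(l, add_special_tokens=False).input_ids[0] for l in labels]
    return int(torch.argmax(logits[label_ids]))
```

The attention mode replaces this final decision with an internal-attention score, so no decoding step is needed at all, which is what makes the unified comparison of the three mechanisms possible under identical prompts.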
The experiments address three research questions:

- RQ1: How is the relevance signal distributed across transformer layers, and do all layers contribute equally?
- RQ2: How does attention-based scoring compare with generative and likelihood-based methods in Listwise and Setwise frameworks?
- RQ3: Does attention-based ranking remain effective on reasoning-intensive tasks, and is the layer-wise signal distribution universal?
See individual README files for detailed installation and usage:
- `icr/README.md` - Selective-ICR implementation and experiments
- `llm-rankers/README.md` - Ranking methods and scoring comparison
Download from Google Drive: https://drive.google.com/file/d/1Myurx_3DsnBcSq6Em9pb0Y2k_iXX5sV7/view?usp=drive_link
If you use this code, please cite the original papers:
TBD