This project explores and benchmarks multiple retrieval techniques on the HotpotQA corpus. The goal was to maximize retrieval quality, measured by mean nDCG@10, and to improve on standard dense retrievers and cross-encoder rerankers. After iterative experimentation with dense retrieval, BM25, and reranking models, the best performance came from a LambdaRank-based ranker trained on a custom feature set that includes token counts and scores from other reranking models.
- Dense Retriever: Vector-based retrieval using pretrained (bge-large-en-v1.5) embeddings.
- Dense Retriever + Cross Encoder: Dense top-50 retrieval followed by cross-encoder (bge-reranker-large) reranking.
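The baseline pipelines above can be sketched in two small functions. This is a minimal, runnable NumPy illustration of the pipeline shape only: in the real system the document and query vectors come from bge-large-en-v1.5 and the pair scores from bge-reranker-large, while here they are plain arrays.

```python
import numpy as np

def dense_top_k(query_vec, doc_vecs, k=50):
    """Stage 1: cosine-similarity retrieval over precomputed embeddings.

    query_vec / doc_vecs stand in for bge-large-en-v1.5 embeddings.
    """
    sims = doc_vecs @ query_vec
    sims = sims / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-12)
    top = np.argsort(-sims)[:k]
    return top, sims[top]

def rerank(candidates, pair_scores):
    """Stage 2: reorder the dense top-k by cross-encoder scores.

    pair_scores stands in for bge-reranker-large run on (query, doc) pairs.
    """
    order = np.argsort(-np.asarray(pair_scores))
    return [candidates[i] for i in order]
```

The key design point is that the expensive cross-encoder only scores the k candidates the cheap dense stage surfaces, not the whole corpus.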
The final system achieved superior results through a two-stage pipeline:
- Retrieve top 50 candidate documents using a dense retriever.
- Use a LambdaRank model to optimize ranking based on multiple informative features:
  - BM25 Score: lexical similarity between query and document.
  - Cross Encoder Score: semantic similarity between query and document.
  - LLM Score: contextual relevance estimated by a large language model (Mistral-7B-Instruct-v0.3.Q8_0).
  - Document Length: number of tokens in the document.
  - Query Length: number of tokens in the query.
- The model was trained on 200,000 query-document pairs in total: 10,000 queries × 20 documents retrieved per query by the dense retriever.
- The model reranks the top 50 candidates to produce the final top 10 ranking.
- The model was evaluated using the first 4000 queries from the HotpotQA validation set.
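The LambdaRank idea behind the reranker can be shown in a compact form. In practice one would typically train a gradient-boosted ranker (e.g. LightGBM's `lambdarank` objective) on the five features; the sketch below instead implements the core lambda computation for a linear scorer on toy data, purely to illustrate how pairwise |ΔnDCG|-weighted gradients drive the ranking objective. Feature column order and all data here are illustrative assumptions.

```python
import numpy as np

def dcg_at_k(rels, k=10):
    rels = np.asarray(rels, dtype=float)[:k]
    return float(np.sum((2.0 ** rels - 1.0) / np.log2(np.arange(2, rels.size + 2))))

def ndcg_at_k(rels_in_ranked_order, k=10):
    ideal = dcg_at_k(sorted(rels_in_ranked_order, reverse=True), k)
    return dcg_at_k(rels_in_ranked_order, k) / ideal if ideal > 0 else 0.0

def lambdarank_step(X, rels, w, lr=0.1):
    """One LambdaRank gradient step for a linear scorer over feature rows X.

    Columns of X would hold e.g. BM25 score, cross-encoder score, LLM score,
    document length, query length (illustrative order, not the actual setup).
    """
    scores = X @ w
    order = np.argsort(-scores, kind="stable")      # current ranking
    pos = np.empty(len(order), dtype=int)
    pos[order] = np.arange(len(order))              # position of each doc
    base = ndcg_at_k(rels[order])
    grad = np.zeros_like(w)
    for i in range(len(rels)):
        for j in range(len(rels)):
            if rels[i] <= rels[j]:
                continue                            # only pairs where i > j in relevance
            swapped = order.copy()                  # |nDCG change| if i and j swapped
            swapped[pos[i]], swapped[pos[j]] = j, i
            delta = abs(ndcg_at_k(rels[swapped]) - base)
            rho = 1.0 / (1.0 + np.exp(scores[i] - scores[j]))
            grad += rho * delta * (X[i] - X[j])     # push i above j, weighted by delta
    return w + lr * grad
```

Because the pairwise gradients are weighted by the nDCG swap delta, the model spends its capacity on mistakes near the top of the ranking, which is exactly what nDCG@10 rewards.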
| Model | Mean nDCG@10 |
|---|---|
| Dense Retriever | 0.86235 |
| Dense Retriever + Cross Encoder | 0.93665 |
| Dense Retriever + LambdaRank Reranker | 0.94159 |
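The metric in the table is a per-query average. A sketch of how mean nDCG@10 can be computed with scikit-learn's `ndcg_score` (the function and array names here are illustrative, not the project's evaluation code):

```python
import numpy as np
from sklearn.metrics import ndcg_score

def mean_ndcg_at_10(all_labels, all_scores):
    """all_labels[q] and all_scores[q] are aligned per-document arrays for query q."""
    return float(np.mean([
        ndcg_score([labels], [scores], k=10)   # nDCG@10 for one query
        for labels, scores in zip(all_labels, all_scores)
    ]))
```

Averaging per query (rather than pooling all pairs) is what makes the metric comparable across systems that retrieve different candidate sets.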
- Combining semantic, lexical, and structural features significantly improves retrieval quality.
- LambdaRank provides a flexible framework for leveraging diverse signals without retraining large encoder models.
- Perform an ablation study to investigate which features are truly significant.
- Incorporate multi-hop retrieval signals to better handle reasoning chains.
- Experiment with pairwise LLM preference data for improved LambdaRank supervision.
- Extend the pipeline to end-to-end QA generation using retrieved contexts.