This repository documents the limits of static Retrieval-Augmented Generation (RAG) systems by isolating and evaluating each boundary in the pipeline: ingestion, retrieval, and ranking.
Across multiple controlled experiments, we see that many RAG failures originate before generation — and often before retrieval even runs.
This repository is a systems-level synthesis of several focused experiments that analyze static RAG pipelines, defined as systems where:
- documents are chunked once
- embeddings are fixed
- retrieval behavior is non-adaptive
- ranking is applied once per query
- generation consumes a fixed context window
No agents. No memory. No adaptive control flow.
Only frozen pipelines under controlled variation.
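The frozen pipeline above can be made concrete with a minimal sketch. Everything here is hypothetical and deliberately tiny: a toy bag-of-words embedding stands in for a real model, and `chunks`, `answer`, and `k` are illustrative names, not code from any of the repositories.

```python
# Minimal sketch of a static RAG pipeline: everything is computed once
# and never adapted (chunks, vectors, K, context size are all frozen).
from collections import Counter
import math

def embed(text):
    # Frozen "embedding": a bag-of-words vector (stand-in for a real model).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Documents are chunked once.
chunks = ["the cache is invalidated on write",
          "writes acquire a global lock",
          "reads never block"]

# 2. Embeddings are fixed at ingestion time.
index = [(c, embed(c)) for c in chunks]

def answer(query, k=2):
    # 3-4. Non-adaptive retrieval and one-shot ranking: same K for every query.
    q = embed(query)
    ranked = sorted(index, key=lambda ce: cosine(q, ce[1]), reverse=True)
    # 5. Generation would consume this fixed context window verbatim.
    return [c for c, _ in ranked[:k]]

print(answer("what happens on a write"))
```

If the evidence for a question is not representable inside one of those frozen chunks, nothing downstream of `index` can recover it.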
Static RAG systems fail for reasons that are often misattributed:
- ❌ “The model hallucinated”
- ❌ “The retriever is weak”
- ❌ “The embeddings aren’t good enough”
This body of work shows that failures more often arise from:
- evidence never being representable as a single unit
- relevant chunks existing but appearing too deep to retrieve
- ranking saturating due to poorly formed chunks
These are structural failures, not model failures.
This repository organizes results by system boundary, not chronology.
Each boundary is evaluated in isolation, with all others frozen.
Repo: `rag-minimal-control`
Boundary isolated: End-to-end RAG correctness under minimal assumptions.
What it establishes: A smallest-possible RAG system that can still fail in meaningful ways.
Key insight: A system can be architecturally correct yet fail most questions due to evidence starvation — without hallucination.
Repo: `rag-retrieval-eval`
Boundary isolated: Evidence surfacing, independent of generation.
What it establishes: Whether relevant evidence exists and whether it appears within Top-K can be measured explicitly.
Key insight: Many failures are retrieval-depth failures, not absence of evidence.
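Measuring this explicitly requires only two small functions. The sketch below uses hypothetical chunk IDs and helper names (`first_relevant_rank`, `recall_at_k`); it illustrates the distinction between evidence that is absent and evidence that surfaces too deep.

```python
# Does the gold chunk exist in the ranking at all, and at what depth?

def first_relevant_rank(ranked_chunk_ids, gold_ids):
    """1-based rank of the first gold chunk, or None if it never surfaces."""
    for rank, cid in enumerate(ranked_chunk_ids, start=1):
        if cid in gold_ids:
            return rank
    return None

def recall_at_k(ranked_chunk_ids, gold_ids, k):
    """Fraction of gold chunks that appear within the top K results."""
    return len(set(ranked_chunk_ids[:k]) & set(gold_ids)) / len(gold_ids)

# A depth failure: the evidence exists but sits at rank 7, past a Top-5 cutoff.
ranking = ["c9", "c2", "c5", "c1", "c8", "c3", "c4"]
gold = ["c4"]
print(first_relevant_rank(ranking, gold))  # 7: present, but too deep
print(recall_at_k(ranking, gold, 5))       # 0.0 under K=5
```

A rank of 7 against a Top-5 cutoff is a retrieval-depth failure; a rank of `None` is absence of evidence. The two demand different fixes.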
Repo: `rag-hybrid-retrieval`
Boundary isolated: Recall behavior under different retrieval signals.
What it establishes: Lexical and semantic retrievers surface complementary evidence.
Key insight: Hybrid retrieval improves surfacing but does not eliminate ranking depth or representability issues.
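One common way to combine complementary signals is Reciprocal Rank Fusion (RRF). The sketch below is illustrative, not the repository's implementation; the two rankings and the chunk IDs are made up.

```python
# Fuse a lexical (BM25-style) ranking with a dense (semantic) ranking via RRF.
def rrf(rankings, k=60):
    """Each item scores sum(1 / (k + rank)) across the lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical  = ["c3", "c1", "c7"]   # surfaced by exact-term overlap
semantic = ["c5", "c3", "c2"]   # surfaced by paraphrase similarity
fused = rrf([lexical, semantic])
print(fused[0])  # "c3": the only chunk surfaced by both signals
```

Fusion widens what surfaces, but note what it cannot do: a chunk absent from both input rankings never enters `scores`, and a poorly formed chunk stays poorly formed.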
Repo: `rag-reranking-playground`
Boundary isolated: Evidence prioritization after retrieval.
What it establishes: Reranking improves ordering but cannot recover missing or fragmented evidence.
Key insight: Ranking saturates when chunking is misaligned with question structure.
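The limitation is structural and easy to see in a sketch. The relevance scores below are hypothetical cross-encoder-style values, not output from any real model:

```python
# Reranking reorders what retrieval surfaced; it cannot reach past it.
def rerank(candidates, score):
    return sorted(candidates, key=score, reverse=True)

retrieved = ["c2", "c9", "c5"]  # Top-K handed over by the retriever
scores = {"c2": 0.41, "c9": 0.88, "c5": 0.12,
          "c7": 0.99}           # c7: the best chunk, never retrieved

reranked = rerank(retrieved, lambda c: scores[c])
print(reranked)          # ['c9', 'c2', 'c5']
print("c7" in reranked)  # False: reranking cannot add to the candidate set
```

However strong the reranker, its output is a permutation of its input; missing or fragmented evidence stays missing.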
Repo: `rag-chunking-strategies`
Boundary isolated: Document ingestion and representability.
What it establishes: Chunking determines which questions are even answerable as single-context queries.
Key insight: Changing only the chunking strategy causes questions to transition between:
- Atomic
- Structural
- Compositional
- Unanswerable
Retrieval quality becomes irrelevant when no coherent chunk exists.
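The transition is mechanical, as a toy example shows. The document, the chunk sizes, and the splitter functions below are all hypothetical; the point is only that the same evidence span is atomic under one strategy and fragmented under another.

```python
# The same document under two chunking strategies.
doc = ("The scheduler preempts tasks every 10ms. "
       "Preemption is disabled inside critical sections.")

def fixed_size_chunks(text, size):
    # Character windows: oblivious to sentence boundaries.
    return [text[i:i + size] for i in range(0, len(text), size)]

def sentence_chunks(text):
    # Naive sentence split; adequate for this illustration.
    return [s if s.endswith(".") else s + "." for s in text.split(". ")]

needle = "preempts tasks every 10ms"  # the evidence span a question needs
for name, chunks in [("fixed-30", fixed_size_chunks(doc, 30)),
                     ("sentence", sentence_chunks(doc))]:
    atomic = any(needle in c for c in chunks)
    print(name, "atomic:", atomic)
```

Under the 30-character windows the span straddles a chunk boundary, so no single retrievable unit contains it; under sentence chunking it is atomic. Nothing about the retriever changed.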
Within these constraints, static RAG systems can:
- retrieve localized facts
- surface section-level explanations
- benefit from hybrid retrieval and reranking
- fail deterministically and diagnosably

They cannot:
- adapt retrieval strategy per question
- reason across disjoint chunks
- recover broken causal chains
- respond when representability fails
These are hard system limits, not tuning problems.
This repository establishes that:
- Retrieval failure is often a ranking problem
- Ranking failure is often an ingestion problem
- Chunking is not preprocessing — it is representation modeling
- Some questions are unanswerable under static constraints, regardless of retriever strength
No claims are made about:
- answer quality
- faithfulness
- optimal chunking strategies
- production best practices
Once ingestion, retrieval, and ranking are frozen:
- remaining failures are not stochastic
- further tuning yields diminishing returns
- system behavior becomes predictable
At this point, improving performance requires changing the computational regime, not the parameters.
When answers are:
- compositional by nature
- distributed across chunks
- absent as single coherent units
…the system must decide what to do next.
That transition marks the boundary between static pipelines and adaptive, agentic systems.
This repository closes the static regime.
The next regime — where systems must decide whether to retrieve, how to act, and what state to carry forward — is established in a follow-up repository.
Before adding intelligence, measure and understand the constraints of the system you already have.
This repository does exactly that.