A systems-level analysis of static RAG pipelines, isolating ingestion, retrieval, and ranking boundaries to expose structural failure modes before generation.

Arnav-Ajay/rag-systems-foundations


rag-systems-foundations (System Boundaries and Failure Modes)

TL;DR

This repository documents the limits of static Retrieval-Augmented Generation (RAG) systems by isolating and evaluating each boundary in the pipeline: ingestion, retrieval, and ranking.

Across multiple controlled experiments, we see that many RAG failures originate before generation — and often before retrieval even runs.


What This Repository Is

This repository is a systems-level synthesis of several focused experiments that analyze static RAG pipelines, defined as systems where:

  • documents are chunked once
  • embeddings are fixed
  • retrieval behavior is non-adaptive
  • ranking is applied once per query
  • generation consumes a fixed context window

No agents. No memory. No adaptive control flow.

Only frozen pipelines under controlled variation.


Why Static RAG Fails (High-Level)

Static RAG systems fail for reasons that are often misattributed:

  • ❌ “The model hallucinated”
  • ❌ “The retriever is weak”
  • ❌ “The embeddings aren’t good enough”

This body of work shows that failures more often arise from:

  • evidence never being representable as a single unit
  • relevant chunks existing but appearing too deep to retrieve
  • ranking saturating due to poorly formed chunks

These are structural failures, not model failures.


System Boundaries Analyzed

This repository organizes results by system boundary, not chronology.

Each boundary is evaluated in isolation, with all others frozen.


1️⃣ Minimal RAG Control System

Repo: rag-minimal-control

Boundary isolated: End-to-end RAG correctness under minimal assumptions.

What it establishes: The smallest possible RAG system that can still fail in meaningful ways.

Key insight: A system can be architecturally correct yet fail most questions due to evidence starvation — without hallucination.
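The failure mode above can be seen in a toy sketch of such a frozen pipeline. This is illustrative only, not the repo's actual code: bag-of-words counts stand in for real embeddings, and all names are hypothetical.

```python
from collections import Counter
import math

def embed(text):
    # Toy "embedding": a term-frequency Counter over lowercase tokens.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Ingestion happens exactly once: chunks and embeddings are frozen after this.
CHUNKS = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
]
INDEX = [(c, embed(c)) for c in CHUNKS]

def retrieve(query, k=1):
    qv = embed(query)
    ranked = sorted(INDEX, key=lambda ce: cosine(qv, ce[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

context = retrieve("What is the capital of France?")
# If no chunk carries the needed evidence, the pipeline fails without
# hallucinating: the generator simply never sees the answer.
```

Every component here is architecturally correct, yet any question whose evidence is absent from `CHUNKS` is unanswerable by construction: evidence starvation, not model error.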


2️⃣ Retrieval as a Measurable Boundary

Repo: rag-retrieval-eval

Boundary isolated: Evidence surfacing, independent of generation.

What it establishes: Whether relevant evidence exists and whether it appears within Top-K can be measured explicitly.

Key insight: Many failures are retrieval-depth failures: the evidence exists in the index but ranks below the Top-K cutoff.
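This kind of surfacing question can be measured directly: given gold evidence ids per query, check whether any gold chunk lands in the Top-K, and if not, how deep it sits. A minimal sketch (function names are my own, not from the repo):

```python
def hit_at_k(retrieved_ids, gold_ids, k):
    """1.0 if any gold chunk id appears in the top-k retrieved ids, else 0.0."""
    return 1.0 if set(retrieved_ids[:k]) & set(gold_ids) else 0.0

def first_hit_rank(retrieved_ids, gold_ids):
    """1-based rank of the first gold chunk, or None if never retrieved.
    Separates 'evidence absent' from 'evidence present but too deep'."""
    for rank, cid in enumerate(retrieved_ids, start=1):
        if cid in gold_ids:
            return rank
    return None

retrieved = ["c7", "c3", "c9", "c1"]
assert hit_at_k(retrieved, {"c1"}, k=3) == 0.0   # retrieval-depth failure
assert first_hit_rank(retrieved, {"c1"}) == 4    # evidence exists, just deep
```

The distinction the two functions draw is exactly the one the section names: a `None` from `first_hit_rank` is an absence-of-evidence failure; a finite rank beyond K is a depth failure.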


3️⃣ Retrieval Regime: Dense, Sparse, Hybrid

Repo: rag-hybrid-retrieval

Boundary isolated: Recall behavior under different retrieval signals.

What it establishes: Lexical and semantic retrievers surface complementary evidence.

Key insight: Hybrid retrieval improves surfacing but does not eliminate ranking depth or representability issues.
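One common way to combine lexical and semantic rankings is reciprocal rank fusion; this is a standard technique offered as an illustration, not necessarily the fusion method used in the repo:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of chunk ids.
    Each chunk scores sum(1 / (k + rank)) over the lists that contain it;
    k damps the influence of a single high rank (60 is the usual default)."""
    scores = {}
    for ranking in rankings:
        for rank, cid in enumerate(ranking, start=1):
            scores[cid] = scores.get(cid, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["c2", "c5", "c9"]   # semantic neighbours
sparse = ["c5", "c1", "c2"]   # lexical matches
fused = reciprocal_rank_fusion([dense, sparse])
```

Note what fusion can and cannot do: `c5` rises because both signals surface it, but a chunk absent from every input list can never enter `fused`, which is the section's point about ranking depth and representability surviving hybridization.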


4️⃣ Ranking as a Saturating Layer

Repo: rag-reranking-playground

Boundary isolated: Evidence prioritization after retrieval.

What it establishes: Reranking improves ordering but cannot recover missing or fragmented evidence.

Key insight: Ranking saturates when chunking is misaligned with question structure.
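The saturation point can be made concrete: a reranker only permutes the candidate set it is given, so its ceiling is retrieval recall. A toy sketch with a hypothetical scoring function standing in for a cross-encoder:

```python
def rerank(candidates, score):
    # A reranker reorders candidates; it cannot add chunks it never saw.
    return sorted(candidates, key=score, reverse=True)

candidates = ["c3", "c7"]   # top-k from retrieval; gold chunk c1 was missed
reranked = rerank(candidates, score=lambda c: {"c3": 0.2, "c7": 0.9}[c])

# No scoring function, however strong, can surface c1 here:
# the failure happened upstream, at retrieval or ingestion.
assert "c1" not in reranked
```

When chunking fragments the evidence, every candidate scores mediocre for the same reason, and swapping in a better reranker changes the ordering without changing the outcome.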


5️⃣ Chunking as a Representational Boundary

Repo: rag-chunking-strategies

Boundary isolated: Document ingestion and representability.

What it establishes: Chunking determines which questions are even answerable as single-context queries.

Key insight: Changing only the chunking strategy causes questions to transition between:

  • Atomic
  • Structural
  • Compositional
  • Unanswerable

Retrieval quality becomes irrelevant when no coherent chunk exists.
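The transition between those categories can be reproduced with nothing more than a chunk-size change: the same two-sentence causal chain is atomic under one ingestion policy and fragmented under another. A toy sketch (the example document is invented):

```python
def chunk_by_chars(text, size):
    """Fixed-size character chunking; in a static pipeline this runs once."""
    return [text[i:i + size] for i in range(0, len(text), size)]

doc = ("The valve failed because the seal froze."
       " The seal froze because heating was off.")

coarse = chunk_by_chars(doc, 200)   # whole causal chain fits in one chunk
fine   = chunk_by_chars(doc, 40)    # chain split: cause and effect separated

# Under coarse chunking, "Why did the valve fail?" is answerable from a
# single retrieved unit. Under fine chunking, no single chunk links the
# valve to the heating, so the question becomes compositional or worse.
assert len(coarse) == 1
assert not any("valve" in c and "heating" in c for c in fine)
```

Nothing about the retriever or reranker changed between the two runs; only the ingestion policy did, which is why chunking is representation modeling rather than preprocessing.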


What Static RAG Can and Cannot Do

Static RAG can:

  • retrieve localized facts
  • surface section-level explanations
  • benefit from hybrid retrieval and reranking
  • fail deterministically and diagnosably

Static RAG cannot:

  • adapt retrieval strategy per question
  • reason across disjoint chunks
  • recover broken causal chains
  • respond when representability fails

These are hard system limits, not tuning problems.


Core Findings (Bounded Claims)

This repository establishes that:

  • Retrieval failure is often a ranking problem
  • Ranking failure is often an ingestion problem
  • Chunking is not preprocessing — it is representation modeling
  • Some questions are unanswerable under static constraints, regardless of retriever strength

No claims are made about:

  • answer quality
  • faithfulness
  • optimal chunking strategies
  • production best practices

Why This Boundary Matters

Once ingestion, retrieval, and ranking are frozen:

  • remaining failures are not stochastic
  • further tuning yields diminishing returns
  • system behavior becomes predictable

At this point, improving performance requires changing the computational regime, not the parameters.


What Comes After Static RAG

When answers are:

  • compositional by nature
  • distributed across chunks
  • absent as single coherent units

…the system must decide what to do next.

That transition marks the boundary between:

static pipelines and adaptive, agentic systems

This repository closes the static regime.

The next regime — where systems must decide whether to retrieve, how to act, and what state to carry forward — is established in:

agent-systems-core


Core Claim

Before adding intelligence, measure and understand the constraints of the system you already have.

This repository does exactly that.

