This framework makes it easy to ingest different kinds of data, experiment with different retrieval systems, evaluate each of them, and produce accuracy measurements to determine which one performs best.
The retrieval system takes a plain-text user query and searches for relevant data using a combination of strict filtering and similarity search against a vector database filled with embeddings of the data.
For a more in-depth explanation, see Modular Retrieval System.
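As a rough illustration of that flow, the sketch below embeds a query and runs a filtered similarity search against Qdrant. This is a minimal sketch, not the framework's implementation: the collection name, the `source` payload field, and the use of sentence-transformers are assumptions.

```python
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer

# Sketch only: collection name, payload field, and embedding model are assumptions.
client = QdrantClient(url="http://localhost:6333")
encoder = SentenceTransformer("all-MiniLM-L6-v2")

def search(query: str, source: str, top_k: int = 5):
    # 1. Embed the plain-text user query.
    query_vector = encoder.encode(query).tolist()
    # 2. Combine strict filtering (payload match) with vector similarity search.
    return client.search(
        collection_name="documents",
        query_vector=query_vector,
        query_filter=models.Filter(
            must=[models.FieldCondition(key="source", match=models.MatchValue(value=source))]
        ),
        limit=top_k,
    )
```

With something like this in place, `search("how do I configure hooks?", source="wagmi")` would return the top matches that also satisfy the metadata filter.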
uv venv
source .venv/bin/activate
uv sync
docker compose up -d
Vectors are stored in a Qdrant vector database.
Processed transformations and document transformation state are stored in an Elasticsearch database.
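As a quick sanity check once `docker compose up -d` is running, both stores can be reached with their standard Python clients. This is a sketch that assumes the clients' default local ports; adjust them to match the compose file.

```python
from qdrant_client import QdrantClient
from elasticsearch import Elasticsearch

# Assumed default local ports; adjust to match docker-compose.yml.
qdrant = QdrantClient(url="http://localhost:6333")
es = Elasticsearch("http://localhost:9200")

print(qdrant.get_collections())  # vector collections holding document embeddings
print(es.ping())                 # True if the transformation-state store is reachable
```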
The engine is responsible for the following (a minimal interface sketch follows this list):
- Indexing source documents
- Retrieving documents similar to a query
- Evaluating its search accuracy
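A minimal sketch of that contract in Python; the names are hypothetical and may not match the framework's actual classes:

```python
from typing import Any, Protocol

class Engine(Protocol):
    """Hypothetical interface mirroring the three responsibilities above."""

    def ingest(self) -> None:
        """Fetch source documents, transform them, and index their embeddings."""

    def retrieve(self, query: str, top_k: int = 5) -> list[Any]:
        """Return the documents most similar to the query."""

    def evaluate(self) -> dict[str, float]:
        """Run the evaluation dataset and return accuracy metrics."""
```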
To create an engine, you describe how to fetch data from the source, how to transform documents, and how to search them.
An engine config contains the following (a hypothetical sketch follows the list):
- List of embedding models to use
- Chunking implementations
- Derivation implementations
- Transformation combinations (chunking + derivation + embedding model)
- Search strategies (transformations/parallel/sequential/reranking)
- Source repository implementation
- Ingester configuration
- Evaluation dataset
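Put together, a config could look roughly like this hypothetical sketch; every field and class name here is an assumption rather than the framework's real schema:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class EngineConfig:
    """Hypothetical shape of an engine configuration."""
    embedding_models: list[str]                  # e.g. ["all-MiniLM-L6-v2"]
    chunkers: list[Any]                          # chunking implementations
    derivers: list[Any]                          # derivation implementations
    transformations: list[tuple[Any, Any, str]]  # (chunker, deriver, embedding model)
    search_strategies: list[str]                 # "parallel", "sequential", "reranking", ...
    source_repository: Any                       # where source documents come from
    ingester: dict[str, Any] = field(default_factory=dict)
    evaluation_dataset: str | None = None        # path to the evaluation dataset
```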
Several pre-made engines are provided for specific data types:
- markdown: Indexes all markdown files in a directory
The default engine is markdown.
To change it, set DEFAULT_ENGINE in ./src/engines/engines.py.
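A minimal sketch, assuming engines are registered by name in that module:

```python
# ./src/engines/engines.py
DEFAULT_ENGINE = "markdown"  # switch to another registered engine name here
```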
- Prepare storage
make prepare
- Ingest data (all)
make ingest
- Evaluate
make evaluate
- Migrate vector DB
make migrate-vector
Useful when new vectors are added to the vector DB
- MCP OpenAPI server
make mcp
This will start an MCP server at http://localhost:8000
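Assuming the server is FastAPI-based and exposes the conventional OpenAPI schema route (not confirmed by this project's docs), you can list its endpoints like so:

```python
import httpx

# Assumption: a FastAPI-style app serving /openapi.json; check http://localhost:8000/docs for the real routes.
schema = httpx.get("http://localhost:8000/openapi.json").json()
print(list(schema.get("paths", {})))
```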
WebUI for interacting with data
make streamlit
Access studio at http://127.0.0.1:6334
Download markdown documentation from a git repository
./src/engines/markdown/download_git_folder.sh https://github.com/wevm/wagmi site/react ./data/markdown
Ingest the documentation
make ingest