This project builds a RAG pipeline using:
- BGE-M3 / BM42 for embeddings
- Qdrant as the vector store
- Ollama for local LLM generation
- LangChain for orchestration
- RAGAS for evaluation
Download the dataset from Kaggle and place the two .csv files into `data/`.
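As a quick sanity check, a short sketch like the following can confirm the files landed in `data/` and expose the 'content' column used later for chunking (the exact CSV filenames are not assumed here):

```python
# Sketch: verify the Kaggle CSVs are in data/ and have a 'content' column.
from pathlib import Path

import pandas as pd

csv_files = sorted(Path("data").glob("*.csv"))
print(csv_files)  # expect the two Kaggle .csv files here

df = pd.read_csv(csv_files[0])
print(df.columns)            # the pipeline chunks the 'content' column
print(df["content"].head())  # preview of the text that will be chunked
```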
Install Ollama, then pull the required models:
ollama pull gemma3:4b
ollama pull gemma3:1b
ollama pull bge-m3
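To confirm the models were pulled correctly, a quick smoke test through LangChain's Ollama integration might look like this (a sketch; assumes the `langchain-ollama` package is installed and the Ollama daemon is running):

```python
# Sketch: smoke-test the pulled Ollama models via LangChain.
from langchain_ollama import ChatOllama, OllamaEmbeddings

llm = ChatOllama(model="gemma3:4b")          # generation model
print(llm.invoke("Say hello in one word.").content)

embedder = OllamaEmbeddings(model="bge-m3")  # dense embedding model
vector = embedder.embed_query("test sentence")
print(len(vector))  # BGE-M3 produces 1024-dimensional dense vectors
```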
Follow the Qdrant Quickstart to run Qdrant locally: start the Qdrant server in Docker before running any Python code that connects to it.
mkdir -p data/qdrant_storage
docker run -d --name qdrant \
  -p 6333:6333 \
  -p 6334:6334 \
  -v "$(pwd)/data/qdrant_storage:/qdrant/storage" \
  qdrant/qdrant
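Once the container is up (and the Python environment from the next step is active), a minimal connectivity check with `qdrant-client` could look like this (a sketch; assumes the default local ports):

```python
# Sketch: confirm the local Qdrant server is reachable.
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
print(client.get_collections())  # empty collection list on a fresh instance
```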
Create and activate the conda environment:
conda env create -f environment.yml
conda activate npr
All chunks, embeddings, the Qdrant storage, retrievals, augmented generations, and evaluation files are available at:
Run the notebooks (01, 02, 03, 04, ...) manually, one after the other.
The RAG pipeline consists of two main phases: indexing and retrieval/generation.
Indexing Phase:
- We preprocess a CSV file and split the 'content' column into text chunks using various strategies: sentence-level, paragraph-level, overlapping, semantic (with similarity threshold), and article-wise.
- Chunks are embedded using dense (BGE-M3) and/or sparse (BM42) models.
- The resulting embeddings are stored in a Qdrant vector database, supporting dense, sparse, or hybrid search (a minimal indexing sketch follows this list).
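As a concrete illustration of the indexing phase, here is a dense-only sketch using overlapping chunks; the CSV filename, collection name, and chunk sizes are illustrative assumptions, and the sparse/hybrid (BM42) path and the other chunking strategies are omitted for brevity:

```python
# Sketch: chunk the 'content' column, embed with BGE-M3, and upsert into Qdrant.
import pandas as pd
from langchain_ollama import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

COLLECTION = "npr_chunks"  # illustrative collection name

df = pd.read_csv("data/articles.csv")  # illustrative file name
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = [c for text in df["content"].dropna() for c in splitter.split_text(text)]

embedder = OllamaEmbeddings(model="bge-m3")
vectors = embedder.embed_documents(chunks)

client = QdrantClient(url="http://localhost:6333")
if client.collection_exists(COLLECTION):
    client.delete_collection(collection_name=COLLECTION)
client.create_collection(
    collection_name=COLLECTION,
    vectors_config=models.VectorParams(size=1024, distance=models.Distance.COSINE),
)
client.upsert(
    collection_name=COLLECTION,
    points=[
        models.PointStruct(id=i, vector=vec, payload={"text": chunk})
        for i, (chunk, vec) in enumerate(zip(chunks, vectors))
    ],
)
```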
Retrieval & Generation Phase:
- A user submits a query.
- It is embedded using the corresponding model (the same embedding model is used for indexing and retrieval).
- A similarity search retrieves the top-k matching chunks from Qdrant.
- Optional reranking improves result relevance.
- The top results are sent to an LLM, which generates the final answer.
The system allows experimentation with chunking methods, embedding types, and hybrid retrieval approaches; a matching retrieval-and-generation sketch follows.
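The sketch below mirrors the indexing sketch above (dense-only, no reranking); the collection name and prompt wording are assumptions, and it relies on `query_points` from a recent `qdrant-client` release:

```python
# Sketch: embed a query, retrieve top-k chunks from Qdrant, and generate an answer.
from langchain_ollama import ChatOllama, OllamaEmbeddings
from qdrant_client import QdrantClient

COLLECTION = "npr_chunks"  # must match the collection used at indexing time

client = QdrantClient(url="http://localhost:6333")
embedder = OllamaEmbeddings(model="bge-m3")   # same embedding model as at indexing time
llm = ChatOllama(model="gemma3:4b")

query = "What did the article say about ...?"  # placeholder question
query_vector = embedder.embed_query(query)

# Top-k similarity search over the stored dense vectors.
hits = client.query_points(collection_name=COLLECTION, query=query_vector, limit=5).points
context = "\n\n".join(hit.payload["text"] for hit in hits)

# Send the retrieved context plus the question to the LLM.
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)
print(llm.invoke(prompt).content)
```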