
MC1: Retrieval-Augmented Generation (RAG)

Overview

This project builds a RAG pipeline using:

  • BGE-M3 / BM42 for embeddings
  • Qdrant as the vector store
  • Ollama for local LLM generation
  • LangChain for orchestration
  • RAGAS for evaluation

Setup Instructions

1. Get the Dataset

Download from Kaggle and place the two .csv files into data/.
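
As a quick sanity check, here is a minimal sketch that loads whatever CSVs end up in data/ and confirms the 'content' column used later for chunking (everything beyond that column name is an assumption, not the notebooks' actual code):

import pandas as pd
from pathlib import Path

# Load both Kaggle CSVs from data/ without hard-coding their names.
frames = [pd.read_csv(f) for f in sorted(Path("data").glob("*.csv"))]
df = pd.concat(frames, ignore_index=True)
print(df.columns.tolist())  # should include the 'content' column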

2. Set Up Ollama

Install Ollama from the official download page, then pull the required models:

ollama pull gemma3:4b
ollama pull gemma3:1b
ollama pull bge-m3
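
To verify the pulled models respond locally, a small smoke test with the ollama Python client (pip install ollama) can help; the model tags match the pulls above, everything else is illustrative:

import ollama

# Chat smoke test against the small generation model.
resp = ollama.chat(model="gemma3:1b",
                   messages=[{"role": "user", "content": "Say hello."}])
print(resp["message"]["content"])

# Embedding smoke test; BGE-M3 dense vectors are 1024-dimensional.
emb = ollama.embeddings(model="bge-m3", prompt="hello world")
print(len(emb["embedding"]))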

3. Run Qdrant in Docker

Follow the Quickstart to run Qdrant locally.

Start the Qdrant server in Docker before running any Python code that connects to it.

mkdir -p data/qdrant_storage
docker run -d --name qdrant \
    -p 6333:6333 \
    -p 6334:6334 \
    -v "$(pwd)/data/qdrant_storage:/qdrant/storage" \
    qdrant/qdrant
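
Once the container is up, a one-liner with qdrant-client confirms the server is reachable (illustrative, not part of the notebooks):

from qdrant_client import QdrantClient

# Should print an (initially empty) list of collections.
client = QdrantClient(url="http://localhost:6333")
print(client.get_collections())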

4. Install Dependencies

conda env create -f environment.yml
conda activate npr

5. (Optional) Download preprocessed files

All chunks, embeddings, the Qdrant storage, retrievals, augmented generations, and evaluation files are available at:

fhnw365.sharepoint

6. Run the Project

Run the notebooks (01, 02, 03, 04, ...) manually, one after the other, in numerical order.

The Pipeline

The RAG pipeline consists of two main phases: indexing and retrieval/generation.

Indexing Phase:

  • We preprocess a CSV file and split the 'content' column into text chunks using various strategies: sentence-level, paragraph-level, overlapping, semantic (with a similarity threshold), and article-wise.
  • Chunks are embedded using dense (BGE-M3) and/or sparse (BM42) models.
  • The resulting embeddings are stored in a Qdrant vector database, supporting dense, sparse, or hybrid search (see the sketch below).
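
The following sketch shows a dense-only variant of this indexing flow, assuming the df from step 1 and a hypothetical collection name and chunk size (the notebooks also cover sparse BM42 and hybrid setups):

import ollama
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(url="http://localhost:6333")
COLLECTION, CHUNK_SIZE, OVERLAP = "articles_demo", 500, 100  # hypothetical settings

def overlapping_chunks(text, size=CHUNK_SIZE, overlap=OVERLAP):
    # Fixed-size windows that share `overlap` characters with their neighbor.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# BGE-M3 dense embeddings are 1024-dimensional; cosine is the usual distance.
client.recreate_collection(
    collection_name=COLLECTION,
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
)

chunks = overlapping_chunks(df["content"].iloc[0])  # first article only, for brevity
points = [
    PointStruct(
        id=i,
        vector=ollama.embeddings(model="bge-m3", prompt=c)["embedding"],
        payload={"text": c},
    )
    for i, c in enumerate(chunks)
]
client.upsert(collection_name=COLLECTION, points=points)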

Retrieval & Generation Phase:

  1. A user submits a query.
  2. It’s embedded using the corresponding model (the same embedding model is used for indexing and retrieval).
  3. A similarity search retrieves the top-k matching chunks from Qdrant.
  4. Optional reranking improves result relevance.
  5. The top results are sent to an LLM, which generates the final answer (see the sketch below).
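
A minimal end-to-end sketch of these five steps, reusing the hypothetical collection from the indexing sketch (reranking is omitted here):

import ollama
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")
COLLECTION, TOP_K = "articles_demo", 5  # hypothetical, as above

query = "What is the article about?"
query_vec = ollama.embeddings(model="bge-m3", prompt=query)["embedding"]

# Top-k similarity search over the indexed chunks.
hits = client.search(collection_name=COLLECTION, query_vector=query_vec, limit=TOP_K)
context = "\n\n".join(h.payload["text"] for h in hits)

# Stuff the retrieved context into the prompt and generate the answer.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
answer = ollama.chat(model="gemma3:4b",
                     messages=[{"role": "user", "content": prompt}])
print(answer["message"]["content"])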

The system allows experimentation with chunking methods, embedding types, and hybrid retrieval approaches.
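
One way to organize such experiments is a small parameter grid; the names below mirror the options described above and are not the notebooks' actual variables:

# Hypothetical experiment grid.
EXPERIMENTS = {
    "chunking":  ["sentence", "paragraph", "overlapping", "semantic", "article"],
    "embedding": ["dense-bge-m3", "sparse-bm42", "hybrid"],
    "top_k":     [3, 5, 10],
    "rerank":    [False, True],
}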

(Flowchart 1 and Flowchart 2 illustrate the pipeline.)
