A production-style Retrieval-Augmented Generation (RAG) system that lets users upload private documents and ask natural-language questions over them, with all inference served by a local LLM via Ollama.
This project demonstrates end-to-end GenAI system design, including retrieval, grounding, evaluation, and deployment readiness.
- 📄 Multi-document PDF ingestion
- 🔍 Hybrid Retrieval (BM25 + FAISS)
- 🧠 Conversation-aware RAG (multi-turn memory)
- 🤖 Local LLM inference using Ollama
- 📊 RAG evaluation using RAGAS
- 🌐 FastAPI backend + Streamlit UI
- 📦 Fully Dockerized (reproducible setup)
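The hybrid retriever merges the BM25 and FAISS result lists into a single ranking. One common way to do this is reciprocal rank fusion (RRF); the sketch below is illustrative and not necessarily the fusion method this project uses:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several best-first ranked lists of doc IDs into one fused ranking.

    rankings: list of ranked lists (e.g. one from BM25, one from dense search).
    k: smoothing constant; 60 is the customary default from the RRF paper.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# The two retrievers disagree on ordering; fusion rewards docs both like.
bm25_hits = ["doc3", "doc1", "doc7"]
dense_hits = ["doc1", "doc7", "doc2"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

RRF needs no score normalization, which is why it is popular for combining lexical and dense retrievers whose raw scores live on different scales.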
| Category | Tools |
|---|---|
| Language | Python 3.11 |
| Backend | FastAPI, Uvicorn |
| Frontend | Streamlit |
| LLM | Ollama (Llama3 / Mistral) |
| Embeddings | Sentence-Transformers |
| Vector DB | FAISS |
| Retrieval | BM25 + Dense Retrieval |
| Evaluation | RAGAS |
| Deployment | Docker, Docker Compose |
```mermaid
flowchart LR
    User --> UI["Streamlit UI"]
    UI --> API["FastAPI Backend"]
    API --> Retriever["Hybrid Retriever"]
    Retriever --> BM25["BM25 Search"]
    BM25 --> Docs["PDF Documents"]
    Retriever --> FAISS["FAISS Vector Store"]
    API --> LLM["Local LLM - Ollama"]
    LLM --> API
    API --> UI
```
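The request flow above (retrieve context, build a grounded prompt, call the LLM, return the answer) can be sketched in a few lines. The function names and prompt template here are illustrative placeholders, not the project's actual API:

```python
def build_prompt(question, contexts):
    """Assemble a grounded prompt: numbered retrieved chunks, then the question."""
    context_block = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:"
    )

def answer(question, retriever, llm):
    """Mirror of the flowchart: API -> retriever -> LLM -> API."""
    contexts = retriever(question)  # hybrid BM25 + FAISS in the real backend
    return llm(build_prompt(question, contexts))

# Stub retriever and LLM so the flow is runnable without Ollama.
fake_retriever = lambda q: ["RAG grounds answers in retrieved documents."]
fake_llm = lambda prompt: "Grounded answer based on context [1]."
reply = answer("What does RAG do?", fake_retriever, fake_llm)
```

Keeping retrieval and generation behind small function boundaries like this is what makes the pipeline easy to evaluate piece by piece.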
```text
rag-knowledge-assistant/
│
├── app/
│   ├── api.py               # FastAPI backend
│   ├── ui.py                # Streamlit UI
│   ├── document_loader.py   # PDF loading & chunking
│   └── evaluate_rag.py      # RAGAS evaluation
│
├── data/
│   └── documents/
│
├── vector_store/
│
├── Dockerfile.backend
├── Dockerfile.ui
├── docker-compose.yml
├── requirements.txt
└── README.md
```
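`document_loader.py` handles chunking before indexing. A minimal overlapping-window chunker looks like the sketch below; the window and overlap sizes are assumed defaults, not the project's actual parameters:

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into fixed-size character windows with overlap, so a
    sentence cut at one boundary still appears whole in a neighboring chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# 1200 characters -> windows at offsets 0, 400, 800: three chunks.
chunks = chunk_text("a" * 1200, chunk_size=500, overlap=100)
```

The overlap trades a little index size for retrieval robustness: facts that straddle a chunk boundary remain retrievable from the adjacent window.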
- Docker & Docker Compose
- Ollama installed and running
Start the local model, then build and launch the stack:

```bash
ollama run llama3
docker compose up --build
```

- UI → http://localhost:8501
- API Docs → http://localhost:8000/docs
Metrics used:
- Faithfulness
- Answer Relevancy
- Context Precision
- Context Recall
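RAGAS scores these metrics with LLM-based judgments. To build intuition for what faithfulness-style metrics measure, here is a deliberately crude lexical proxy (purely illustrative; this is not how RAGAS computes its scores):

```python
def token_overlap(answer, context):
    """Fraction of answer tokens that also appear in the retrieved context.
    A toy lexical proxy for groundedness; RAGAS uses LLM judges instead."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

# Every answer token occurs in the context -> fully "grounded" by this proxy.
score = token_overlap("Paris is the capital", "the capital of France is Paris")
```

The LLM-judged versions catch paraphrase and entailment that token overlap misses, which is why they are worth the extra inference cost.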
Run:

```bash
python app/evaluate_rag.py
```

- Hybrid retrieval improves recall and precision
- Backend-managed conversation memory
- Fully local inference (privacy + cost control)
- Reproducible environment via Docker
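Backend-managed conversation memory can be as simple as a per-session ring buffer of recent turns that gets prepended to the next prompt. The sketch below is illustrative; the project's actual storage may differ:

```python
from collections import defaultdict, deque

class ConversationMemory:
    """Keep the last N (question, answer) pairs per session so follow-up
    questions can be answered with prior context. Illustrative sketch."""

    def __init__(self, max_turns=5):
        # deque(maxlen=...) evicts the oldest turn automatically.
        self.history = defaultdict(lambda: deque(maxlen=max_turns))

    def add(self, session_id, question, answer):
        self.history[session_id].append((question, answer))

    def as_context(self, session_id):
        """Render the history as text for inclusion in the next prompt."""
        return "\n".join(f"Q: {q}\nA: {a}" for q, a in self.history[session_id])

mem = ConversationMemory(max_turns=2)
mem.add("s1", "What is RAG?", "Retrieval-Augmented Generation.")
mem.add("s1", "Why local LLMs?", "Privacy and cost control.")
mem.add("s1", "Which vector DB?", "FAISS.")  # oldest turn is evicted
```

Capping the window keeps prompts short and avoids drifting past the local model's context limit on long conversations.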
Tonumay Bhattacharya
Data Scientist | GenAI | NLP | LLM Systems
This project focuses on system design, retrieval quality, and evaluation rather than prompt-only demos.