🧠 RAG Knowledge Assistant (LLM + NLP)


A production-style Retrieval-Augmented Generation (RAG) system that allows users to upload private documents and ask natural language questions over them using a local LLM (Ollama).

This project demonstrates end-to-end GenAI system design, including retrieval, grounding, evaluation, and deployment readiness.


🚀 Features

  • 📄 Multi-document PDF ingestion
  • 🔍 Hybrid Retrieval (BM25 + FAISS)
  • 🧠 Conversation-aware RAG (multi-turn memory)
  • 🤖 Local LLM inference using Ollama
  • 📊 RAG evaluation using RAGAS
  • 🌐 FastAPI backend + Streamlit UI
  • 📦 Fully Dockerized (reproducible setup)

🧰 Tools & Technologies

| Category   | Tools                     |
|------------|---------------------------|
| Language   | Python 3.11               |
| Backend    | FastAPI, Uvicorn          |
| Frontend   | Streamlit                 |
| LLM        | Ollama (Llama3 / Mistral) |
| Embeddings | Sentence-Transformers     |
| Vector DB  | FAISS                     |
| Retrieval  | BM25 + Dense Retrieval    |
| Evaluation | RAGAS                     |
| Deployment | Docker, Docker Compose    |

🧱 System Architecture

```mermaid
flowchart LR
    User --> UI["Streamlit UI"]
    UI --> API["FastAPI Backend"]
    API --> Retriever["Hybrid Retriever"]
    Retriever --> BM25["BM25 Search"]
    BM25 --> Docs["PDF Documents"]
    Retriever --> FAISS["FAISS Vector Store"]
    API --> LLM["Local LLM - Ollama"]
    LLM --> API
    API --> UI
```
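The README does not spell out how the BM25 and FAISS result lists are merged; one common choice for this kind of hybrid retriever is reciprocal rank fusion (RRF). A minimal sketch, assuming each retriever returns a ranked list of document IDs (the function name and `k` constant are illustrative, not taken from this repo):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking.

    Each document scores 1 / (k + rank) per list it appears in; documents
    ranked highly by both BM25 and the dense retriever rise to the top.
    k=60 is the value commonly used in the RRF literature.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# A document favored by both retrievers beats one favored by only one:
bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d2", "d3", "d1"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

RRF needs no score normalization, which is convenient because BM25 scores and cosine similarities live on incomparable scales.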

📂 Project Structure

```
rag-knowledge-assistant/
│
├── app/
│   ├── api.py              # FastAPI backend
│   ├── ui.py               # Streamlit UI
│   ├── document_loader.py  # PDF loading & chunking
│   └── evaluate_rag.py     # RAGAS evaluation
│
├── data/
│   └── documents/
│
├── vector_store/
│
├── Dockerfile.backend
├── Dockerfile.ui
├── docker-compose.yml
├── requirements.txt
└── README.md
```

▶️ Run with Docker (Recommended)

Prerequisites

  • Docker & Docker Compose
  • Ollama installed and running:

```
ollama run llama3
```
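The backend talks to this local Ollama server over HTTP. How this project composes its prompts isn't shown here, but a minimal non-streaming sketch against Ollama's `/api/generate` endpoint might look like this (the grounding template and `build_payload` helper are illustrative assumptions; the URL and JSON fields are Ollama's documented defaults):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default port

def build_payload(model, question, context_chunks):
    """Assemble a grounded prompt: retrieved chunks first, then the question."""
    prompt = "Answer using only the context below.\n\n"
    prompt += "\n\n".join(context_chunks)
    prompt += f"\n\nQuestion: {question}"
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model, question, context_chunks):
    """POST the payload to the local Ollama server and return its answer."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, question, context_chunks)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because inference stays on `localhost`, documents never leave the machine, which is the privacy argument made in the design highlights below.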

Start the system

```
docker compose up --build
```

Access

Once the containers are up, the Streamlit UI and the FastAPI backend are served on the ports defined in `docker-compose.yml` (typically Streamlit on http://localhost:8501 and FastAPI on http://localhost:8000).

🧪 RAG Evaluation (RAGAS)

Metrics used:

  • Faithfulness
  • Answer Relevancy
  • Context Precision
  • Context Recall

Run:

```
python app/evaluate_rag.py
```
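RAGAS computes these metrics with an LLM judge, so the real pipeline needs a model behind it. As a rough intuition only (this is *not* RAGAS code), context precision asks "how much of what I retrieved was relevant?" and context recall asks "how much of what was relevant did I retrieve?". A toy, label-based illustration with hypothetical relevance judgments:

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved chunks that are relevant (simplified intuition)."""
    if not retrieved:
        return 0.0
    return sum(1 for c in retrieved if c in relevant) / len(retrieved)

def context_recall(retrieved, relevant):
    """Fraction of relevant chunks that were retrieved (simplified intuition)."""
    if not relevant:
        return 0.0
    return sum(1 for c in relevant if c in retrieved) / len(relevant)


# 2 of 3 retrieved chunks are relevant; 2 of 3 relevant chunks were found.
retrieved = ["a", "b", "c"]
relevant = {"a", "c", "d"}
```

Faithfulness and answer relevancy have no such closed-form analogue, which is exactly why RAGAS judges them with an LLM rather than string overlap.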

🧠 Design Highlights

  • Hybrid retrieval improves recall and precision
  • Backend-managed conversation memory
  • Fully local inference (privacy + cost control)
  • Reproducible environment via Docker

👤 Author

Tonumay Bhattacharya
Data Scientist | GenAI | NLP | LLM Systems


This project focuses on system design, retrieval quality, and evaluation, not prompt-only demos.
