A production-style Retrieval-Augmented Generation (RAG) system that lets users upload private documents and ask natural-language questions over them, with all inference served by a local LLM via Ollama.
This project demonstrates end-to-end GenAI system design, including retrieval, grounding, evaluation, and deployment readiness.
- 📄 Multi-document PDF ingestion
- 🔍 Hybrid Retrieval (BM25 + FAISS)
- 🧠 Conversation-aware RAG (multi-turn memory)
- 🤖 Local LLM inference using Ollama
- 📊 RAG evaluation using RAGAS
- 🌐 FastAPI backend + Streamlit UI
- 📦 Fully Dockerized (reproducible setup)
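The hybrid retriever merges the BM25 and FAISS result lists into a single ranking. One common way to do this is reciprocal rank fusion (RRF); the sketch below is illustrative and not necessarily the fusion method this project uses:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several best-first ranked lists of doc IDs into one fused ranking.

    rankings: list of ranked lists (e.g. one from BM25, one from dense search).
    k: smoothing constant; 60 is the customary default from the RRF paper.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# The two retrievers disagree on ordering; fusion rewards docs both like.
bm25_hits = ["doc3", "doc1", "doc7"]
dense_hits = ["doc1", "doc7", "doc2"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

RRF needs no score normalization, which is why it is popular for combining lexical and dense retrievers whose raw scores live on different scales.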
| Category | Tools |
|---|---|
| Language | Python 3.11 |
| Backend | FastAPI, Uvicorn |
| Frontend | Streamlit |
| LLM | Ollama (Llama3 / Mistral) |
| Embeddings | Sentence-Transformers |
| Vector DB | FAISS |
| Retrieval | BM25 + Dense Retrieval |
| Evaluation | RAGAS |
| Deployment | Docker, Docker Compose |
```mermaid
flowchart LR
    User --> UI["Streamlit UI"]
    UI --> API["FastAPI Backend"]
    API --> Retriever["Hybrid Retriever"]
    Retriever --> BM25["BM25 Search"]
    BM25 --> Docs["PDF Documents"]
    Retriever --> FAISS["FAISS Vector Store"]
    API --> LLM["Local LLM - Ollama"]
    LLM --> API
    API --> UI
```
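The request flow above (retrieve context, build a grounded prompt, call the LLM, return the answer) can be sketched in a few lines. The function names and prompt template here are illustrative placeholders, not the project's actual API:

```python
def build_prompt(question, contexts):
    """Assemble a grounded prompt: numbered retrieved chunks, then the question."""
    context_block = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {question}\nAnswer:"
    )

def answer(question, retriever, llm):
    """Mirror of the flowchart: API -> retriever -> LLM -> API."""
    contexts = retriever(question)  # hybrid BM25 + FAISS in the real backend
    return llm(build_prompt(question, contexts))

# Stub retriever and LLM so the flow is runnable without Ollama.
fake_retriever = lambda q: ["RAG grounds answers in retrieved documents."]
fake_llm = lambda prompt: "Grounded answer based on context [1]."
reply = answer("What does RAG do?", fake_retriever, fake_llm)
```

Keeping retrieval and generation behind small function boundaries like this is what makes the pipeline easy to evaluate piece by piece.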
```text
rag-knowledge-assistant/
│
├── app/
│   ├── api.py               # FastAPI backend
│   ├── ui.py                # Streamlit UI
│   ├── document_loader.py   # PDF loading & chunking
│   └── evaluate_rag.py      # RAGAS evaluation
│
├── data/
│   └── documents/
│
├── vector_store/
│
├── Dockerfile.backend
├── Dockerfile.ui
├── docker-compose.yml
├── requirements.txt
└── README.md
```
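`document_loader.py` handles chunking before indexing. A minimal overlapping-window chunker looks like the sketch below; the window and overlap sizes are assumed defaults, not the project's actual parameters:

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into fixed-size character windows with overlap, so a
    sentence cut at one boundary still appears whole in a neighboring chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# 1200 characters -> windows at offsets 0, 400, 800: three chunks.
chunks = chunk_text("a" * 1200, chunk_size=500, overlap=100)
```

The overlap trades a little index size for retrieval robustness: facts that straddle a chunk boundary remain retrievable from the adjacent window.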
- Docker & Docker Compose
- Ollama installed and running
Start the local model, then build and launch the stack:

```bash
ollama run llama3
docker compose up --build
```

- UI → http://localhost:8501
- API Docs → http://localhost:8000/docs
Metrics used:
- Faithfulness
- Answer Relevancy
- Context Precision
- Context Recall
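RAGAS scores these metrics with LLM-based judgments. To build intuition for what faithfulness-style metrics measure, here is a deliberately crude lexical proxy (purely illustrative; this is not how RAGAS computes its scores):

```python
def token_overlap(answer, context):
    """Fraction of answer tokens that also appear in the retrieved context.
    A toy lexical proxy for groundedness; RAGAS uses LLM judges instead."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

# Every answer token occurs in the context -> fully "grounded" by this proxy.
score = token_overlap("Paris is the capital", "the capital of France is Paris")
```

The LLM-judged versions catch paraphrase and entailment that token overlap misses, which is why they are worth the extra inference cost.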
Run:

```bash
python app/evaluate_rag.py
```

- Hybrid retrieval improves recall and precision
- Backend-managed conversation memory
- Fully local inference (privacy + cost control)
- Reproducible environment via Docker
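Backend-managed conversation memory can be as simple as a per-session ring buffer of recent turns that gets prepended to the next prompt. The sketch below is illustrative; the project's actual storage may differ:

```python
from collections import defaultdict, deque

class ConversationMemory:
    """Keep the last N (question, answer) pairs per session so follow-up
    questions can be answered with prior context. Illustrative sketch."""

    def __init__(self, max_turns=5):
        # deque(maxlen=...) evicts the oldest turn automatically.
        self.history = defaultdict(lambda: deque(maxlen=max_turns))

    def add(self, session_id, question, answer):
        self.history[session_id].append((question, answer))

    def as_context(self, session_id):
        """Render the history as text for inclusion in the next prompt."""
        return "\n".join(f"Q: {q}\nA: {a}" for q, a in self.history[session_id])

mem = ConversationMemory(max_turns=2)
mem.add("s1", "What is RAG?", "Retrieval-Augmented Generation.")
mem.add("s1", "Why local LLMs?", "Privacy and cost control.")
mem.add("s1", "Which vector DB?", "FAISS.")  # oldest turn is evicted
```

Capping the window keeps prompts short and avoids drifting past the local model's context limit on long conversations.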
Tonumay Bhattacharya
Data Scientist | GenAI | NLP | LLM Systems
This project focuses on system design, retrieval quality, and evaluation rather than prompt-only demos.