A hands-on project to build a simple RAG system using LangChain, ChromaDB, and Google Gemini embeddings. Designed for learning and demo purposes: no paid OpenAI API key needed!
This repo shows how to:
- Load and split PDF documents into chunks
- Generate text embeddings using Google Gemini (or OpenAI if available)
- Store and query embeddings with ChromaDB (vector DB)
- Build a lightweight Retrieval-Augmented Generation pipeline for search and question answering (see the ingestion sketch after the feature list below)
- Use LangChain as an orchestrator for embeddings and retrieval pipelines
- PDF ingestion with metadata tracking
- Text splitting with overlap for context preservation
- Embedding generation via the Gemini API
- Persistent vector store with ChromaDB
- Query interface with top-k retrieval
- GitHub Actions for CI testing (mocked embedding generation)
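To make the pipeline concrete, here is a minimal ingestion sketch. It assumes the `langchain-community`, `langchain-text-splitters`, `langchain-google-genai`, and `langchain-chroma` packages, the `models/embedding-001` Gemini model, a `chroma_db` persist directory, and illustrative chunk sizes; the actual `src/ingest.py` may differ:

```python
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_chroma import Chroma

# Load every PDF in data/pdfs; each page becomes a Document with source/page metadata.
docs = []
for pdf_path in Path("data/pdfs").glob("*.pdf"):
    docs.extend(PyPDFLoader(str(pdf_path)).load())

# Split into overlapping chunks so context is preserved across chunk boundaries.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# Embed chunks with Gemini and persist them in a local ChromaDB collection.
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="chroma_db",
)
print(f"Indexed {len(chunks)} chunks")
```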
- Python 3.10+
- Google Gemini API key (set `GOOGLE_API_KEY` in your `.env`)
- (Optional) OpenAI API key if you want to switch embedding providers
```bash
git clone https://github.com/yourusername/mini-rag.git
cd mini-rag
python -m venv venv
source venv/bin/activate   # macOS/Linux
# venv\Scripts\activate    # Windows
pip install -r requirements.txt
```

- Place your PDF files in the `data/pdfs` folder.
- Add your API key to the `.env` file:

```
GOOGLE_API_KEY=your_google_gemini_api_key_here
```
- Run ingestion to build the vector store:

```bash
python src/ingest.py
```

- Query your RAG system (add your own query interface or notebook; a minimal query sketch follows below).
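A minimal query sketch against the persisted store, reusing the same assumptions as the ingestion example (Chroma collection in `chroma_db`, `models/embedding-001` embeddings):

```python
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_chroma import Chroma

# Reopen the persisted ChromaDB collection with the same embedding model used at ingestion.
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vectordb = Chroma(persist_directory="chroma_db", embedding_function=embeddings)

# Top-k retrieval: embed the query and return the k most similar chunks.
results = vectordb.similarity_search("What does the document say about X?", k=5)
for doc in results:
    print(doc.metadata.get("source"), "-", doc.page_content[:120])
```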
Run tests locally with:

```bash
pytest
```

GitHub Actions automatically runs the tests on every push and pull request.
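For CI, real Gemini calls can be replaced with a fake embedding class so tests run without an API key. The sketch below is one way to do this; the repo's actual tests may mock differently, and the file name `tests/test_retrieval.py` is only illustrative:

```python
# tests/test_retrieval.py (a sketch; the repo's real tests may differ)
from langchain_chroma import Chroma
from langchain_core.embeddings import Embeddings


class StubEmbeddings(Embeddings):
    """Deterministic stand-in for Gemini embeddings so CI needs no API key."""

    def embed_documents(self, texts):
        return [self._vector(t) for t in texts]

    def embed_query(self, text):
        return self._vector(text)

    @staticmethod
    def _vector(text, size=768):
        # Spread character codes across a fixed-size vector for a crude but stable embedding.
        vec = [0.0] * size
        for i, ch in enumerate(text):
            vec[i % size] += ord(ch) / 1000.0
        return vec


def test_topk_retrieval(tmp_path):
    db = Chroma.from_texts(
        ["chroma stores vectors", "gemini embeds text", "langchain orchestrates"],
        embedding=StubEmbeddings(),
        persist_directory=str(tmp_path),
    )
    # Top-k retrieval should return exactly k results.
    results = db.similarity_search("vectors", k=2)
    assert len(results) == 2
```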
- Gemini embedding API usage is currently limited by quota, so be mindful of your request volume.
- Embeddings are 768-dimensional vectors by default (see the quick check below).
- This is a learning/demo project, not production ready.
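If you want to confirm the dimensionality yourself, a single embedding call is enough (it does count against your Gemini quota); `models/embedding-001` is assumed here:

```python
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# One real embedding call to inspect the vector size returned by Gemini.
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
print(len(embeddings.embed_query("dimensionality check")))  # 768 by default
```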