A hands-on project to build a simple RAG system using LangChain, ChromaDB, and Google Gemini embeddings. Designed for learning and demo purposes.

m1chele11/mini-rag

🧠 Mini Retrieval-Augmented Generation (Mini-RAG) Prototype

A hands-on project to build a simple RAG system using LangChain, ChromaDB, and Google Gemini embeddings. Designed for learning and demo purposes, no paid OpenAI API needed! 🚀


📋 Overview

This repo shows how to:

  • Load and split PDF documents into chunks 📄➡️📚
  • Generate text embeddings using Google Gemini (or OpenAI if available) 🧩
  • Store and query embeddings with ChromaDB (vector DB) 💾
  • Build a lightweight Retrieval-Augmented Generation pipeline for search and question answering 🔍🤖
  • Use LangChain as an orchestrator for embeddings and retrieval pipelines ⚙️

⚙️ Features

  • ✅ PDF ingestion with metadata tracking
  • ✅ Text splitting with overlap for context preservation
  • ✅ Embedding generation via Gemini API
  • ✅ Persistent vector store with ChromaDB
  • ✅ Query interface with top-k retrieval
  • ✅ GitHub Actions for CI testing (mocked embedding generation) 🧪
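
To illustrate "text splitting with overlap", here is a minimal character-based sketch using only the standard library. It is not the repo's actual splitter (which presumably comes from LangChain); the function name `split_with_overlap` and its parameters are assumptions made for this example.

```python
def split_with_overlap(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical toy input: each chunk repeats the last 2 characters of the previous one.
print(split_with_overlap("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

The repeated characters at chunk boundaries are what preserve context: a sentence cut at the end of one chunk still appears in full at the start of the next, provided it is shorter than the overlap.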

🚀 Getting Started

Prerequisites

  • Python 3.10+
  • Google Gemini API key (set GOOGLE_API_KEY in your .env)
  • (Optional) OpenAI API key if you want to switch embeddings provider
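
If you want to confirm the key is visible before running anything, a small check like the one below may help. This is a standard-library sketch, not code from this repo: the helper names `load_env_file` and `get_google_api_key` are hypothetical, and the project itself may load `.env` differently (for example with a dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_google_api_key(env_path: str = ".env") -> str:
    """Return GOOGLE_API_KEY from the process environment, else from the .env file."""
    key = os.environ.get("GOOGLE_API_KEY") or load_env_file(env_path).get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env or export it")
    return key
```

Calling `get_google_api_key()` from the project root raises a clear error when the key is missing, which is easier to diagnose than a failed API request later on.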

Installation

git clone https://github.com/m1chele11/mini-rag.git
cd mini-rag
python -m venv venv
source venv/bin/activate  # macOS/Linux
# venv\Scripts\activate   # Windows
pip install -r requirements.txt

Usage

  1. Place your PDF files in the data/pdfs folder.
  2. Add your API key to the .env file:
     GOOGLE_API_KEY=your_google_gemini_api_key_here
  3. Run ingestion to build the vector store:
     python src/ingest.py
  4. Query your RAG system (add your own query interface or notebook).
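
Since the query step is left open, here is a rough sketch of what top-k retrieval does under the hood. It is plain Python over an in-memory list, not the ChromaDB or LangChain API; in this project the vector store would do the ranking for you. The names `cosine_similarity`, `top_k`, and `store`, and the toy 3-dimensional vectors, are all assumptions for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 3):
    """Rank (chunk_text, embedding) pairs by similarity; return the k best as (score, text)."""
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors stand in for real embeddings.
store = [
    ("Chunk about vector databases", [1.0, 0.0, 0.0]),
    ("Chunk about PDF parsing", [0.0, 1.0, 0.0]),
    ("Chunk about embeddings", [0.9, 0.1, 0.0]),
]

for score, text in top_k([1.0, 0.0, 0.0], store, k=2):
    print(f"{score:.3f}  {text}")
```

In a real query interface, the query string would first be embedded with the same model used at ingestion time, and the retrieved chunks would then be passed to the language model as context.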

🧪 Testing

Run tests locally with:

pytest

GitHub Actions automatically runs the tests on pushes and pull requests.
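
Because CI uses mocked embedding generation, tests never need a real API key or quota. The sketch below shows one way such a mock could look. It is not the repo's actual test code: `fake_embed` and `EMBEDDING_DIM` are hypothetical names, and the hash-based vectors carry no semantic meaning.

```python
import hashlib

EMBEDDING_DIM = 768  # assumed to match the default embedding size


def fake_embed(text: str, dim: int = EMBEDDING_DIM) -> list[float]:
    """Deterministic stand-in for an embedding API call (no network, no quota)."""
    values: list[float] = []
    counter = 0
    while len(values) < dim:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        values.extend(byte / 255.0 for byte in digest)  # 32 floats per digest
        counter += 1
    return values[:dim]


def test_fake_embed_shape_and_determinism():
    first = fake_embed("hello world")
    second = fake_embed("hello world")
    assert len(first) == EMBEDDING_DIM
    assert first == second                 # same text, same vector
    assert first != fake_embed("goodbye")  # different text, different vector
```

In a pytest suite, a function like this could be swapped in for the real embedding call (for instance with `monkeypatch` or `unittest.mock.patch`), so the ingestion logic is exercised without touching the Gemini API.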


💡 Notes

  • Gemini embedding API usage is currently limited by quota, so be mindful of your request volume.
  • Embeddings are 768-dimensional vectors by default.
  • This is a learning/demo project and is not production-ready.
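
One common way to stay within an API quota is to retry failed calls with exponential backoff. The helper below is a generic sketch, not part of this repo: `call_with_backoff` is a hypothetical name, and it retries on any exception for simplicity, whereas real code should catch only the rate-limit error raised by the client library you use.

```python
import random
import time


def call_with_backoff(func, *args, max_retries=5, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func, retrying with exponential backoff plus a little jitter on failure."""
    for attempt in range(max_retries):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the original error
            # Delay grows as base_delay * 1, 2, 4, ... with up to 0.1 s of jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)


# Hypothetical flaky call that fails twice before succeeding.
attempts = {"count": 0}


def flaky_embed(text):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("quota exceeded (simulated)")
    return [0.0] * 768


# sleep is replaced with a no-op here so the demo finishes instantly.
vector = call_with_backoff(flaky_embed, "hello", sleep=lambda seconds: None)
print(len(vector), "values after", attempts["count"], "attempts")
```

The `sleep` parameter is injectable purely so the behaviour can be demonstrated and tested without waiting; in normal use you would leave it at its default.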
