A production-grade Retrieval-Augmented Generation (RAG) system that lets users query PDF documents and web content with high accuracy. The system implements semantic search, reranking, citation-based answers, and hallucination control to provide reliable, verifiable responses.
- Document Processing: Upload and index PDF documents and web content
- Semantic Search: Find relevant information using embeddings and vector similarity
- Reranking: Improve result relevance with cross-encoder models
- Citations: Get answers with source references for verification
- Hallucination Control: Responses grounded only in your documents
- Flexible LLM Support: Use OpenAI or open-source models (Llama, Mistral)
- Modern UI: Clean, responsive Next.js interface with Tailwind CSS
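Conceptually, the search, reranking, and grounding features above come down to a retrieve-rerank-generate loop. The sketch below is illustrative only, assuming Sentence Transformers models and a FAISS index built elsewhere; the model names and parameters are placeholders, not the project's actual configuration.

```python
# Illustrative retrieve-rerank-ground loop (model names and parameters are assumptions).
import faiss
import numpy as np
from sentence_transformers import CrossEncoder, SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")               # bi-encoder for retrieval (assumed)
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # cross-encoder for reranking (assumed)

def build_grounded_prompt(question: str, chunks: list[str], index: faiss.Index,
                          top_k: int = 10, top_n: int = 3) -> str:
    # 1. Semantic search: embed the question and pull the top-k nearest chunks from FAISS.
    query_vec = embedder.encode([question], normalize_embeddings=True)
    _, ids = index.search(np.asarray(query_vec, dtype="float32"), top_k)
    candidates = [chunks[i] for i in ids[0] if i != -1]

    # 2. Reranking: score (question, chunk) pairs with the cross-encoder, keep the best top_n.
    scores = reranker.predict([(question, c) for c in candidates])
    ranked = [c for _, c in sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)][:top_n]

    # 3. Hallucination control: tell the LLM to answer only from the numbered sources,
    #    which also gives it citation targets like [1], [2].
    sources = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(ranked))
    return f"Answer using only the sources below and cite them by number.\n\n{sources}\n\nQuestion: {question}"
```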
This is a monorepo containing:
- backend/: FastAPI server with RAG pipeline (Python)
- frontend/: Next.js web application (TypeScript + React)
- Navigate to backend directory:
```
cd backend
```

- Create and activate virtual environment:

```
python -m venv venv
# Windows
venv\Scripts\activate
# Linux/Mac
source venv/bin/activate
```

- Install dependencies:

```
pip install -r requirements.txt
```

- Configure environment:

```
copy .env.example .env
# Edit .env with your settings
```

- Run the server:

```
uvicorn src.main:app --reload
```

Backend will be available at http://localhost:8000
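Once uvicorn is running, a quick way to confirm the API responds is the health check endpoint (listed again under the API endpoints below). A minimal sketch, assuming the requests package is installed:

```python
# Sanity check against the running backend (assumes the `requests` package).
import requests

resp = requests.get("http://localhost:8000/health")
print(resp.status_code, resp.text)
```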
- Navigate to frontend directory:
```
cd frontend
```

- Install dependencies:

```
npm install
```

- Configure environment:

```
copy .env.local.example .env.local
# Edit .env.local if needed
```

- Run the development server:

```
npm run dev
```

Frontend will be available at http://localhost:3000
Backend:
- FastAPI: Modern Python web framework
- LangChain: LLM application framework
- FAISS: Vector similarity search
- Sentence Transformers: Text embeddings
- PyPDF2/pdfplumber: PDF processing
- BeautifulSoup4: Web scraping
- pytest + hypothesis: Testing
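As a rough illustration of how the ingestion side of this stack fits together (pdfplumber for extraction, Sentence Transformers for embeddings, FAISS for the index), here is a minimal sketch; the model name, chunk sizes, and file paths are placeholders rather than the project's actual settings.

```python
# Sketch: PDF -> text -> overlapping chunks -> embeddings -> FAISS index.
# Model name, chunk size/overlap, and paths below are assumptions.
import faiss
import numpy as np
import pdfplumber
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Slide a fixed-size window with overlap so context isn't cut mid-thought.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

with pdfplumber.open("example.pdf") as pdf:  # placeholder input file
    text = "\n".join(page.extract_text() or "" for page in pdf.pages)

chunks = chunk_text(text)
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(chunks, normalize_embeddings=True)

index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product == cosine on normalized vectors
index.add(np.asarray(embeddings, dtype="float32"))
faiss.write_index(index, "data/index.faiss")    # placeholder path under backend/data/
```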
Frontend:
- Next.js 14: React framework
- TypeScript: Type safety
- Tailwind CSS: Styling
- React Query: API state management
- Zustand: Global state
- React Dropzone: File uploads
```
.
├── backend/
│   ├── src/
│   │   ├── __init__.py
│   │   ├── main.py            # FastAPI application
│   │   └── config.py          # Configuration management
│   ├── tests/                 # Backend tests
│   ├── data/                  # FAISS index and metadata storage
│   ├── requirements.txt       # Python dependencies
│   ├── config.yaml            # Application configuration
│   └── .env.example           # Environment variables template
│
├── frontend/
│   ├── app/
│   │   ├── layout.tsx         # Root layout
│   │   ├── page.tsx           # Home page
│   │   └── globals.css        # Global styles
│   ├── components/            # React components
│   ├── package.json           # Node dependencies
│   ├── tsconfig.json          # TypeScript config
│   ├── tailwind.config.ts     # Tailwind config
│   └── .env.local.example     # Environment variables template
│
└── .kiro/
    └── specs/                 # Feature specifications
```
- Embedding model: Sentence transformer model for embeddings
- Chunking: Chunk size and overlap settings
- Retrieval: Top-k results and similarity threshold
- Reranking: Cross-encoder model and top-n selection
- LLM: Provider, temperature, and token limits
- Storage: Paths for FAISS index and metadata
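These settings live in backend/config.yaml. A hypothetical loader is sketched below; the key names are invented for illustration and may not match the file's actual schema.

```python
# Hypothetical config.yaml reader; key names are illustrative only.
import yaml

with open("backend/config.yaml") as f:   # path assumes you run from the repo root
    cfg = yaml.safe_load(f)

chunk_size = cfg["chunking"]["size"]                  # e.g. characters per chunk
chunk_overlap = cfg["chunking"]["overlap"]            # e.g. characters shared between chunks
top_k = cfg["retrieval"]["top_k"]                     # candidates pulled from FAISS
threshold = cfg["retrieval"]["similarity_threshold"]  # minimum similarity to keep a hit
rerank_top_n = cfg["reranking"]["top_n"]              # candidates kept after the cross-encoder
llm_temperature = cfg["llm"]["temperature"]
```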
Backend (.env):
- OPENAI_API_KEY: OpenAI API key (optional, can be set via UI)
- ENCRYPTION_KEY: Key for encrypting stored API keys
- CORS_ORIGINS: Allowed frontend origins
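These backend variables would typically be read at startup. A minimal sketch, assuming python-dotenv; only the variable names come from the list above, the loading code itself is illustrative.

```python
# Illustrative only: read the backend environment variables listed above.
import os
from dotenv import load_dotenv

load_dotenv()  # pulls values from backend/.env into the process environment

openai_api_key = os.getenv("OPENAI_API_KEY")         # optional; can also be set via the UI
encryption_key = os.getenv("ENCRYPTION_KEY")         # used to encrypt stored API keys
cors_origins = os.getenv("CORS_ORIGINS", "").split(",")
```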
Frontend (.env.local):
- NEXT_PUBLIC_API_URL: Backend API URL
- NEXT_PUBLIC_APP_NAME: Application name
Backend tests:

```
cd backend
pytest                     # Run all tests
pytest --cov=src tests/    # Run with coverage
pytest -v tests/           # Verbose output
```
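Since the backend stack pairs pytest with hypothesis, property-based tests are a natural fit. The sketch below tests a hypothetical overlapping-chunk helper; chunk_text is assumed for illustration and is not necessarily part of the codebase.

```python
# Hypothetical property-based test; assumes a chunk_text(text, size, overlap) helper.
from hypothesis import given, strategies as st

def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

@given(st.text(min_size=1, max_size=2000))
def test_chunks_cover_all_text(text):
    chunks = chunk_text(text, size=100, overlap=20)
    # Every chunk respects the size limit, and the pieces reassemble the source text.
    assert all(len(c) <= 100 for c in chunks)
    assert "".join(c[:80] for c in chunks[:-1]) + chunks[-1] == text
```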
Frontend tests:

```
cd frontend
npm test              # Run all tests
npm run test:watch    # Watch mode
```
Backend:
- API documentation available at http://localhost:8000/docs
- Health check endpoint: http://localhost:8000/health
- Hot reload enabled with the `--reload` flag

Frontend:
- Hot reload enabled by default
- TypeScript type checking
- ESLint for code quality
- `POST /documents/upload` - Upload PDF document
- `POST /documents/url` - Index web content
- `GET /documents` - List all documents
- `GET /documents/{doc_id}` - Get document details
- `DELETE /documents/{doc_id}` - Delete document
- `POST /query` - Ask a question
- `GET /health` - Health check
- `POST /settings/api-key` - Save API key
- `GET /settings/api-key` - Get current provider
- `POST /settings/test-connection` - Test LLM connection
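For reference, here is a hypothetical client-side walkthrough of a few of the endpoints above using the requests package; the multipart field name and JSON payload shape are assumptions, not the API's documented schema.

```python
# Hypothetical client calls against the endpoints listed above.
# Field names in the request bodies are assumptions.
import requests

BASE = "http://localhost:8000"

# Upload a PDF for indexing (multipart upload; field name "file" is assumed).
with open("report.pdf", "rb") as f:
    upload = requests.post(f"{BASE}/documents/upload", files={"file": f})
print(upload.status_code, upload.text)

# Ask a question grounded in the indexed documents (payload shape assumed).
resp = requests.post(f"{BASE}/query", json={"question": "What does the report conclude?"})
print(resp.status_code, resp.text)
```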
This is an educational project developed as part of college coursework to learn about:
- Retrieval-Augmented Generation (RAG) systems
- Vector databases and semantic search
- Modern web development with FastAPI and Next.js
- LLM integration and prompt engineering
- Production-grade software architecture
Note for Students: This code is shared for learning and reference purposes. If you're working on a similar assignment, please use this to understand concepts and approaches, but develop your own implementation. Direct copying violates academic integrity policies.
Note for Recruiters/Employers: This project demonstrates my understanding of AI/ML systems, full-stack development, and software engineering best practices.
This project follows a spec-driven development approach. See .kiro/specs/ for detailed requirements, design, and implementation tasks.
All code and documentation © 2026. All rights reserved.