A comprehensive RAG (Retrieval-Augmented Generation) evaluation system with real-time metrics, document chat, and performance analysis using Google Gemini API and DeepEval.
- 🤖 Google Gemini 2.0 Flash-Lite: A lightweight Gemini model for fast, accurate responses
- 📊 Real-time Evaluation: DeepEval metrics with comprehensive analysis
- 📄 Document Processing: Upload and chat with PDF documents
- 🧭 Multi-page Navigation: Separate sections for chat, logs, and settings
- 📈 Performance Analytics: Track and compare evaluation metrics over time
- 🎯 Top-K Optimization: Find optimal retrieval parameters
- 💬 Chat Memory: Maintains conversation history and context
- 📚 Source References: Shows document chunks used for answers
- Install dependencies: `pip install -r requirements.txt`
- Run the application: `streamlit run app.py`
- Get your Gemini API key:
  - Go to Google AI Studio
  - Create a new API key
  - Enter it in the sidebar of the app
- Start using the system:
  - Navigate between Chat, Evaluation Logs, and Settings
  - Upload PDF documents for analysis
  - Enable real-time evaluation to track performance
  - Ask questions and view comprehensive metrics
- Interactive chat interface with document Q&A
- Real-time evaluation metrics (Answer Relevancy, Faithfulness, etc.)
- Source attribution and chunk analysis
- Configurable Top-K retrieval settings
- Comprehensive performance analytics
- Historical evaluation data with trends
- Top-K performance comparison
- CSV export functionality
- Performance recommendations
- System configuration overview
- Data management (clear history, logs, documents)
- API status monitoring
- System information and statistics
- Document Processing: PDFs are processed into text chunks with embeddings
- Smart Retrieval: Finds relevant chunks using semantic similarity
- Response Generation: Gemini generates context-aware responses
- Real-time Evaluation: DeepEval metrics assess response quality
- Performance Tracking: Logs and analyzes evaluation results over time
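A minimal sketch of the retrieve-then-generate flow described above, assuming the `google-generativeai`, `qdrant-client`, and `sentence-transformers` packages and an already-populated Qdrant collection (see the ingestion sketch further down). The `answer_question` helper, the `documents` collection name, and the prompt wording are illustrative, not the exact code in `app.py`:

```python
import google.generativeai as genai
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

def answer_question(question: str, client: QdrantClient, embedder: SentenceTransformer,
                    api_key: str, top_k: int = 5) -> tuple[str, list[str]]:
    """Embed the question, retrieve the Top-K chunks, and ask Gemini to answer from them."""
    # Smart Retrieval: semantic similarity search over the stored chunk embeddings
    query_vector = embedder.encode(question).tolist()
    hits = client.search(collection_name="documents", query_vector=query_vector, limit=top_k)
    context_chunks = [hit.payload["text"] for hit in hits]

    # Response Generation: Gemini answers using only the retrieved context
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-2.0-flash-lite")
    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(context_chunks) + f"\n\nQuestion: {question}"
    )
    response = model.generate_content(prompt)
    return response.text, context_chunks
```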
- Answer Relevancy: Measures how relevant the response is to the question
- Faithfulness: Checks if the response aligns with the provided context
- Contextual Relevancy: Evaluates context relevance to the question
- Contextual Recall: Measures context completeness against an auto-generated expected output
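A minimal sketch of scoring one exchange with these four DeepEval metrics. Note that DeepEval judges with an OpenAI model by default, so the app presumably wraps Gemini as a custom evaluation model (omitted here); the example strings and the 0.7 threshold are illustrative:

```python
from deepeval.test_case import LLMTestCase
from deepeval.metrics import (
    AnswerRelevancyMetric,
    FaithfulnessMetric,
    ContextualRelevancyMetric,
    ContextualRecallMetric,
)

# One question/answer pair plus the retrieved chunks it was generated from
test_case = LLMTestCase(
    input="What is the refund policy?",
    actual_output="Refunds are issued within 30 days of purchase.",
    retrieval_context=["Customers may request a refund within 30 days of purchase."],
    expected_output="Refunds are available for 30 days after purchase.",  # auto-generated in the app
)

metrics = [
    AnswerRelevancyMetric(threshold=0.7),
    FaithfulnessMetric(threshold=0.7),
    ContextualRelevancyMetric(threshold=0.7),
    ContextualRecallMetric(threshold=0.7),
]

for metric in metrics:
    metric.measure(test_case)  # runs the LLM-as-judge evaluation
    print(type(metric).__name__, round(metric.score, 3), metric.reason)
```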
- API Key: Enter your Gemini API key in the sidebar
- Document Upload: Upload PDFs for analysis
- Evaluation Settings: Enable/disable metrics and set thresholds
- Top-K Settings: Adjust the number of retrieved chunks (1-20)
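A minimal sketch of how these sidebar controls might look in Streamlit; the widget labels and default values are illustrative, not the exact ones in `app.py`:

```python
import streamlit as st

with st.sidebar:
    api_key = st.text_input("Gemini API Key", type="password")
    uploaded_pdf = st.file_uploader("Upload a PDF", type=["pdf"])
    enable_eval = st.checkbox("Enable real-time evaluation", value=True)
    metric_threshold = st.slider("Metric threshold", 0.0, 1.0, 0.7, step=0.05)
    top_k = st.slider("Top-K retrieved chunks", min_value=1, max_value=20, value=5)
```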
- LLM: Google Gemini 2.0 Flash-Lite
- Evaluation: DeepEval framework
- Embeddings: sentence-transformers/all-MiniLM-L6-v2 (384 dimensions)
- Vector Database: Qdrant (in-memory)
- Framework: Streamlit with custom CSS
- Document Processing: PyPDF2
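A minimal sketch of how this stack fits together at ingestion time, assuming simple fixed-size chunking; the `index_pdf` helper, the chunk size, and the `documents` collection name are illustrative, and the real chunking strategy in `document_processor.py` may differ:

```python
from PyPDF2 import PdfReader
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

def index_pdf(path: str, chunk_size: int = 500) -> tuple[QdrantClient, SentenceTransformer]:
    # Document Processing: extract text from every page with PyPDF2
    text = "".join(page.extract_text() or "" for page in PdfReader(path).pages)
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    # Embeddings: 384-dimensional vectors from all-MiniLM-L6-v2
    embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    vectors = embedder.encode(chunks)

    # Vector Database: in-memory Qdrant collection with cosine similarity
    client = QdrantClient(":memory:")
    client.create_collection(
        collection_name="documents",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )
    client.upsert(
        collection_name="documents",
        points=[
            PointStruct(id=i, vector=vec.tolist(), payload={"text": chunk})
            for i, (vec, chunk) in enumerate(zip(vectors, chunks))
        ],
    )
    return client, embedder
```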
├── app.py # Main Streamlit application
├── document_processor.py # PDF processing and text extraction
├── embedding_generator.py # Embedding generation utilities
├── rag_evaluator.py # DeepEval integration and metrics
├── rate_limiter.py # API rate limiting
├── vector_database.py # Qdrant vector database operations
├── requirements.txt # Python dependencies
└── README.md # This file
- API Overload (503 errors): Gemini API is experiencing high load
  - Wait 2-3 minutes and try again
  - Use fewer evaluation metrics
  - Try during off-peak hours
- Invalid API Key (400 errors): Check your Gemini API key
  - Verify the key is correct in the sidebar
  - Ensure the key has proper permissions
- Rate Limiting (429 errors): Too many requests
  - Built-in rate limiting should prevent this
  - Wait before making more requests
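The exact logic in `rate_limiter.py` is not shown here; the sketch below is one common approach (a fixed minimum interval between calls), with the calls-per-minute figure purely illustrative:

```python
import time

class SimpleRateLimiter:
    """Blocks so that successive API calls are at least a minimum interval apart."""

    def __init__(self, calls_per_minute: int = 15):
        self.min_interval = 60.0 / calls_per_minute
        self._last_call = 0.0

    def wait(self) -> None:
        # Sleep just long enough to honor the minimum spacing between calls.
        elapsed = time.monotonic() - self._last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last_call = time.monotonic()

limiter = SimpleRateLimiter(calls_per_minute=15)
limiter.wait()  # call before every Gemini request
```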
- Start with fewer evaluation metrics to test the system
- Use Top-K values between 3 and 7 for optimal performance
- Upload smaller documents initially to test functionality
- Monitor the Evaluation Logs page for performance insights
- Auto-generated Expected Output: For contextual recall evaluation
- Performance Comparison: Compare different Top-K values
- Export Functionality: Download evaluation logs as CSV (see the sketch after this list)
- 3D UI Effects: Modern, professional interface design
- Dual Color System: Light navigation, dark content for optimal UX
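A minimal sketch of the CSV export mentioned above, assuming evaluation results are kept as a list of dicts in Streamlit session state under a hypothetical `evaluation_logs` key:

```python
import pandas as pd
import streamlit as st

# e.g. [{"question": "...", "answer_relevancy": 0.82, "faithfulness": 0.91}, ...]
logs = st.session_state.get("evaluation_logs", [])
if logs:
    df = pd.DataFrame(logs)
    st.download_button(
        label="Download evaluation logs as CSV",
        data=df.to_csv(index=False),
        file_name="evaluation_logs.csv",
        mime="text/csv",
    )
```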