A Retrieval-Augmented Generation (RAG) based Q&A system that provides accurate answers by combining document retrieval with state-of-the-art language models. This system is designed to efficiently process and answer questions about company documentation, FAQs, and technical specifications.
📚 Document Processing
- Automatic document chunking and indexing
- Smart content categorization
- Support for multiple document formats (currently .txt files)
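As a rough sketch of the chunking step, the snippet below loads the .txt files from data/company_docs/ and splits them with LangChain's RecursiveCharacterTextSplitter (the splitter named in the technical details further down); the chunk size and overlap here are illustrative, not necessarily the values used in document_processor.py.

```python
from pathlib import Path
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load every .txt document from the company docs folder.
docs = {p.name: p.read_text(encoding="utf-8")
        for p in Path("data/company_docs").glob("*.txt")}

# Split each document into overlapping chunks so retrieval can return
# focused passages instead of whole files (sizes are illustrative).
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

chunks = [{"source": name, "text": piece}
          for name, text in docs.items()
          for piece in splitter.split_text(text)]

print(f"Created {len(chunks)} chunks from {len(docs)} documents")
```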
🔍 Vector-based Search
- FAISS similarity search
- Efficient document retrieval
- Context-aware responses
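To illustrate the retrieval idea, here is a minimal sketch that embeds chunks, builds a FAISS index, and queries it. It uses the faiss and sentence-transformers packages directly; vector_store.py may wrap the same steps differently.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Placeholder chunks; in the real system these come from the document processor.
chunks = [
    "Example chunk: overview of the industries the company serves.",
    "Example chunk: key features of the product line.",
]

# Embed the chunks with the MiniLM model listed in the technical details.
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vectors = embedder.encode(chunks, normalize_embeddings=True)

# Build a flat inner-product index (cosine similarity on normalized vectors).
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(np.asarray(vectors, dtype="float32"))

# Retrieve the chunks most similar to a question.
query = embedder.encode(["What industries does the company serve?"],
                        normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype="float32"), 2)
print([chunks[i] for i in ids[0]])
```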
🤖 AI-Powered Question Answering
- Uses Hugging Face's T5 model for natural language understanding
- Context-aware response generation
- Handles complex queries with proper context
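As a sketch of the generation step, the snippet below stuffs retrieved context into a prompt for google/flan-t5-small (the model listed in the technical details); the actual prompt template used in agent.py may differ.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# In the real system, `context` is built from the retrieved document chunks.
context = "...retrieved document chunks go here..."
question = "What are the key features of AI Assistant Pro?"
prompt = (
    "Answer the question using only the context.\n\n"
    f"Context: {context}\n\nQuestion: {question}"
)

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```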
🌐 Web Interface
- Clean and intuitive Streamlit interface
- Real-time response generation
- Debug information in sidebar
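The web layer could be as small as the following Streamlit sketch; the widget labels and the commented-out agent call are hypothetical placeholders, not the actual API of app.py.

```python
import streamlit as st

st.title("Company Docs Q&A")

# Sidebar mirrors the debug information described above.
with st.sidebar:
    st.header("Debug")
    st.write("Documents loaded: 3")  # placeholder count
    if st.button("Force retrain"):
        st.write("Rebuilding the vector store...")

question = st.text_input("Ask a question about the documentation")
if question:
    # A hypothetical agent call, e.g. answer = agent.answer(question), goes here.
    st.write(f"Answer for: {question}")
```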
Requirements:
- Python 3.8 or higher
- Windows, Linux, or macOS
- 4GB RAM minimum (8GB recommended)
- Internet connection (for initial model download)
Clone the repository:
git clone <repository-url>
cd "RAG AGENT"
Create and activate a virtual environment:
python -m venv venv
# Windows
.\venv\Scripts\activate
# Linux/macOS
source venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Project structure:
RAG AGENT/
├── data/                          # Document storage
│   └── company_docs/              # Company documentation
│       ├── api_documentation.txt
│       ├── company_faq.txt
│       └── product_specs.txt
├── src/                           # Source code
│   ├── agent.py                   # QA agent implementation
│   ├── app.py                     # Streamlit web interface
│   ├── document_processor.py      # Document processing
│   └── vector_store.py            # Vector store operations
├── tests/                         # Test files
│   ├── test_agent.py
│   ├── test_document_processor.py
│   └── test_vector_store.py
├── requirements.txt               # Project dependencies
└── run_tests.py                   # Test runner script
Make sure you're in the virtual environment:
# Windows
.\venv\Scripts\activate
# Linux/macOS
source venv/bin/activate
Start the Streamlit application:
streamlit run src/app.py
Open your browser and navigate to http://localhost:8501
Run the test suite to verify everything is working correctly:
python run_tests.py

All tests should pass with output similar to:
test_load_documents ... ok
test_process_documents ... ok
test_create_vector_store ... ok
test_similarity_search ... ok
...
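The listing above looks like unittest's verbose output, so run_tests.py is most likely a thin wrapper around test discovery. A minimal sketch (not necessarily the repository's exact script) might look like this:

```python
import sys
import unittest

# Discover every test module under tests/ and run it with verbose output.
suite = unittest.defaultTestLoader.discover("tests")
result = unittest.TextTestRunner(verbosity=2).run(suite)

# Exit non-zero when any test fails so scripts and CI can detect it.
sys.exit(0 if result.wasSuccessful() else 1)
```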
Try these example questions to test the system:
- "What industries does the company serve?"
- "What are the key features of AI Assistant Pro?"
- "What is the company's mission?"
- "What are the technical requirements for the product?"
Check the sidebar in the Streamlit interface for:
- Number of documents loaded
- Document processing statistics
- Force retrain option
If you get model loading errors:
- Ensure you have a stable internet connection
- Try forcing a retrain using the sidebar button
If documents aren't being found:
- Check that your documents are in the correct format
- Verify files are in the data/company_docs/ directory
- Use the Force Retrain button in the sidebar
Memory issues:
- Ensure you have at least 4GB of free RAM
- Close other memory-intensive applications
Technical details:
- Document Processing: RecursiveCharacterTextSplitter with optimized chunk sizes
- Embeddings: HuggingFace sentence-transformers (all-MiniLM-L6-v2)
- Vector Store: FAISS for efficient similarity search
- Language Model: google/flan-t5-small, chosen for its small footprint and fast inference
- Frontend: Streamlit with real-time processing
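Putting these components together, the answer path is roughly: embed the question, pull the nearest chunks from the FAISS index, and hand them to flan-t5-small as context. The sketch below condenses that flow into one function; the function and parameter names are hypothetical, not the actual interfaces of agent.py or vector_store.py.

```python
import numpy as np


def answer(question, index, chunks, embedder, tokenizer, model, k=3):
    """Hypothetical end-to-end flow: retrieve top-k chunks, then generate."""
    # Embed the question and search the FAISS index for similar chunks.
    k = min(k, len(chunks))
    query_vec = embedder.encode([question], normalize_embeddings=True)
    _, ids = index.search(np.asarray(query_vec, dtype="float32"), k)
    context = "\n\n".join(chunks[i] for i in ids[0])

    # Build a context-stuffed prompt and let flan-t5 generate the answer.
    prompt = (
        "Answer the question using only the context.\n\n"
        f"Context: {context}\n\nQuestion: {question}"
    )
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```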
This project is open source and available under the MIT License.
To contribute:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
Thanks to:
- HuggingFace for providing the transformer models
- Facebook Research for FAISS
- Streamlit team for the web framework
For any questions or issues, please open an issue in the repository.



