# LectureHub Chatbot

A comprehensive guide to the LectureHub Chatbot, a Streamlit-based Retrieval-Augmented Generation (RAG) chatbot for educational content.
## Table of Contents

- Overview
- Architecture
- Installation
- Configuration
- Usage
- Development
- Testing
- Deployment
- Troubleshooting
## Overview

The LectureHub Chatbot is a sophisticated RAG application designed to help students and educators interact with educational content. It combines:
- 🔍 Retrieval-Augmented Generation (RAG) for accurate, context-aware responses
- 🗄️ PostgreSQL with pgvector for efficient vector storage and similarity search
- 🧠 Google Gemini LLM for natural language processing (with mock fallback)
- 🌐 Streamlit for an intuitive web interface
- 🏗️ Modular architecture for maintainability and extensibility
### Key Features

- 📄 Multi-format Document Support: PDF, Markdown, and Python code
- 🎯 Smart Question Handling: Accepts all questions without strict relevance filtering
- ⚡ Vector Similarity Search: Fast and accurate document retrieval
- 🐳 Docker Integration: Easy setup with containerized database
- 🔧 Modular Design: Clean, maintainable codebase
- 🔄 Mock LLM Support: Works without API key for testing
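The vector similarity search listed above is typically implemented with pgvector's distance operators. A minimal sketch of what such a query can look like, assuming an illustrative table name and column layout (not taken from the project's actual schema):

```python
# Hypothetical pgvector similarity query; "embeddings", "content", and
# "metadata" are assumed names, not the project's actual schema.

def build_similarity_query(table: str = "embeddings", top_k: int = 4) -> str:
    """Build a SQL query that ranks rows by cosine distance to a query vector.

    pgvector's <=> operator computes cosine distance, so ordering
    ascending returns the most similar documents first.
    """
    return (
        f"SELECT content, metadata, embedding <=> %(query_vec)s AS distance "
        f"FROM {table} "
        f"ORDER BY distance ASC "
        f"LIMIT {top_k}"
    )

query = build_similarity_query(top_k=3)
```

In practice the vector store integration issues an equivalent query under the hood; the sketch only shows the shape of the retrieval step.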
## Architecture

### Project Structure

```
lecturehub-chatbot/
├── src/                         # Main source code
│   ├── core/                    # Core application components
│   │   ├── config.py            # Configuration constants
│   │   ├── main.py              # Main entry point
│   │   ├── chatbot_logic.py     # Core chatbot logic
│   │   └── rag_chatbot.py       # Main orchestrator
│   ├── database/                # Database operations
│   │   ├── database.py          # PostgreSQL and pgvector
│   │   └── vectorstore.py       # Vector store management
│   ├── llm/                     # LLM and language processing
│   │   ├── llm_chain.py         # LLM chain management
│   │   ├── mock_llm_chain.py    # Mock LLM for testing
│   │   └── keyword_extractor.py # Keyword extraction
│   ├── ui/                      # User interface
│   │   ├── chat_manager.py      # Chat management
│   │   └── ui_components.py     # UI components
│   └── utils/                   # Utilities
│       └── document_loader.py   # Document processing
├── tests/                       # Test files
├── docs/                        # Documentation
├── config/                      # Configuration files
├── docker/                      # Docker setup
└── data/                        # Data files
```
### Core Components

RAGChatbot, the main orchestrator, coordinates all components:
- 🎮 Manages application lifecycle
- 🔗 Coordinates database, LLM, and UI components
- 🛡️ Handles error scenarios gracefully
Database layer:

- 🗄️ DatabaseManager: PostgreSQL connection and pgvector operations
- 🔍 VectorStoreManager: Vector store operations and embedding management
LLM layer:

- 🧠 LLMChainBuilder: Creates and manages QA chains
- 🔑 KeywordExtractor: Extracts keywords for relevance checking
- 🎯 MockLLMChainBuilder: Provides mock responses for testing
- ✅ SmartRelevanceChecker: Determines if questions are relevant (currently disabled)
UI layer:

- 💬 ChatManager: Manages chat history and interactions
- ⚙️ SidebarManager: Handles configuration UI
- 🖥️ MainUIManager: Manages main UI components
Utilities:

- 📄 DocumentLoader: Loads and processes documents
- ✂️ DocumentProcessor: Handles text chunking and metadata
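The chunking done by the DocumentProcessor can be illustrated with an overlapping-window splitter. This is a rough sketch only; the real implementation may use different sizes, token-based splitting, or a library splitter:

```python
from typing import List

# Illustrative overlapping-window chunker; chunk size, overlap, and the
# function name are assumptions, not the project's actual parameters.

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size chunks that overlap, so content near a
    boundary is retrievable from either neighboring chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
    return chunks

parts = chunk_text("a" * 1200, chunk_size=500, overlap=50)  # 3 chunks
```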
## Installation

### Prerequisites

- Python 3.8 or higher
- Docker and Docker Compose (for database)
- Google Gemini API key (optional - mock LLM available)
### Setup Steps

1. 📥 Clone the repository:

   ```bash
   git clone <repository-url>
   cd lecturehub-chatbot
   ```

2. 📦 Install Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. 🐳 Start the database:

   ```bash
   ./docker/docker-setup.sh start
   ```

4. ⚙️ Configure environment (optional):

   ```bash
   cp env.template .env
   # Edit .env with your GEMINI_API_KEY (optional)
   ```

5. 🧪 Test the setup:

   ```bash
   chmod +x ./docker/docker-setup.sh
   ./docker/docker-setup.sh test
   ```
## Configuration

### Environment Variables

The application uses environment variables for configuration. Key variables include:
```bash
# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_NAME=embedding
DB_USER=root
DB_PASSWORD=root_password

# LLM Configuration (Optional)
GEMINI_API_KEY=your_api_key_here
GEMINI_MODEL=gemini-2.5-flash-lite

# Application Configuration
PROBLEM_ID=ma_de_001
```

Configuration files:

- `config/env.template`: Template for environment variables
- `config/example.env`: Example configuration
- `docker/docker.env`: Docker-specific configuration
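A minimal sketch of how these variables might be read with the defaults above. The helper name is illustrative; the actual loading code lives in `src/core/config.py` and may differ:

```python
from typing import Mapping

# Illustrative loader for the variables above; defaults mirror the template.
# Not the project's actual config code.

def load_db_config(env: Mapping[str, str]) -> dict:
    """Build the database configuration from an environment mapping."""
    return {
        "host": env.get("DB_HOST", "localhost"),
        "port": env.get("DB_PORT", "5432"),
        "database": env.get("DB_NAME", "embedding"),
        "user": env.get("DB_USER", "root"),
        "password": env.get("DB_PASSWORD", "root_password"),
    }

# Pass os.environ in real code; an empty mapping exercises the defaults.
cfg = load_db_config({})
```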
### Database Configuration

The application supports two database configuration methods:
1. 🔧 Individual parameters:

   ```python
   db = DatabaseManager(
       host="localhost",
       port="5432",
       database="embedding",
       user="root",
       password="root_password",
   )
   ```

2. 🔗 Connection string:

   ```python
   db = DatabaseManager(
       connection_string="postgresql+psycopg://user:pass@host:5432/db"
   )
   ```
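The two methods are equivalent; a small helper like the following could assemble the connection string from individual parameters. The function is a sketch, not part of the codebase, and a production version should URL-escape the password:

```python
# Illustrative helper; a production version should URL-escape the
# password (e.g. with urllib.parse.quote_plus).

def make_connection_string(user: str, password: str, host: str,
                           port: str, database: str) -> str:
    """Assemble the SQLAlchemy-style URL for the psycopg (v3) driver."""
    return f"postgresql+psycopg://{user}:{password}@{host}:{port}/{database}"

url = make_connection_string("root", "root_password", "localhost",
                             "5432", "embedding")
```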
## Usage

### Running the Application

```bash
# Development mode
python main.py

# Production mode
streamlit run main.py
```

### Using the Chatbot

- 🚀 Start the application and navigate to the web interface
- ⚙️ Configure settings in the sidebar:
- Database connection parameters
- Google Gemini API key (optional)
- Application settings
- 📥 Ingest documents by clicking "Ingest dữ liệu" (Ingest data)
- ❓ Ask questions about your educational content
### Supported Document Formats

Place these files in the `data/lectures/mmceasar2/` directory:

- `mmceasar2.pdf` - Problem statement
- `mmceasar2.md` - Lecture content
- `mmceasar2.py` - Solution code
### Chatbot Features

The chatbot provides:
- ⚡ Real-time responses based on your documents
- 📚 Source citations showing which documents were used
- 🎯 Flexible question handling - accepts all questions
- 💾 Chat history for continued conversations
## Development

### Development Setup

1. 📦 Install in development mode:

   ```bash
   pip install -e .
   ```

2. 🔗 Set up pre-commit hooks:

   ```bash
   pre-commit install
   ```
### Adding New Document Types

Extend the `DocumentLoader` class in `src/utils/document_loader.py`:

```python
def _load_new_format(self, file_path: str) -> List[Any]:
    # Implementation for the new document type
    pass
```

### Adding New UI Components

Add components to `src/ui/ui_components.py`:
```python
class NewUIManager:
    def render_new_component(self):
        # Implementation
        pass
```

### Adding New LLM Chains

Extend `LLMChainBuilder` in `src/llm/llm_chain.py`:
```python
def build_new_chain(self, retriever) -> NewChain:
    # Implementation for the new chain type
    pass
```

### Code Style

- Follow PEP 8 guidelines
- Use type hints for all function parameters and return values
- Write docstrings for all public functions and classes
- Keep functions small and focused
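A short example in the spirit of these guidelines: typed parameters, a docstring, and one focused task. The function itself is hypothetical, not part of the codebase:

```python
from typing import List

# Hypothetical helper illustrating the style guidelines above.

def top_k_sources(scores: List[float], k: int = 3) -> List[int]:
    """Return the indices of the k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```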
## Testing

### Running Tests

```bash
# Run all tests
python -m pytest tests/

# Run a specific test file
python tests/test_components.py

# Run with coverage
python -m pytest tests/ --cov=src
```

### Test Structure

- `tests/test_components.py`: Tests for individual components
- `tests/test_database_config.py`: Database configuration tests
- `tests/test_docker_db.py`: Docker database integration tests
- `tests/test_relevance.py`: Relevance checker tests
### Writing Tests

```python
def test_new_feature():
    """Test description."""
    # Arrange
    component = Component()

    # Act
    result = component.method()

    # Assert
    assert result == expected_value
```
## Deployment

### Docker Deployment

1. 🔨 Build the application:

   ```bash
   docker build -t lecturehub-chatbot .
   ```

2. 🚀 Run with Docker Compose:

   ```bash
   docker-compose -f docker/docker-compose.yml up -d
   ```
### Production Considerations

- 🔐 Environment Variables: Use production environment variables
- 🗄️ Database: Use production PostgreSQL instance
- 🛡️ Security: Implement proper authentication and authorization
- 📊 Monitoring: Add logging and monitoring
- ⚖️ Scaling: Consider load balancing for multiple users
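For the monitoring point, a possible starting point for logging setup. The logger name and format are choices, not project settings; `LOG_LEVEL` matches the variable used in Troubleshooting:

```python
import logging
import os

# Illustrative logging setup; names and format are assumptions,
# not the project's actual configuration.

def configure_logging() -> logging.Logger:
    """Configure root logging from LOG_LEVEL (default INFO) and return
    the application logger."""
    level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
    logging.basicConfig(
        level=getattr(logging, level_name, logging.INFO),
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    return logging.getLogger("lecturehub")

logger = configure_logging()
```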
## Troubleshooting

### Database Connection Issues

```bash
# Check database status
./docker/docker-setup.sh status

# View database logs
./docker/docker-setup.sh logs

# Restart database
./docker/docker-setup.sh restart
```

### LLM API Issues

- Verify `GEMINI_API_KEY` is set correctly (optional)
- Check API quota and limits
- Ensure network connectivity
- 💡 Tip: Application works with mock LLM without API key
### Document Loading Issues

- Verify required files exist in `data/lectures/mmceasar2/`
- Check file permissions
- Ensure file formats are supported
### Import Errors

- Verify the Python path includes the `src/` directory
- Check that all dependencies are installed
- Ensure `__init__.py` files exist in all packages
### Debug Mode

Enable debug logging by setting:

```bash
export LOG_LEVEL=DEBUG
```

### Getting Help

- Check the logs for error messages
- Verify configuration settings
- Test individual components
- Consult the project documentation
## Recent Changes

- 🔄 Migrated from psycopg2 to psycopg: Updated all database connections
- 🎯 Disabled strict relevance checking: Chatbot now accepts all questions
- 🤖 Added mock LLM support: Works without API key for testing
- 🔧 Improved error handling: Better fallback mechanisms
- 📝 Updated documentation: Comprehensive guides and examples
- ✅ No more "Xin lỗi, tôi chỉ hỗ trợ hỏi về bài giảng này thôi." ("Sorry, I only support questions about this lecture.") responses
- ✅ Works without Google API key
- ✅ Better database compatibility
- ✅ Enhanced user experience
## Contributing

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Update documentation
- Submit a pull request
Pull request requirements:

- All code must pass tests
- Documentation must be updated
- Code style must follow project guidelines
- Security considerations must be addressed
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Support

For support and questions:
- 📖 Check the documentation
- 🔧 Review the troubleshooting section
- 🐛 Open an issue on GitHub
- 📧 Contact the development team
🎉 Happy coding with LectureHub Chatbot! 🚀