# LectureHub Chatbot

A comprehensive guide to the LectureHub Chatbot, a Streamlit-based Retrieval-Augmented Generation (RAG) chatbot for educational content.
## Table of Contents

- Overview
- Architecture
- Installation
- Configuration
- Usage
- Development
- Testing
- Deployment
- Troubleshooting
## Overview

The LectureHub Chatbot is a sophisticated RAG application designed to help students and educators interact with educational content. It combines:
- 🔍 Retrieval-Augmented Generation (RAG) for accurate, context-aware responses
- 🗄️ PostgreSQL with pgvector for efficient vector storage and similarity search
- 🧠 Google Gemini LLM for natural language processing (with mock fallback)
- 🌐 Streamlit for an intuitive web interface
- 🏗️ Modular architecture for maintainability and extensibility
### Key Features

- 📄 Multi-format Document Support: PDF, Markdown, and Python code
- 🎯 Smart Question Handling: Accepts all questions without strict relevance filtering
- ⚡ Vector Similarity Search: Fast and accurate document retrieval
- 🐳 Docker Integration: Easy setup with containerized database
- 🔧 Modular Design: Clean, maintainable codebase
- 🔄 Mock LLM Support: Works without API key for testing
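The vector similarity search listed above is typically implemented with pgvector's distance operators. A minimal sketch of what such a query can look like, assuming an illustrative table name and column layout (not taken from the project's actual schema):

```python
# Hypothetical pgvector similarity query; "embeddings", "content", and
# "metadata" are assumed names, not the project's actual schema.

def build_similarity_query(table: str = "embeddings", top_k: int = 4) -> str:
    """Build a SQL query that ranks rows by cosine distance to a query vector.

    pgvector's <=> operator computes cosine distance, so ordering
    ascending returns the most similar documents first.
    """
    return (
        f"SELECT content, metadata, embedding <=> %(query_vec)s AS distance "
        f"FROM {table} "
        f"ORDER BY distance ASC "
        f"LIMIT {top_k}"
    )

query = build_similarity_query(top_k=3)
```

In practice the vector store integration issues an equivalent query under the hood; the sketch only shows the shape of the retrieval step.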
## Architecture

### Project Structure

```
lecturehub-chatbot/
├── src/                         # Main source code
│   ├── core/                    # Core application components
│   │   ├── config.py            # Configuration constants
│   │   ├── main.py              # Main entry point
│   │   ├── chatbot_logic.py     # Core chatbot logic
│   │   └── rag_chatbot.py       # Main orchestrator
│   ├── database/                # Database operations
│   │   ├── database.py          # PostgreSQL and pgvector
│   │   └── vectorstore.py       # Vector store management
│   ├── llm/                     # LLM and language processing
│   │   ├── llm_chain.py         # LLM chain management
│   │   ├── mock_llm_chain.py    # Mock LLM for testing
│   │   └── keyword_extractor.py # Keyword extraction
│   ├── ui/                      # User interface
│   │   ├── chat_manager.py      # Chat management
│   │   └── ui_components.py     # UI components
│   └── utils/                   # Utilities
│       └── document_loader.py   # Document processing
├── tests/                       # Test files
├── docs/                        # Documentation
├── config/                      # Configuration files
├── docker/                      # Docker setup
└── data/                        # Data files
```
### Core Components

RAGChatbot, the main orchestrator, coordinates all components:
- 🎮 Manages application lifecycle
- 🔗 Coordinates database, LLM, and UI components
- 🛡️ Handles error scenarios gracefully
Database layer:

- 🗄️ DatabaseManager: PostgreSQL connection and pgvector operations
- 🔍 VectorStoreManager: Vector store operations and embedding management
LLM layer:

- 🧠 LLMChainBuilder: Creates and manages QA chains
- 🔑 KeywordExtractor: Extracts keywords for relevance checking
- 🎯 MockLLMChainBuilder: Provides mock responses for testing
- ✅ SmartRelevanceChecker: Determines if questions are relevant (currently disabled)
UI layer:

- 💬 ChatManager: Manages chat history and interactions
- ⚙️ SidebarManager: Handles configuration UI
- 🖥️ MainUIManager: Manages main UI components
Utilities:

- 📄 DocumentLoader: Loads and processes documents
- ✂️ DocumentProcessor: Handles text chunking and metadata
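The chunking done by the DocumentProcessor can be illustrated with an overlapping-window splitter. This is a rough sketch only; the real implementation may use different sizes, token-based splitting, or a library splitter:

```python
from typing import List

# Illustrative overlapping-window chunker; chunk size, overlap, and the
# function name are assumptions, not the project's actual parameters.

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size chunks that overlap, so content near a
    boundary is retrievable from either neighboring chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
    return chunks

parts = chunk_text("a" * 1200, chunk_size=500, overlap=50)  # 3 chunks
```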
## Installation

### Prerequisites

- Python 3.8 or higher
- Docker and Docker Compose (for database)
- Google Gemini API key (optional - mock LLM available)
### Setup Steps

1. 📥 Clone the repository:

   ```bash
   git clone <repository-url>
   cd lecturehub-chatbot
   ```

2. 📦 Install Python dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. 🐳 Start the database:

   ```bash
   ./docker/docker-setup.sh start
   ```

4. ⚙️ Configure environment (optional):

   ```bash
   cp env.template .env
   # Edit .env with your GEMINI_API_KEY (optional)
   ```

5. 🧪 Test the setup:

   ```bash
   chmod +x ./docker/docker-setup.sh
   ./docker/docker-setup.sh test
   ```
## Configuration

### Environment Variables

The application uses environment variables for configuration. Key variables include:
```bash
# Database Configuration
DB_HOST=localhost
DB_PORT=5432
DB_NAME=embedding
DB_USER=root
DB_PASSWORD=root_password

# LLM Configuration (Optional)
GEMINI_API_KEY=your_api_key_here
GEMINI_MODEL=gemini-2.5-flash-lite

# Application Configuration
PROBLEM_ID=ma_de_001
```

Configuration files:

- `config/env.template`: Template for environment variables
- `config/example.env`: Example configuration
- `docker/docker.env`: Docker-specific configuration
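A minimal sketch of how these variables might be read with the defaults above. The helper name is illustrative; the actual loading code lives in `src/core/config.py` and may differ:

```python
from typing import Mapping

# Illustrative loader for the variables above; defaults mirror the template.
# Not the project's actual config code.

def load_db_config(env: Mapping[str, str]) -> dict:
    """Build the database configuration from an environment mapping."""
    return {
        "host": env.get("DB_HOST", "localhost"),
        "port": env.get("DB_PORT", "5432"),
        "database": env.get("DB_NAME", "embedding"),
        "user": env.get("DB_USER", "root"),
        "password": env.get("DB_PASSWORD", "root_password"),
    }

# Pass os.environ in real code; an empty mapping exercises the defaults.
cfg = load_db_config({})
```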
### Database Configuration

The application supports two database configuration methods:
1. 🔧 Individual parameters:

   ```python
   db = DatabaseManager(
       host="localhost",
       port="5432",
       database="embedding",
       user="root",
       password="root_password",
   )
   ```

2. 🔗 Connection string:

   ```python
   db = DatabaseManager(
       connection_string="postgresql+psycopg://user:pass@host:5432/db"
   )
   ```
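The two methods are equivalent; a small helper like the following could assemble the connection string from individual parameters. The function is a sketch, not part of the codebase, and a production version should URL-escape the password:

```python
# Illustrative helper; a production version should URL-escape the
# password (e.g. with urllib.parse.quote_plus).

def make_connection_string(user: str, password: str, host: str,
                           port: str, database: str) -> str:
    """Assemble the SQLAlchemy-style URL for the psycopg (v3) driver."""
    return f"postgresql+psycopg://{user}:{password}@{host}:{port}/{database}"

url = make_connection_string("root", "root_password", "localhost",
                             "5432", "embedding")
```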
## Usage

### Running the Application

```bash
# Development mode
python main.py

# Production mode
streamlit run main.py
```

### Using the Chatbot

- 🚀 Start the application and navigate to the web interface
- ⚙️ Configure settings in the sidebar:
- Database connection parameters
- Google Gemini API key (optional)
- Application settings
- 📥 Ingest documents by clicking "Ingest dữ liệu" (Ingest data)
- ❓ Ask questions about your educational content
### Supported Document Formats

Place these files in the `data/lectures/mmceasar2/` directory:

- `mmceasar2.pdf` - Problem statement
- `mmceasar2.md` - Lecture content
- `mmceasar2.py` - Solution code
### Chatbot Features

The chatbot provides:
- ⚡ Real-time responses based on your documents
- 📚 Source citations showing which documents were used
- 🎯 Flexible question handling - accepts all questions
- 💾 Chat history for continued conversations
## Development

### Development Setup

1. 📦 Install in development mode:

   ```bash
   pip install -e .
   ```

2. 🔗 Set up pre-commit hooks:

   ```bash
   pre-commit install
   ```
### Adding New Document Types

Extend the `DocumentLoader` class in `src/utils/document_loader.py`:

```python
def _load_new_format(self, file_path: str) -> List[Any]:
    # Implementation for the new document type
    pass
```

### Adding New UI Components

Add components to `src/ui/ui_components.py`:
```python
class NewUIManager:
    def render_new_component(self):
        # Implementation
        pass
```

### Adding New LLM Chains

Extend `LLMChainBuilder` in `src/llm/llm_chain.py`:
```python
def build_new_chain(self, retriever) -> NewChain:
    # Implementation for the new chain type
    pass
```

### Code Style

- Follow PEP 8 guidelines
- Use type hints for all function parameters and return values
- Write docstrings for all public functions and classes
- Keep functions small and focused
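A short example in the spirit of these guidelines: typed parameters, a docstring, and one focused task. The function itself is hypothetical, not part of the codebase:

```python
from typing import List

# Hypothetical helper illustrating the style guidelines above.

def top_k_sources(scores: List[float], k: int = 3) -> List[int]:
    """Return the indices of the k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```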
## Testing

### Running Tests

```bash
# Run all tests
python -m pytest tests/

# Run a specific test file
python tests/test_components.py

# Run with coverage
python -m pytest tests/ --cov=src
```

### Test Structure

- `tests/test_components.py`: Tests for individual components
- `tests/test_database_config.py`: Database configuration tests
- `tests/test_docker_db.py`: Docker database integration tests
- `tests/test_relevance.py`: Relevance checker tests
### Writing Tests

```python
def test_new_feature():
    """Test description."""
    # Arrange
    component = Component()

    # Act
    result = component.method()

    # Assert
    assert result == expected_value
```
## Deployment

### Docker Deployment

1. 🔨 Build the application:

   ```bash
   docker build -t lecturehub-chatbot .
   ```

2. 🚀 Run with Docker Compose:

   ```bash
   docker-compose -f docker/docker-compose.yml up -d
   ```
### Production Considerations

- 🔐 Environment Variables: Use production environment variables
- 🗄️ Database: Use production PostgreSQL instance
- 🛡️ Security: Implement proper authentication and authorization
- 📊 Monitoring: Add logging and monitoring
- ⚖️ Scaling: Consider load balancing for multiple users
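For the monitoring point, a possible starting point for logging setup. The logger name and format are choices, not project settings; `LOG_LEVEL` matches the variable used in Troubleshooting:

```python
import logging
import os

# Illustrative logging setup; names and format are assumptions,
# not the project's actual configuration.

def configure_logging() -> logging.Logger:
    """Configure root logging from LOG_LEVEL (default INFO) and return
    the application logger."""
    level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
    logging.basicConfig(
        level=getattr(logging, level_name, logging.INFO),
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    return logging.getLogger("lecturehub")

logger = configure_logging()
```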
## Troubleshooting

### Database Connection Issues

```bash
# Check database status
./docker/docker-setup.sh status

# View database logs
./docker/docker-setup.sh logs

# Restart database
./docker/docker-setup.sh restart
```

### LLM API Issues

- Verify `GEMINI_API_KEY` is set correctly (optional)
- Check API quota and limits
- Ensure network connectivity
- 💡 Tip: Application works with mock LLM without API key
### Document Loading Issues

- Verify required files exist in `data/lectures/mmceasar2/`
- Check file permissions
- Ensure file formats are supported
### Import Errors

- Verify the Python path includes the `src/` directory
- Check that all dependencies are installed
- Ensure `__init__.py` files exist in all packages
### Debug Mode

Enable debug logging by setting:

```bash
export LOG_LEVEL=DEBUG
```

### Getting Help

- Check the logs for error messages
- Verify configuration settings
- Test individual components
- Consult the project documentation
## Recent Changes

- 🔄 Migrated from psycopg2 to psycopg: Updated all database connections
- 🎯 Disabled strict relevance checking: Chatbot now accepts all questions
- 🤖 Added mock LLM support: Works without API key for testing
- 🔧 Improved error handling: Better fallback mechanisms
- 📝 Updated documentation: Comprehensive guides and examples
- ✅ No more "Xin lỗi, tôi chỉ hỗ trợ hỏi về bài giảng này thôi." ("Sorry, I only support questions about this lecture.") responses
- ✅ Works without Google API key
- ✅ Better database compatibility
- ✅ Enhanced user experience
## Contributing

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Update documentation
- Submit a pull request
Pull request requirements:

- All code must pass tests
- Documentation must be updated
- Code style must follow project guidelines
- Security considerations must be addressed
## License

This project is licensed under the MIT License. See the LICENSE file for details.
## Support

For support and questions:
- 📖 Check the documentation
- 🔧 Review the troubleshooting section
- 🐛 Open an issue on GitHub
- 📧 Contact the development team
🎉 Happy coding with LectureHub Chatbot! 🚀