GuPT is a project developed by a student group for the course Machine Learning for Natural Language Processing (DIT247). The system builds on information extracted from Gothenburg University's (GU) bachelor's and master's courses (~590) and programs (~90), including relevant details from their web pages and syllabus PDFs. This data serves as the knowledge base for GuPT, which uses a Retrieval-Augmented Generation (RAG) approach to answer user queries.
GuPT's RAG model is built using LangChain, OpenAI embeddings, and gpt-5-mini. By utilizing multi-querying and logic routing, GuPT can handle ambiguous questions and provide both specific and general answers regarding GU courses and programs. The goal is to offer a tool that efficiently provides information on entry requirements, learning objectives, and assessment methods, thereby reducing confusion and administrative workload.
Access our interactive demo and start asking questions about GU courses and programs.
🚀 Launch GuPT
- Features
- Getting Started
- Installation
- Usage
- Data Collection
- Architecture
- Evaluation
- Technologies Used
- Video Presentation
- Natural Language Querying: Ask questions about GU courses and programs in plain English.
- Contextual RAG System: Retrieves relevant information from a local database of course and program details.
- Multi-Querying and Logic Routing: Rephrases ambiguous questions into multiple query variants and routes each to the appropriate retriever for precise answers.
- Scalable: Built to handle a large volume of course and program data.
- Efficient Retrieval: Reduces time spent searching for course or program information manually.
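The multi-querying idea above can be pictured with a toy sketch. Everything below is illustrative rather than GuPT's actual code: the real system rephrases queries with gpt-5-mini and retrieves from a Chroma vector store, while `rephrase_query`, `retrieve`, and the tiny corpus here are keyword-based stand-ins.

```python
# Illustrative sketch of multi-query retrieval (not the actual GuPT code).
# A real implementation would ask an LLM for paraphrases and query a
# vector store; both are stubbed here to keep the example self-contained.

def rephrase_query(query: str) -> list[str]:
    # Stub: a real system would generate paraphrases with the LLM.
    return [query, query.lower(), f"GU course information: {query}"]

def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    # Toy keyword-overlap retriever standing in for vector search.
    words = set(query.lower().split())
    scored = sorted(corpus, key=lambda doc_id: -len(words & set(corpus[doc_id].lower().split())))
    return scored[:k]

def multi_query_retrieve(query: str, corpus: dict[str, str]) -> list[str]:
    # Union of results across all query variants, preserving first-seen order.
    seen: list[str] = []
    for variant in rephrase_query(query):
        for doc_id in retrieve(variant, corpus):
            if doc_id not in seen:
                seen.append(doc_id)
    return seen

corpus = {
    "DIT867": "Applied Machine Learning course prerequisites programming statistics",
    "DIT247": "Machine Learning for Natural Language Processing course",
    "N2ADS": "Applied Data Science master's program admission requirements",
}
print(multi_query_retrieve("machine learning prerequisites", corpus))
# ['DIT867', 'DIT247']
```

Taking the union over several phrasings of the same question is what makes ambiguous queries more robust: a document missed by one phrasing is often caught by another.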
These instructions will help you set up a local copy of GuPT for development and testing purposes.
- Python 3.8+: Ensure you have Python installed.
- pip: Python package manager.
- OpenAI API Key: Required for embedding and text generation. Obtain one from OpenAI's website.
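Once the key is in your environment, a quick sanity check like the following (a sketch, not part of the codebase) confirms the application can see it:

```python
# Verify the OpenAI key is visible to the application. GuPT reads it
# from .env (presumably via python-dotenv); this stdlib-only check works
# the same once the variable is loaded or exported in the shell.
import os

def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```

Failing fast here is preferable to a cryptic authentication error from the OpenAI client later in the pipeline.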
```
GuPT/
├── 🔧 Core Modules
│   └── src/
│       ├── main.py               # Main entry point
│       ├── config.py             # Configuration and constants
│       ├── models.py             # Pydantic models and data classes
│       ├── rag_service.py        # Core RAG service (LCEL)
│       ├── document_processor.py # Document loading and processing
│       ├── chat_logger.py        # Enhanced chat logging
│       └── interface.py          # Gradio interface
│
├── 📊 Evaluation
│   └── evaluation/
│       ├── calculators.py        # ROUGE, BERT, semantic similarity
│       ├── eval_models.py        # Data models and types
│       ├── evaluator.py          # Main evaluation orchestrator
│       ├── output.py             # Result reporting and file management
│       ├── settings.py           # Default, fast, comprehensive configs
│       ├── test_loader.py        # Test case loading and filtering
│       └── run_evaluation.py     # Main evaluation runner
│
├── 🗄️ Data
│   ├── data/
│   │   ├── chroma/               # Vector database
│   │   ├── courses/              # Course documents
│   │   ├── programs/             # Program documents
│   │   └── evaluation/           # Evaluation results
│   ├── scraper/                  # Web scraping tools
│   └── utils/                    # Utility scripts
│
├── ⚙️ Configuration
│   ├── environment.yml           # Python dependencies and environment
│   ├── .env.example              # Environment variables template
│   ├── Makefile                  # Build and deployment commands
│   ├── docker-compose.yml        # Docker configuration
│   └── Dockerfile                # Docker image
│
└── 📖 Documentation
    ├── README.md                 # This file
    └── chat_history.json         # Chat interaction logs
```
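As a rough illustration of the preprocessing behind `document_processor.py`: course and program documents are typically split into overlapping chunks before being embedded into the Chroma store. The function below is a simplified sketch with assumed parameter values, not the project's actual implementation.

```python
# Simplified sketch of document chunking (parameters are assumptions,
# not GuPT's actual settings). Overlap between consecutive chunks keeps
# sentences that straddle a boundary retrievable from either side.

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    chunks = []
    step = chunk_size - overlap  # how far each window advances
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window already covers the end of the text
    return chunks

print(chunk_text("abcdefgh", chunk_size=5, overlap=2))  # ['abcde', 'defgh']
```

Each chunk is then embedded separately, so the chunk size trades retrieval granularity against the number of vectors stored.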
**Option 1: Docker (Recommended)**

```bash
# Clone the repository
git clone https://github.com/faerazo/GuPT.git
cd GuPT

# Set up the environment file
cp .env.example .env
# Edit .env and add your API keys

# Build and run with Docker
make docker-build
make docker-run

# Access at http://localhost:7860
```

**Option 2: Makefile Setup**

```bash
# Clone the repository
git clone https://github.com/faerazo/GuPT.git
cd GuPT

# Use the Makefile for easy setup
make setup

# Activate the conda environment
conda activate gupt

# Launch the application
python src/main.py
```

**Option 3: Manual Conda Setup**

```bash
# Clone the repository
git clone https://github.com/faerazo/GuPT.git
cd GuPT

# Create the conda environment
conda env create -f environment.yml

# Activate the environment
conda activate gupt

# Configure environment variables
cp .env.example .env
# Edit .env and add your API keys

# Launch the application
python src/main.py
```

Create a `.env` file in the project root with the following variables:
```
OPENAI_API_KEY=your_openai_api_key_here
```

```bash
# Basic launch
python src/main.py

# With custom options
python src/main.py --port 8080 --share
python src/main.py --rebuild-db --debug
```

Open your browser and navigate to:

- Local: http://localhost:7860
- Network: http://0.0.0.0:7860 (for network access)
```
python src/main.py [OPTIONS]

Options:
  --share         Enable Gradio public sharing
  --no-share      Explicitly disable sharing (default)
  --port PORT     Port to run on (default: 7860)
  --host HOST     Host to bind to (default: 0.0.0.0)
  --rebuild-db    Force rebuild of the vector database
  --debug         Enable debug mode with verbose output
  --quiet         Suppress non-essential output
  --help          Show help message
```

Note: Make sure to activate the conda environment before running:

```bash
conda activate gupt
python src/main.py
```

Course Information:
- "What are the prerequisites for Applied Machine Learning DIT867?"
- "How is the Advanced Databases course assessed?"
- "Tell me about the learning outcomes for Computer Security"
Program Information:
- "What is the Applied Data Science Master's program about?"
- "List all programs in the School of Business, Economics and Law"
- "What are the admission requirements for the Software Engineering program?"
General Queries:
- "What Computer Science courses are available?"
- "Which programs include machine learning courses?"
- "Tell me about courses taught in English"
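The routing behind these query types can be pictured with a toy classifier. GuPT's actual logic routing is LLM-based; the keyword rules below are purely illustrative stand-ins.

```python
# Toy illustration of logic routing: decide whether a question targets a
# course, a program, or is general, then hand it to the matching retriever.
# The real router uses the LLM; these keyword heuristics are assumptions.
import re

def route(query: str) -> str:
    q = query.lower()
    if re.search(r"\bdit\d{3}\b", q) or "course" in q:
        return "course"
    if "program" in q or "master's" in q or "bachelor's" in q:
        return "program"
    return "general"

print(route("What are the prerequisites for DIT867?"))                    # course
print(route("What is the Applied Data Science Master's program about?"))  # program
```

Routing first and retrieving second keeps course-level and program-level documents from competing in one undifferentiated search.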
The project includes a Makefile for convenient setup and deployment:
```bash
# View all available commands
make help

# 🐳 Docker Commands (Recommended)
make docker-build   # Build Docker image
make docker-run     # Run with Docker Compose
make docker-stop    # Stop Docker containers
make docker-clean   # Clean Docker containers and images
make docker-logs    # View application logs

# 📦 Conda Commands (Development)
make install        # Create conda environment
make setup          # Complete setup (environment + .env)
make clean          # Remove conda environment
make test           # Test the installation
make run            # Run the application
```

Note: Make sure you have created the `.env` file with your API keys before running Docker commands.

```bash
# Using the Makefile
make docker-build
make docker-run
```

Data from GU courses and programs is crawled from the GU website and stored in the `data` folder. The process is summarized in the following diagram:
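Diagram aside, the core of the crawl step can be sketched in a few lines. The real scraper lives in `scraper/`; the page structure and section names below are hypothetical, chosen only to make the example self-contained.

```python
# Illustrative sketch of parsing a scraped syllabus page (the actual
# scraper code lives in scraper/; this HTML layout is an assumption).
from html.parser import HTMLParser

class TitleCollector(HTMLParser):
    """Collect the text of every <h2> section heading on a page."""
    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.titles: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_h2 = False

    def handle_data(self, data):
        if self.in_h2 and data.strip():
            self.titles.append(data.strip())

def extract_section_titles(html: str) -> list[str]:
    parser = TitleCollector()
    parser.feed(html)
    return parser.titles

page = "<h1>DIT867</h1><h2>Entry requirements</h2><p>...</p><h2>Assessment</h2>"
print(extract_section_titles(page))  # ['Entry requirements', 'Assessment']
```

The extracted sections would then be written into `data/courses/` or `data/programs/` for later indexing.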
The architecture of GuPT is shown in the following diagram:
The evaluation system is organized as a modular package under `evaluation/`.
Run a comprehensive evaluation of the RAG system as follows:
```bash
# Activate conda environment first
conda activate gupt

# Full evaluation (all test types)
python evaluation/run_evaluation.py

# Fast evaluation with smaller subset
python evaluation/run_evaluation.py --subset 50 --config fast

# Comprehensive evaluation with better models
python evaluation/run_evaluation.py --config comprehensive

# Run specific test type only
python evaluation/run_evaluation.py --test-type course_info --subset 20

# Available test types: course_info, prerequisites, learning_outcomes, assessment
# Available configs: default, fast, comprehensive
```

Output Files:

- `eval_[timestamp].md` - Human-readable summary
- `evaluation_results_[timestamp].jsonl` - Detailed per-test results
- `aggregated_metrics_[timestamp].json` - Overall metrics
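To give a feel for the metrics in `calculators.py`, here is a simplified ROUGE-1 F1 (clipped unigram overlap). The project uses proper ROUGE and BERTScore libraries; this toy version only illustrates the idea.

```python
# Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall,
# with counts clipped to the reference (Counter intersection). This is an
# illustration, not the library implementation GuPT actually uses.
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped matching unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the course is assessed by exam",
                      "the course is assessed with a written exam"), 2))  # 0.71
```

Lexical-overlap scores like this are cheap but blind to paraphrase, which is why the evaluation also includes BERT-based semantic similarity.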


