
GuPT is a RAG chatbot system providing accurate and quick answers about Gothenburg University's courses and programs to help students access academic information effortlessly.


🤖 GuPT

Python LangChain OpenAI Gradio

GuPT is a project developed by a student group for the course Machine Learning for Natural Language Processing (DIT247). The system draws on information extracted from Gothenburg University's (GU) bachelor's and master's courses (~590) and programs (~90), including relevant details from their web pages and syllabus PDFs. This data feeds a Retrieval-Augmented Generation (RAG) pipeline that GuPT uses to respond to user queries.

GuPT's RAG model is built using LangChain, OpenAI embeddings, and gpt-5-mini. By utilizing multi-querying and logic routing, GuPT can handle ambiguous questions and provide both specific and general answers regarding GU courses and programs. The goal is to offer a tool that efficiently provides information on entry requirements, learning objectives, and assessment methods, thereby reducing confusion and administrative workload.
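
As a rough illustration of logic routing, the sketch below classifies a question as course-related, program-related, or general before any retrieval happens. The keyword rules, the course-code pattern, the `Route` names, and the `route_query` function are all assumptions made for illustration; GuPT's actual router may well use an LLM-based classifier instead.

```python
import re
from enum import Enum


class Route(Enum):
    """Hypothetical retrieval routes (not taken from the GuPT source)."""
    COURSE = "course"
    PROGRAM = "program"
    GENERAL = "general"


# Assumed shape of a GU course code, e.g. DIT247 or DIT867.
COURSE_CODE = re.compile(r"\b[A-Z]{2,3}\d{3,4}\b")


def route_query(question: str) -> Route:
    """Choose a route with simple keyword rules (illustrative only)."""
    lowered = question.lower()
    mentions_course = "course" in lowered
    mentions_program = "program" in lowered

    # An explicit course code is the strongest signal.
    if COURSE_CODE.search(question):
        return Route.COURSE
    if mentions_course and not mentions_program:
        return Route.COURSE
    if mentions_program and not mentions_course:
        return Route.PROGRAM
    # Mixed or unclear questions fall back to a broad search.
    return Route.GENERAL


for q in [
    "What are the prerequisites for Applied Machine Learning DIT867?",
    "What is the Applied Data Science Master's program about?",
    "Which programs include machine learning courses?",
]:
    print(route_query(q).value, "<-", q)
```

A rule-based router like this is cheap and predictable, but it misses paraphrases; that trade-off is one reason a system might prefer a model-based router.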


🚀 Try It Out

Hugging Face Spaces

Access our interactive demo and start asking questions about GU courses and programs.

👉 Launch GuPT


📋 Table of Contents

  1. Features
  2. Getting Started
  3. Installation
  4. Usage
  5. Data Collection
  6. Architecture
  7. Evaluation
  8. Technologies Used
  9. Video Presentation

✨ Features

  • Natural Language Querying: Ask questions about GU courses and programs in plain English.
  • Contextual RAG System: Retrieves relevant information from a local database of course and program details.
  • Multi-Querying and Logic Routing: Handles ambiguous questions by issuing several query variants and routing them to get precise answers.
  • Scalable: Built to handle a large volume of course and program data.
  • Efficient Retrieval: Reduces time spent searching for course or program information manually.
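
The multi-querying feature listed above can be pictured with the following minimal sketch. It is self-contained and uses a toy word-overlap retriever in place of LangChain, OpenAI embeddings, or Chroma, and the query variants are hard-coded where a real system would ask an LLM to generate them. All names (`DOCS`, `retrieve`, `multi_query_retrieve`) are hypothetical.

```python
from typing import Dict, List

# Toy corpus standing in for the course and program documents.
DOCS: Dict[str, str] = {
    "c1": "applied machine learning course entry requirements programming",
    "c2": "advanced databases course assessment written exam",
    "p1": "applied data science master program overview",
}


def retrieve(query: str, k: int = 2) -> List[str]:
    """Rank documents by how many words they share with the query."""
    words = set(query.lower().split())
    scored = [(len(words & set(text.split())), doc_id) for doc_id, text in DOCS.items()]
    # Drop documents with no overlap; break ties by document id.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in ranked[:k]]


def multi_query_retrieve(variants: List[str], k: int = 2) -> List[str]:
    """Run every query variant and return the de-duplicated union of hits."""
    seen: List[str] = []
    for variant in variants:
        for doc_id in retrieve(variant, k):
            if doc_id not in seen:
                seen.append(doc_id)
    return seen


# Two phrasings of the same underlying question.
variants = [
    "machine learning prerequisites",
    "entry requirements applied machine learning course",
]
print(multi_query_retrieve(variants))
```

Taking the union of results across variants is what lets an ambiguous or oddly worded question still reach the right documents: a hit found by any one phrasing is kept.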

🚀 Getting Started

These instructions will help you set up a local copy of GuPT for development and testing purposes.

Prerequisites

  • Python 3.8+: Ensure you have Python installed.
  • pip: Python package manager.
  • OpenAI API Key: Required for embedding and text generation. Obtain one from OpenAI's website.

📁 Project Structure

GuPT/
├── 🔧 Core Modules
│   └── src/
│       ├── main.py              # Main entry point
│       ├── config.py            # Configuration and constants
│       ├── models.py            # Pydantic models and data classes
│       ├── rag_service.py       # Core RAG service (LCEL)
│       ├── document_processor.py # Document loading and processing
│       ├── chat_logger.py       # Enhanced chat logging
│       └── interface.py         # Gradio interface
│
├── 📊 Evaluation
│   └── evaluation/
│       ├── calculators.py       # ROUGE, BERT, semantic similarity
│       ├── eval_models.py       # Data models and types
│       ├── evaluator.py         # Main evaluation orchestrator
│       ├── output.py            # Result reporting and file management
│       ├── settings.py          # Default, fast, comprehensive configs
│       ├── test_loader.py       # Test case loading and filtering
│       └── run_evaluation.py    # Main evaluation runner
│
├── 🗂️ Data
│   ├── data/
│   │   ├── chroma/          # Vector database
│   │   ├── courses/         # Course documents
│   │   ├── programs/        # Program documents
│   │   └── evaluation/      # Evaluation results
│   ├── scraper/             # Web scraping tools
│   └── utils/               # Utility scripts
│
├── ⚙️ Configuration
│   ├── environment.yml      # Python dependencies and environment
│   ├── .env.example         # Environment variables template
│   ├── Makefile             # Build and deployment commands
│   ├── docker-compose.yml   # Docker configuration
│   └── Dockerfile           # Docker image
│
└── 📚 Documentation
    ├── README.md           # This file
    └── chat_history.json   # Chat interaction logs

📦 Installation

🐳 Option 1: Docker (Recommended)

# Clone the Repository
git clone https://github.com/faerazo/GuPT.git
cd GuPT

# Setup environment file
cp .env.example .env
# Edit .env and add your API keys

# Build and run with Docker
make docker-build
make docker-run

# Access at http://localhost:7860

🐍 Option 2: Conda Environment

# Clone the Repository
git clone https://github.com/faerazo/GuPT.git
cd GuPT

# Use Makefile for easy setup
make setup

# Activate conda environment
conda activate gupt

# Launch the application
python src/main.py

🔧 Option 3: Manual Conda Setup

# Clone the Repository
git clone https://github.com/faerazo/GuPT.git
cd GuPT

# Create conda environment
conda env create -f environment.yml

# Activate environment
conda activate gupt

# Configure environment variables
cp .env.example .env
# Edit .env and add your API keys

# Launch the application
python src/main.py

🔑 Required Environment Variables

Create a .env file in the project root with the following variables:

OPENAI_API_KEY=your_openai_api_key_here
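
To confirm the key is visible to Python before launching, a small check like the following can help. It is a sketch that parses simple KEY=VALUE lines using only the standard library; the helper names `load_env_file` and `has_openai_key` are hypothetical, and the project itself may load the file differently.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_openai_key(path: str = ".env") -> bool:
    """True if OPENAI_API_KEY is set in the environment or in the file."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY", "")
    # The template placeholder does not count as a real key.
    return bool(key) and key != "your_openai_api_key_here"


if __name__ == "__main__":
    print("OPENAI_API_KEY found" if has_openai_key() else "OPENAI_API_KEY missing")
```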

🚀 Launch Options

# Basic launch
python src/main.py

# With custom options
python src/main.py --port 8080 --share
python src/main.py --rebuild-db --debug

🌐 Access the Interface

Open your browser and navigate to http://localhost:7860 (the default port).


🎯 Usage

Command Line Options

python src/main.py [OPTIONS]

Options:
  --share              Enable Gradio public sharing
  --no-share           Explicitly disable sharing (default)
  --port PORT          Port to run on (default: 7860)
  --host HOST          Host to bind to (default: 0.0.0.0)
  --rebuild-db         Force rebuild of vector database
  --debug              Enable debug mode with verbose output
  --quiet              Suppress non-essential output
  --help               Show help message
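
For orientation, the options above could be declared with argparse roughly as follows. This is a sketch reconstructed from the help text, not the contents of src/main.py; the function name `build_parser` and the exact argument wiring are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Launch the GuPT interface")

    # --share and --no-share write to the same destination; sharing is off by default.
    share = parser.add_mutually_exclusive_group()
    share.add_argument("--share", dest="share", action="store_true",
                       help="Enable Gradio public sharing")
    share.add_argument("--no-share", dest="share", action="store_false",
                       help="Explicitly disable sharing (default)")
    parser.set_defaults(share=False)

    parser.add_argument("--port", type=int, default=7860,
                        help="Port to run on (default: 7860)")
    parser.add_argument("--host", default="0.0.0.0",
                        help="Host to bind to (default: 0.0.0.0)")
    parser.add_argument("--rebuild-db", action="store_true",
                        help="Force rebuild of vector database")
    parser.add_argument("--debug", action="store_true",
                        help="Enable debug mode with verbose output")
    parser.add_argument("--quiet", action="store_true",
                        help="Suppress non-essential output")
    # argparse adds --help automatically.
    return parser


args = build_parser().parse_args(["--port", "8080", "--share"])
print(args.port, args.share, args.rebuild_db)
```

Note that argparse converts `--rebuild-db` to the attribute `rebuild_db`, so downstream code would read `args.rebuild_db`.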

Note: Make sure to activate the conda environment before running:

conda activate gupt
python src/main.py

Example Queries

Course Information:

  • "What are the prerequisites for Applied Machine Learning DIT867?"
  • "How is the Advanced Databases course assessed?"
  • "Tell me about the learning outcomes for Computer Security"

Program Information:

  • "What is the Applied Data Science Master's program about?"
  • "List all programs in the School of Business, Economics and Law"
  • "What are the admission requirements for the Software Engineering program?"

General Queries:

  • "What Computer Science courses are available?"
  • "Which programs include machine learning courses?"
  • "Tell me about courses taught in English"

🛠️ Makefile Commands

The project includes a Makefile for convenient setup and deployment:

# View all available commands
make help

# 🐳 Docker Commands (Recommended)
make docker-build   # Build Docker image  
make docker-run     # Run with Docker Compose
make docker-stop    # Stop Docker containers
make docker-clean   # Clean Docker containers and images
make docker-logs    # View application logs

# 📦 Conda Commands (Development)
make install        # Create conda environment
make setup          # Complete setup (environment + .env)
make clean          # Remove conda environment
make test           # Test the installation
make run            # Run the application

🐳 Docker Deployment

Note: Make sure you have created a .env file with your API keys before running Docker commands.

# Using Makefile
make docker-build
make docker-run

📊 Data Collection

Data on GU courses and programs is crawled from the GU website and stored in the data folder. The process is summarized in the following diagram:

Data Collection


🏗️ Architecture

The architecture of GuPT is shown in the following diagram:

Architecture


📈 Evaluation

The evaluation code lives in the evaluation/ directory and is split into modules for metric calculation, test loading, result reporting, and a runner script.

Evaluation Metrics

Run the evaluation of the RAG system from the project root:

# Activate conda environment first
conda activate gupt

# Full evaluation (all test types)
python evaluation/run_evaluation.py

# Fast evaluation with smaller subset
python evaluation/run_evaluation.py --subset 50 --config fast

# Comprehensive evaluation with better models
python evaluation/run_evaluation.py --config comprehensive

# Run specific test type only
python evaluation/run_evaluation.py --test-type course_info --subset 20

# Available test types: course_info, prerequisites, learning_outcomes, assessment
# Available configs: default, fast, comprehensive

Output Files:

  • eval_[timestamp].md - Human-readable summary
  • evaluation_results_[timestamp].jsonl - Detailed per-test results
  • aggregated_metrics_[timestamp].json - Overall metrics
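
The project structure lists ROUGE among the metrics computed in evaluation/calculators.py. As a rough picture of what such a score measures, here is a minimal ROUGE-1 F1 computation on whitespace tokens. It is an illustrative sketch only: the function name `rouge1_f1` is hypothetical, and the real evaluator presumably relies on an established library with proper tokenization.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference answer and a generated answer."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "the course is assessed by a written exam"
candidate = "the course is assessed by a project"
print(round(rouge1_f1(reference, candidate), 3))
```

Here six of the candidate's seven tokens appear in the eight-token reference, giving precision 6/7, recall 6/8, and an F1 of 0.8.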

🛠️ Technologies Used


🎥 Video Presentation

GuPT Video Presentation
