sheddiboo/Log-Classification-System

Hybrid Log Classification System

This project implements a robust, three-tier hybrid log classification framework designed to process system logs with high accuracy and efficiency. By combining deterministic rules, local machine learning, and advanced Large Language Models (LLMs), the system ensures that every log is categorized correctly, regardless of its complexity.

🚀 The Three-Tier Architecture

To balance speed, cost, and reasoning capability, the system processes logs through the following hierarchy:

  1. Tier 1: Regular Expressions (Regex)
  • Purpose: Instant classification for predictable, high-frequency patterns.
  • Logic: If a log matches a predefined regex rule, it is labeled immediately, bypassing the heavier models and saving compute resources.
  2. Tier 2: BERT / Sentence Transformers
  • Purpose: Local machine learning for complex but standard log patterns.
  • Logic: Logs that are not matched by the Regex tier are embedded with Sentence-Transformers and classified by a Logistic Regression model (built with scikit-learn 1.8.0). This provides high-speed inference without external API costs.
  3. Tier 3: Llama 3.3 70B (LLM)
  • Purpose: High-reasoning fallback for "Unclassified" or legacy system logs.
  • Logic: Any log not confidently caught by Tier 1 or 2 (or specifically designated as coming from LegacyCRM) is sent to the Llama-3.3-70b-versatile model via the Groq API. This ensures even the most ambiguous logs are labeled with human-like reasoning.
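The cascade above can be sketched in a few lines of Python. The function names, regex rules, and the "Unclassified" low-confidence convention below are illustrative assumptions, not the repository's actual identifiers; the real logic lives in classify.py, regex_processor.py, bert_processor.py, and llm_processor.py.

```python
import re

# Tier 1: illustrative regex rules -- the real patterns live in regex_processor.py
REGEX_RULES = {
    r"User User\d+ logged (in|out)": "User Action",
    r"Backup (started|completed) successfully": "System Notification",
}

def classify_with_regex(log_message):
    """Tier 1: return a label on a pattern hit, or None to fall through."""
    for pattern, label in REGEX_RULES.items():
        if re.search(pattern, log_message):
            return label
    return None

def classify_log(source, log_message, bert_classifier, llm_classifier):
    """Route one log through the Regex -> BERT -> LLM cascade."""
    # Logs from the legacy system skip the local tiers and go straight to the LLM.
    if source == "LegacyCRM":
        return llm_classifier(log_message)
    label = classify_with_regex(log_message)
    if label is not None:
        return label
    # Tier 2: local model; assumed to return "Unclassified" when unsure.
    label = bert_classifier(log_message)
    if label != "Unclassified":
        return label
    # Tier 3: LLM fallback for anything still ambiguous.
    return llm_classifier(log_message)
```

Passing the Tier 2 and Tier 3 classifiers in as callables keeps the router easy to test with stubs before wiring in the real models.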

📂 Project Structure

log_classification/
├── server.py              # FastAPI backend entry point
├── classify.py            # Main coordination logic (the "Master Brain")
├── bert_processor.py      # Tier 2: Sentence Transformer logic
├── llm_processor.py       # Tier 3: Groq / Llama 3.3 API logic
├── regex_processor.py     # Tier 1: Pattern-based rules
├── log_classifier.joblib  # Trained local ML model
├── requirements.txt       # Project dependencies (locked to dev_env)
├── .env                   # Private API keys (Excluded from Git)
├── resources/             # Directory for output CSV files
└── test.csv               # Sample input data for testing


🛠️ Installation & Setup

1. Environment Setup

Ensure you have a working Python installation, then install the project dependencies. Using an environment manager such as Anaconda is recommended.

pip install -r requirements.txt
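For reference, a requirements.txt for this stack would typically pin at least the packages below (the FastAPI and scikit-learn versions are the ones this README mentions; the repository's actual lock file may list more packages or different pins):

```text
fastapi==0.125.0
uvicorn
scikit-learn==1.8.0
sentence-transformers
joblib
pandas
python-dotenv
groq
```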

2. Configure API Keys

Create a .env file in the root directory and add your Groq API key:

GROQ_API_KEY=your_actual_key_here
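At runtime the key is read from the environment rather than hard-coded. The helper below is a minimal, standard-library-only sketch with a hypothetical name; the real llm_processor.py may instead call python-dotenv's load_dotenv() to populate the environment from the .env file before creating the Groq client.

```python
import os

def get_groq_api_key():
    """Read GROQ_API_KEY from the environment (populated from .env at startup)."""
    key = os.environ.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set; create a .env file in the project root "
            "or export the variable before starting the server."
        )
    return key
```

Failing fast with a clear message here is friendlier than letting a missing key surface later as an opaque authentication error from the Groq API.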

3. Start the API Server

Launch the backend using the Python interpreter. Ensure your virtual environment (e.g., dev_env) is active before running:

# run the server script directly
python server.py

Note: The script is configured to initialize the Uvicorn server automatically on 127.0.0.1:8000.


💻 API Usage & Testing

Once the server is running, you can interact with the classification pipeline through the following interfaces:

How to Classify a File

  1. Navigate to the Swagger UI at http://127.0.0.1:8000/docs.
  2. Expand the POST /classify/ endpoint.
  3. Click "Try it out" and upload your test.csv.
  4. The system will process the logs through the Regex ➔ BERT ➔ LLM pipeline and return a downloadable CSV with a target_label column.

Classifying Logs

  • Endpoint: POST /classify/
  • Payload: A .csv file containing two required columns: source and log_message.
  • Output: A processed .csv file containing an additional target_label column.
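The endpoint can also be exercised from a script instead of Swagger. The sketch below builds the multipart/form-data upload with only the standard library; it assumes the upload field is named file (FastAPI's usual UploadFile convention), so adjust the field name if server.py declares it differently.

```python
import urllib.request
import uuid

def build_multipart(field_name, filename, payload, content_type="text/csv"):
    """Assemble a multipart/form-data body and its matching Content-Type header."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
        f"{payload}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"

csv_text = "source,log_message\nModernCRM,User User123 logged in\n"
body, content_type = build_multipart("file", "test.csv", csv_text)
request = urllib.request.Request(
    "http://127.0.0.1:8000/classify/",
    data=body,
    headers={"Content-Type": content_type},
    method="POST",
)
# With the server running, the response body is the processed CSV,
# including the added target_label column:
# with urllib.request.urlopen(request) as resp:
#     print(resp.read().decode("utf-8"))
```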

🎓 Credits & Attribution

This project was inspired by and built upon the foundational concepts from the Codebasics Hybrid Log Classification project.

Key Modifications & Enhancements:

  • Upgraded LLM Tier: Replaced default LLM logic with Llama 3.3 70B via the Groq API for state-of-the-art reasoning.
  • Modernized Stack: Updated the pipeline to be compatible with scikit-learn 1.8.0 and FastAPI 0.125.0.
  • Deployment Ready: Integrated a full FastAPI backend with dedicated endpoint handling for CSV batch processing.
