This repository was archived by the owner on Jan 12, 2026. It is now read-only.

Business Analytics

A Multi-Model Comparison Framework for Reddit Comment Moderation

This project classifies whether Reddit comments violate subreddit rules. It compares three approaches to the same text classification problem:

  1. Encoder - Fine-tuned DeBERTa transformer
  2. Decoder - Fine-tuned Qwen3 large language model
  3. API - Mistral models with few-shot prompting

Project Overview

The Problem

Reddit moderators need to determine if user comments violate subreddit rules. Given:

  • A comment (e.g., "Check out my website for free stuff!")
  • A rule (e.g., "No Advertising: Spam and promotional content are not allowed")

The model answers one question: does this comment violate this rule? (Yes/No, with a confidence score)
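In code, a single labeled example might be represented like this (a sketch; the field names are illustrative, not the project's actual schema):

```python
# One moderation example: the model sees a (rule, comment) pair and
# must predict whether the comment violates the rule.
from dataclasses import dataclass

@dataclass
class ModerationExample:
    rule: str       # the subreddit rule being checked
    comment: str    # the user comment under review
    violates: bool  # ground-truth label

ex = ModerationExample(
    rule="No Advertising: Spam and promotional content are not allowed",
    comment="Check out my website for free stuff!",
    violates=True,
)
```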

Why Three Approaches?

Each approach has trade-offs:

| Approach | Training Required | Speed | Cost |
|---|---|---|---|
| Encoder (DeBERTa) | Yes (fine-tuning) | Fastest | Free (local) |
| Decoder (Qwen3) | Yes (LoRA fine-tuning) | Hardware-dependent | Free (local GPU) |
| API (Mistral) | No (few-shot prompting) | Network/provider-dependent | Paid API |

How It Works

1. Encoder Approach (DeBERTa)

Input:  "No spam rule [SEP] Check out my website!"
         ↓
    [DeBERTa Transformer - 300M parameters]
         ↓
Output: 0.87 probability → VIOLATION

The encoder learns to map text to a probability score through supervised training.
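Conceptually, the scoring step can be sketched as below. This is a minimal sketch, not the project's actual code: the checkpoint path is a placeholder, the label order (index 1 = violation) is an assumption, and loading inside the function is only for brevity.

```python
# Hedged sketch of scoring one (rule, comment) pair with a fine-tuned
# DeBERTa sequence classifier. Real code would load the model once,
# not on every call.
def violation_probability(rule: str, comment: str,
                          checkpoint: str = "path-to-finetuned-deberta") -> float:
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    model.eval()

    # Passing rule and comment as a sentence pair makes the tokenizer
    # insert the [SEP] token between them, matching the diagram above.
    inputs = tokenizer(rule, comment, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Assumption: label index 1 is the "violation" class.
    return torch.softmax(logits, dim=-1)[0, 1].item()
```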

2. Decoder Approach (Qwen3)

System: "Does the comment violate the rule? Answer Yes or No."
User:   "Comment: Check out my website!\n\nRule: No spam"
         ↓
    [Qwen3-4B Fine-tuned LLM]
         ↓
Output: "Yes" (with 0.92 probability from logits)

The decoder generates Yes/No responses after being fine-tuned on moderation examples.
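The "probability from logits" step can be sketched as a softmax restricted to the logits of the "Yes" and "No" answer tokens (the logit values below are made up for illustration):

```python
# Hedged sketch: turn the decoder's next-token logits for "Yes" and
# "No" into a calibratable Yes-probability via a two-way softmax.
import math

def yes_probability(logit_yes: float, logit_no: float) -> float:
    # Softmax restricted to the two answer tokens; subtracting the max
    # keeps the exponentials numerically stable.
    m = max(logit_yes, logit_no)
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)

p = yes_probability(2.3, 1.1)  # illustrative logits; p is approx. 0.77
```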

3. API Approach (Mistral)

System: "You are a Reddit moderator for r/AskReddit..."
Examples: [6 few-shot examples of violations/non-violations]
User:   "Check out my website for free stuff!"
         ↓
    [Mistral API - cloud inference]
         ↓
Output: {"violates": true, "confidence": 0.9, "reason": "Promotional spam"}

The API uses in-context learning from examples without any training.
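The few-shot setup can be sketched as a message list plus a JSON parse. The system prompt, examples, and schema below are illustrative stand-ins, not the project's actual prompts:

```python
# Hedged sketch of the few-shot message layout: system prompt, then
# alternating user/assistant example turns, then the comment to judge.
import json

SYSTEM = ('You are a Reddit moderator. Reply with JSON of the form '
          '{"violates": bool, "confidence": float, "reason": str}.')

FEW_SHOT = [
    ("Buy cheap followers at my site!",
     '{"violates": true, "confidence": 0.95, "reason": "Promotional spam"}'),
    ("I disagree, but that is a fair point.",
     '{"violates": false, "confidence": 0.9, "reason": "Civil discussion"}'),
]

def build_messages(comment: str) -> list[dict]:
    messages = [{"role": "system", "content": SYSTEM}]
    for example_comment, example_answer in FEW_SHOT:
        messages.append({"role": "user", "content": example_comment})
        messages.append({"role": "assistant", "content": example_answer})
    messages.append({"role": "user", "content": comment})
    return messages

def parse_verdict(raw: str) -> dict:
    # The model is asked for strict JSON, so the reply parses directly.
    return json.loads(raw)

msgs = build_messages("Check out my website for free stuff!")
```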


Project Structure

BusinessAnalytics/
├── prep.ipynb                 # Data preparation (run first!)
├── encoder/
│   └── deberta.ipynb          # Fine-tune DeBERTa
├── decoder/
│   ├── inference.ipynb        # Run inference with fine-tuned Qwen3
│   └── qwens.ipynb            # Training script for Qwen3 models
├── api/
│   └── mistral.ipynb          # Test Mistral models via API
├── data/
├── requirements.txt           # Python dependencies
└── .env                       # API keys (create this file)
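One plausible way to get started, sketched under assumptions (a standard pip workflow, and a guessed environment-variable name for the Mistral key; check the notebooks for the exact key the API client expects):

```shell
# Assumed setup steps; MISTRAL_API_KEY is a guessed variable name.
pip install -r requirements.txt
echo "MISTRAL_API_KEY=your-key-here" > .env
```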

Model Performance Comparison

All models were evaluated on the same 40-sample test set with balanced classes (20 violations, 20 non-violations).

Results

| Model | AUC | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|
| mistral-small-2506 | 0.7400 | 0.5500 | 0.6897 | 0.5263 | 1.0000 |
| ministral-14b-2512 | 0.7063 | 0.5750 | 0.7018 | 0.5405 | 1.0000 |
| mistral-large-2512 | 0.7025 | 0.5500 | 0.6897 | 0.5263 | 1.0000 |
| ministral-3b-2512 | 0.7062 | 0.5000 | 0.6667 | 0.5000 | 1.0000 |

Key Observations:

  1. All models achieve 100% recall: they catch every violation, at the cost of false positives
  2. Smaller isn't always worse: mistral-small outperforms mistral-large on AUC
  3. Precision is the challenge: all models tend to over-predict violations
  4. Best overall: ministral-14b-2512 has the highest accuracy and F1
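The observations above can be verified by back-deriving the confusion matrix from the table. For mistral-small-2506: recall 1.0 on 20 violations means 0 false negatives, and accuracy 0.55 on 40 samples means 22 correct predictions, so all 18 errors are false positives:

```python
# Recompute the table's metrics from a confusion matrix.
def metrics(tp: int, fp: int, fn: int, tn: int):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# mistral-small-2506: 20 true positives, 18 false positives,
# 0 false negatives, 2 true negatives.
p, r, f1, acc = metrics(tp=20, fp=18, fn=0, tn=2)
# p is approx. 0.5263, r = 1.0, f1 approx. 0.6897, acc = 0.55,
# matching the table row.
```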

Trade-offs Summary

| Approach | Pros | Cons |
|---|---|---|
| Encoder | Fast, free, deterministic | Requires training data, less interpretable |
| Decoder | Good balance, local inference | Requires GPU, complex setup |
| API | No training, explains its reasoning | Costs money, slower, rate limits |

Threshold Optimization

A model outputs probabilities (0.0 to 1.0). The threshold determines where we draw the line between "violation" and "OK". For example:

  • Threshold = 0.5: Standard (predict a violation when the probability is ≥ 0.5)
  • Threshold = 0.3: Aggressive (catches more violations, but produces more false positives)
  • Threshold = 0.7: Conservative (fewer false positives, but may miss some violations)
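A simple sweep like the following picks the threshold that maximizes F1 on a validation set (the scores and labels below are made up for illustration):

```python
# Sweep candidate thresholds and keep the one with the best F1.
def f1_at_threshold(scores, labels, threshold):
    preds = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up validation scores and ground-truth labels:
scores = [0.92, 0.81, 0.67, 0.55, 0.40, 0.31, 0.22, 0.10]
labels = [True, True, False, True, False, False, True, False]

# Grid of thresholds 0.05, 0.10, ..., 0.95; keep the first maximizer.
best = max((t / 100 for t in range(5, 100, 5)),
           key=lambda t: f1_at_threshold(scores, labels, t))
```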

About

Business Analytics Project in WS2025
