A Multi-Model Comparison Framework for Reddit Comment Moderation
This project classifies whether Reddit comments violate subreddit rules, comparing three approaches to the same text classification problem:
- Encoder - Fine-tuned DeBERTa transformer
- Decoder - Fine-tuned Qwen3 large language model
- API - Mistral models with few-shot prompting
Reddit moderators need to determine if user comments violate subreddit rules. Given:
- A comment (e.g., "Check out my website for free stuff!")
- A rule (e.g., "No Advertising: Spam and promotional content are not allowed")
The model answers one question: does this comment violate this rule? (Yes/No, with a confidence score)
Each approach has trade-offs:
| Approach | Training Required | Speed | Cost |
|---|---|---|---|
| Encoder (DeBERTa) | Yes (fine-tuning) | Fastest (local inference) | Free (local) |
| Decoder (Qwen3) | Yes (LoRA fine-tuning) | Depends on local GPU | Free (local GPU) |
| API (Mistral) | No (few-shot prompting) | Depends on network/provider | Paid API |
Input: "No spam rule [SEP] Check out my website!"
↓
[DeBERTa Transformer - 300M parameters]
↓
Output: 0.87 probability → VIOLATION
The encoder learns to map text to a probability score through supervised training.
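The mapping from text pair to probability can be sketched without loading model weights. The snippet below is a minimal illustration, not the notebook's actual code: it shows the `rule [SEP] comment` pairing from the diagram and how a raw classifier logit becomes the 0-1 violation score (in practice the tokenizer inserts its own special tokens when given a text pair, and the logit comes from the fine-tuned DeBERTa head).

```python
import math

def format_encoder_input(rule: str, comment: str) -> str:
    # Pair the rule and comment as shown in the diagram above; the real
    # tokenizer adds its own special tokens when given a text pair.
    return f"{rule} [SEP] {comment}"

def logit_to_probability(logit: float) -> float:
    # Sigmoid maps the classifier's raw logit to a 0..1 violation score.
    return 1.0 / (1.0 + math.exp(-logit))

text = format_encoder_input("No spam", "Check out my website!")
score = logit_to_probability(1.9)  # a logit of ~1.9 gives ~0.87
label = "VIOLATION" if score >= 0.5 else "OK"
```

The 0.87 in the diagram corresponds to a raw logit of roughly 1.9 under this mapping; the threshold that turns the score into a label is discussed at the end of this README.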
System: "Does the comment violate the rule? Answer Yes or No."
User: "Comment: Check out my website!\n\nRule: No spam"
↓
[Qwen3-4B Fine-tuned LLM]
↓
Output: "Yes" (with 0.92 probability from logits)
The decoder generates Yes/No responses after being fine-tuned on moderation examples.
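The "0.92 probability from logits" above comes from comparing the logits of the two possible answer tokens. A minimal sketch of that step, with hypothetical logit values (the real values come from the Qwen3 forward pass at the answer position):

```python
import math

def yes_probability(yes_logit: float, no_logit: float) -> float:
    # Softmax over just the "Yes" and "No" token logits at the answer
    # position turns the generated answer into a confidence score.
    exp_yes = math.exp(yes_logit)
    exp_no = math.exp(no_logit)
    return exp_yes / (exp_yes + exp_no)

# Hypothetical logits: "Yes" scores 2.2, "No" scores -0.24.
p = yes_probability(2.2, -0.24)  # ~0.92
answer = "Yes" if p >= 0.5 else "No"
```

Restricting the softmax to the two answer tokens is what lets a generative model produce a calibrated-looking score comparable to the encoder's output.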
System: "You are a Reddit moderator for r/AskReddit..."
Examples: [6 few-shot examples of violations/non-violations]
User: "Check out my website for free stuff!"
↓
[Mistral API - cloud inference]
↓
Output: {"violates": true, "confidence": 0.9, "reason": "Promotional spam"}
The API uses in-context learning from examples without any training.
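Because the API returns free-form text, the JSON shown above has to be parsed defensively. A sketch of that parsing step, assuming the field names from the example output (`violates`, `confidence`, `reason`); the fallback behavior is an illustration, not necessarily what the notebook does:

```python
import json

def parse_moderation_response(raw: str) -> dict:
    # Parse the JSON the few-shot prompt asks the model to return.
    # Few-shot prompting cannot guarantee well-formed output, so fall
    # back to a conservative "no violation" record on parse failure.
    try:
        data = json.loads(raw)
        return {
            "violates": bool(data["violates"]),
            "confidence": float(data.get("confidence", 0.5)),
            "reason": str(data.get("reason", "")),
        }
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return {"violates": False, "confidence": 0.0, "reason": "unparseable response"}

result = parse_moderation_response(
    '{"violates": true, "confidence": 0.9, "reason": "Promotional spam"}'
)
```

This is one reason the API approach can "explain reasoning" (see the pros/cons table below): the `reason` field comes back for free with every prediction.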
BusinessAnalytics/
├── prep.ipynb # Data preparation (run first!)
├── encoder/
│ └── deberta.ipynb # Fine-tune DeBERTa
├── decoder/
│ ├── inference.ipynb # Run inference with fine-tuned Qwen3
│ └── qwens.ipynb # Training script for Qwen3 models
├── api/
│ └── mistral.ipynb # Test Mistral models via API
├── data/
├── requirements.txt # Python dependencies
└── .env # API keys (create this file)
All models were evaluated on the same 40-sample test set with balanced classes (20 violations, 20 non-violations).
| Model | AUC | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|
| mistral-small-2506 | 0.7400 | 0.5500 | 0.6897 | 0.5263 | 1.0000 |
| ministral-14b-2512 | 0.7063 | 0.5750 | 0.7018 | 0.5405 | 1.0000 |
| mistral-large-2512 | 0.7025 | 0.5500 | 0.6897 | 0.5263 | 1.0000 |
| ministral-3b-2512 | 0.7062 | 0.5000 | 0.6667 | 0.5000 | 1.0000 |
Key Observations:
- All models achieve 100% recall - they catch every violation, at the cost of false positives
- Smaller isn't always worse - `mistral-small` outperforms `mistral-large` on AUC
- Precision is the challenge - models tend to over-predict violations
- Best overall: `ministral-14b-2512` has the highest accuracy and F1
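The reported metrics for `mistral-small-2506` are consistent with one specific confusion matrix on the 40-sample test set: all 20 violations caught, but 18 of the 20 non-violations flagged as well. The sketch below derives the table's numbers from those counts (the counts themselves are inferred from the metrics, not read from the evaluation logs):

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    # Standard metrics derived from confusion-matrix counts.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Counts consistent with the mistral-small-2506 row: every violation
# caught (recall = 1.0), but 18 of 20 non-violations flagged too.
m = classification_metrics(tp=20, fp=18, tn=2, fn=0)
# accuracy 0.55, precision ~0.5263, recall 1.0, F1 ~0.6897
```

Working backwards like this makes the precision problem concrete: perfect recall is easy to achieve by over-flagging, which is exactly what the observations above describe.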
| Approach | Pros | Cons |
|---|---|---|
| Encoder | Fast, free, deterministic | Requires training data, less interpretable |
| Decoder | Good balance, local inference | Requires GPU, complex setup |
| API | No training, explains reasoning | Costs money, slower, rate limits |
A model outputs probabilities (0.0 to 1.0). The threshold determines where we draw the line between "violation" and "OK". For example:
- Threshold = 0.5: Standard (≥50% → violation)
- Threshold = 0.3: Aggressive (catches more, more false positives)
- Threshold = 0.7: Conservative (fewer false positives, might miss some)
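The three thresholds above can be compared on a handful of hypothetical scores (the score values here are made up for illustration):

```python
def apply_threshold(scores, threshold):
    # A score at or above the threshold is flagged as a violation.
    return [score >= threshold for score in scores]

# Hypothetical probability scores from any of the three models.
scores = [0.87, 0.62, 0.45, 0.31, 0.12]

flagged_standard = sum(apply_threshold(scores, 0.5))      # 2 flagged
flagged_aggressive = sum(apply_threshold(scores, 0.3))    # 4 flagged
flagged_conservative = sum(apply_threshold(scores, 0.7))  # 1 flagged
```

Moving the threshold trades precision against recall without retraining anything, which is why AUC (threshold-independent) is reported alongside the thresholded metrics in the results table.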