A Multi-Model Comparison Framework for Reddit Comment Moderation
This project classifies whether Reddit comments violate subreddit rules, comparing three approaches to the same text classification problem:
- Encoder - Fine-tuned DeBERTa transformer
- Decoder - Fine-tuned Qwen3 large language model
- API - Mistral models with few-shot prompting
Reddit moderators need to determine if user comments violate subreddit rules. Given:
- A comment (e.g., "Check out my website for free stuff!")
- A rule (e.g., "No Advertising: Spam and promotional content are not allowed")
The model answers one question: does this comment violate this rule? (Yes/No, with a confidence score)
Each approach has trade-offs:
| Approach | Training Required | Speed | Cost |
|---|---|---|---|
| Encoder (DeBERTa) | Yes (fine-tuning) | Fastest (local inference) | Free (local) |
| Decoder (Qwen3) | Yes (LoRA fine-tuning) | Depends on local GPU | Free (local GPU) |
| API (Mistral) | No (few-shot prompting) | Depends on network/provider | Paid API |
Input: "No spam rule [SEP] Check out my website!"
↓
[DeBERTa Transformer - 300M parameters]
↓
Output: 0.87 probability → VIOLATION
The encoder learns to map text to a probability score through supervised training.
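The mapping from text pair to probability can be sketched without loading model weights. The snippet below is a minimal illustration, not the notebook's actual code: it shows the `rule [SEP] comment` pairing from the diagram and how a raw classifier logit becomes the 0-1 violation score (in practice the tokenizer inserts its own special tokens when given a text pair, and the logit comes from the fine-tuned DeBERTa head).

```python
import math

def format_encoder_input(rule: str, comment: str) -> str:
    # Pair the rule and comment as shown in the diagram above; the real
    # tokenizer adds its own special tokens when given a text pair.
    return f"{rule} [SEP] {comment}"

def logit_to_probability(logit: float) -> float:
    # Sigmoid maps the classifier's raw logit to a 0..1 violation score.
    return 1.0 / (1.0 + math.exp(-logit))

text = format_encoder_input("No spam", "Check out my website!")
score = logit_to_probability(1.9)  # a logit of ~1.9 gives ~0.87
label = "VIOLATION" if score >= 0.5 else "OK"
```

The 0.87 in the diagram corresponds to a raw logit of roughly 1.9 under this mapping; the threshold that turns the score into a label is discussed at the end of this README.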
System: "Does the comment violate the rule? Answer Yes or No."
User: "Comment: Check out my website!\n\nRule: No spam"
↓
[Qwen3-4B Fine-tuned LLM]
↓
Output: "Yes" (with 0.92 probability from logits)
The decoder generates Yes/No responses after being fine-tuned on moderation examples.
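The "0.92 probability from logits" above comes from comparing the logits of the two possible answer tokens. A minimal sketch of that step, with hypothetical logit values (the real values come from the Qwen3 forward pass at the answer position):

```python
import math

def yes_probability(yes_logit: float, no_logit: float) -> float:
    # Softmax over just the "Yes" and "No" token logits at the answer
    # position turns the generated answer into a confidence score.
    exp_yes = math.exp(yes_logit)
    exp_no = math.exp(no_logit)
    return exp_yes / (exp_yes + exp_no)

# Hypothetical logits: "Yes" scores 2.2, "No" scores -0.24.
p = yes_probability(2.2, -0.24)  # ~0.92
answer = "Yes" if p >= 0.5 else "No"
```

Restricting the softmax to the two answer tokens is what lets a generative model produce a calibrated-looking score comparable to the encoder's output.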
System: "You are a Reddit moderator for r/AskReddit..."
Examples: [6 few-shot examples of violations/non-violations]
User: "Check out my website for free stuff!"
↓
[Mistral API - cloud inference]
↓
Output: {"violates": true, "confidence": 0.9, "reason": "Promotional spam"}
The API uses in-context learning from examples without any training.
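Because the API returns free-form text, the JSON shown above has to be parsed defensively. A sketch of that parsing step, assuming the field names from the example output (`violates`, `confidence`, `reason`); the fallback behavior is an illustration, not necessarily what the notebook does:

```python
import json

def parse_moderation_response(raw: str) -> dict:
    # Parse the JSON the few-shot prompt asks the model to return.
    # Few-shot prompting cannot guarantee well-formed output, so fall
    # back to a conservative "no violation" record on parse failure.
    try:
        data = json.loads(raw)
        return {
            "violates": bool(data["violates"]),
            "confidence": float(data.get("confidence", 0.5)),
            "reason": str(data.get("reason", "")),
        }
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return {"violates": False, "confidence": 0.0, "reason": "unparseable response"}

result = parse_moderation_response(
    '{"violates": true, "confidence": 0.9, "reason": "Promotional spam"}'
)
```

This is one reason the API approach can "explain reasoning" (see the pros/cons table below): the `reason` field comes back for free with every prediction.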
BusinessAnalytics/
├── prep.ipynb # Data preparation (run first!)
├── encoder/
│ └── deberta.ipynb # Fine-tune DeBERTa
├── decoder/
│ ├── inference.ipynb # Run inference with fine-tuned Qwen3
│ └── qwens.ipynb # Training script for Qwen3 models
├── api/
│ └── mistral.ipynb # Test Mistral models via API
├── data/
├── requirements.txt # Python dependencies
└── .env # API keys (create this file)
All models were evaluated on the same 40-sample test set with balanced classes (20 violations, 20 non-violations).
| Model | AUC | Accuracy | F1 | Precision | Recall |
|---|---|---|---|---|---|
| mistral-small-2506 | 0.7400 | 0.5500 | 0.6897 | 0.5263 | 1.0000 |
| ministral-14b-2512 | 0.7063 | 0.5750 | 0.7018 | 0.5405 | 1.0000 |
| mistral-large-2512 | 0.7025 | 0.5500 | 0.6897 | 0.5263 | 1.0000 |
| ministral-3b-2512 | 0.7062 | 0.5000 | 0.6667 | 0.5000 | 1.0000 |
Key Observations:
- All models achieve 100% recall - they catch every violation, at the cost of false positives
- Smaller isn't always worse - `mistral-small` outperforms `mistral-large` on AUC
- Precision is the challenge - models tend to over-predict violations
- Best overall: `ministral-14b-2512` has the highest accuracy and F1
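The reported metrics for `mistral-small-2506` are consistent with one specific confusion matrix on the 40-sample test set: all 20 violations caught, but 18 of the 20 non-violations flagged as well. The sketch below derives the table's numbers from those counts (the counts themselves are inferred from the metrics, not read from the evaluation logs):

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    # Standard metrics derived from confusion-matrix counts.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Counts consistent with the mistral-small-2506 row: every violation
# caught (recall = 1.0), but 18 of 20 non-violations flagged too.
m = classification_metrics(tp=20, fp=18, tn=2, fn=0)
# accuracy 0.55, precision ~0.5263, recall 1.0, F1 ~0.6897
```

Working backwards like this makes the precision problem concrete: perfect recall is easy to achieve by over-flagging, which is exactly what the observations above describe.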
| Approach | Pros | Cons |
|---|---|---|
| Encoder | Fast, free, deterministic | Requires training data, less interpretable |
| Decoder | Good balance, local inference | Requires GPU, complex setup |
| API | No training, explains reasoning | Costs money, slower, rate limits |
A model outputs probabilities (0.0 to 1.0). The threshold determines where we draw the line between "violation" and "OK". For example:
- Threshold = 0.5: Standard (≥50% → violation)
- Threshold = 0.3: Aggressive (catches more, more false positives)
- Threshold = 0.7: Conservative (fewer false positives, might miss some)
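The three thresholds above can be compared on a handful of hypothetical scores (the score values here are made up for illustration):

```python
def apply_threshold(scores, threshold):
    # A score at or above the threshold is flagged as a violation.
    return [score >= threshold for score in scores]

# Hypothetical probability scores from any of the three models.
scores = [0.87, 0.62, 0.45, 0.31, 0.12]

flagged_standard = sum(apply_threshold(scores, 0.5))      # 2 flagged
flagged_aggressive = sum(apply_threshold(scores, 0.3))    # 4 flagged
flagged_conservative = sum(apply_threshold(scores, 0.7))  # 1 flagged
```

Moving the threshold trades precision against recall without retraining anything, which is why AUC (threshold-independent) is reported alongside the thresholded metrics in the results table.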