A machine-learning-based system for automatic content moderation that classifies user-generated content as toxic, spam, or safe.
The system combines rule-based filters with ML classifiers to simulate real-world moderation tools used by platforms like Facebook, Twitter, YouTube, and OpenAI.
🔗 Open in Colab: Click Here
This project implements a Content Moderation System that automatically classifies and filters user-generated content using machine learning techniques.
- Text normalization and cleaning
- Feature extraction (length, word count, capitalization ratio, etc.)
- URL, mention, and hashtag detection
- Offensive keyword blacklist
- Pattern matching for spam/promotional content
- Excessive capitalization and character repetition detection
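The preprocessing and rule-based steps above can be sketched roughly as follows. The keyword blacklist, spam patterns, and thresholds here are placeholders for illustration, not the project's actual lists:

```python
import re

# Hypothetical blacklist and spam patterns -- illustrative only.
OFFENSIVE_WORDS = {"idiot", "stupid"}
SPAM_PATTERNS = [r"buy now", r"free money", r"click here"]

def extract_features(text: str) -> dict:
    """Compute simple surface features (length, word count, caps ratio,
    URL/mention/hashtag detection) used by the rule-based filters."""
    words = text.split()
    letters = [c for c in text if c.isalpha()]
    caps_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return {
        "length": len(text),
        "word_count": len(words),
        "caps_ratio": caps_ratio,
        "has_url": bool(re.search(r"https?://\S+|www\.\S+", text)),
        "mentions": re.findall(r"@\w+", text),
        "hashtags": re.findall(r"#\w+", text),
    }

def rule_flags(text: str) -> dict:
    """Apply blacklist, spam-pattern, shouting, and repetition rules."""
    lowered = text.lower()
    feats = extract_features(text)
    return {
        "offensive": any(w in lowered.split() for w in OFFENSIVE_WORDS),
        "spammy": any(re.search(p, lowered) for p in SPAM_PATTERNS),
        # "Shouting": mostly uppercase letters across more than two words.
        "shouting": feats["caps_ratio"] > 0.7 and feats["word_count"] > 2,
        # Five or more repetitions of the same character (e.g. "loooooool").
        "repeated_chars": bool(re.search(r"(.)\1{4,}", text)),
    }
```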
- Algorithms: Naive Bayes, Logistic Regression, SVM, Random Forest
- TF-IDF vectorization
- Multi-class classification: toxic, spam, safe
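A minimal sketch of the TF-IDF + classifier stage using scikit-learn. The tiny inline training set is invented for illustration (the real dataset is linked below), and Naive Bayes stands in for any of the listed algorithms:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy examples only -- the project trains on a much larger labeled dataset.
texts = [
    "you are an idiot",
    "total idiot behaviour",
    "buy now cheap pills",
    "free money click here",
    "have a nice day",
    "see you at lunch",
]
labels = ["toxic", "toxic", "spam", "spam", "safe", "safe"]

# TF-IDF features feeding a multi-class classifier; swap MultinomialNB()
# for LogisticRegression(), LinearSVC(), or RandomForestClassifier().
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
    ("clf", MultinomialNB()),
])
pipeline.fit(texts, labels)
prediction = pipeline.predict(["limited offer, buy now"])[0]
```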
- Weighted scoring combining rule-based + ML predictions
- Configurable confidence thresholds
- Dynamic decision making based on risk scores
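One way the hybrid decision step could combine the two signals, assuming illustrative weights and thresholds (the project's actual values are configurable):

```python
def moderate(ml_probs: dict, rule_hits: int,
             ml_weight: float = 0.7, rule_weight: float = 0.3,
             block_threshold: float = 0.6, review_threshold: float = 0.3) -> str:
    """Blend ML class probabilities with the number of triggered rules
    into a single risk score, then map it to a decision.
    All weights/thresholds here are hypothetical defaults."""
    ml_risk = ml_probs.get("toxic", 0.0) + ml_probs.get("spam", 0.0)
    rule_risk = min(rule_hits / 3, 1.0)  # saturate after three triggered rules
    risk = ml_weight * ml_risk + rule_weight * rule_risk
    if risk >= block_threshold:
        return "block"
    if risk >= review_threshold:
        return "review"
    return "allow"
```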
You can run and explore the project notebook here:
🔗 Colab Link
The dataset used for training the model is available on Google Drive:
🔗 Download Dataset
- Haneen Mohamed
- Mai Mohamed
- Farida Montser