A machine learning-based web application that classifies SMS messages as Spam or Ham (Not Spam) using NLP and classification algorithms.
The dataset is highly imbalanced: ham (non-spam) messages make up 85% of it.
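With an 85/15 split, a model that always answers "ham" already reaches 85% accuracy, so it helps to keep the class ratio identical in the train and test sets and to look at per-class metrics. The snippet below is a minimal sketch of a stratified split with scikit-learn. The placeholder `texts` and `labels`, the 20% test size, and the random seed are assumptions for illustration, not values taken from this project.

```python
from sklearn.model_selection import train_test_split

# Placeholder data that mimics the 85% ham / 15% spam ratio (illustrative only).
labels = ["ham"] * 85 + ["spam"] * 15
texts = [f"message {i}" for i in range(100)]

# stratify=labels keeps the ham/spam ratio the same in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

ham_share_train = y_train.count("ham") / len(y_train)
ham_share_test = y_test.count("ham") / len(y_test)
print(f"ham share: train={ham_share_train:.2f}, test={ham_share_test:.2f}")
```

Precision and recall on the spam class are usually more informative than plain accuracy on data this skewed.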
- Predicts whether a given message is spam or not
- Clean and intuitive UI
- Trained on the popular SMS Spam Collection Dataset
- Visualization of word frequencies and message length distributions
- Algorithm: Naive Bayes / Support Vector Machine
- Libraries: scikit-learn, pandas, numpy, matplotlib, seaborn
- Vectorization: CountVectorizer / TF-IDF
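The stack listed above can be combined roughly as follows. This is a minimal sketch, assuming TF-IDF features and a Multinomial Naive Bayes classifier; the project may instead use `CountVectorizer` or a Support Vector Machine, and the tiny training set here is made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical, not from the SMS Spam Collection).
train_texts = [
    "Congratulations, you won a free prize, call now",
    "Free entry in a weekly cash draw, text WIN to claim",
    "Are we still meeting for lunch today",
    "Can you send me the notes from class",
]
train_labels = ["spam", "spam", "ham", "ham"]

# The vectorizer turns raw text into TF-IDF features; the classifier learns from them.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    MultinomialNB(),
)
model.fit(train_texts, train_labels)

predictions = model.predict(["free cash prize, call now", "see you at lunch"])
print(list(predictions))
```

Wrapping both steps in one pipeline means the same preprocessing is applied at training and prediction time, which is convenient when the model is served from a web app.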
- Spam messages tend to be longer and include promotional words.
- Word clouds and bar plots were used to identify top keywords in spam vs ham.
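The two observations above can be checked with a few lines of pandas. This is a sketch on a hand-written toy DataFrame; the column names `label` and `text` and the helper `top_words` are assumptions for illustration, and no stop-word filtering is applied.

```python
import re
from collections import Counter

import pandas as pd

# Toy data (hypothetical); the real dataset has thousands of messages.
df = pd.DataFrame({
    "label": ["ham", "spam", "ham", "spam"],
    "text": [
        "see you at lunch",
        "WIN a FREE prize now, call to claim your free reward",
        "ok thanks",
        "FREE entry: text WIN to claim cash prize today",
    ],
})

# Message length in characters, averaged per class.
df["length"] = df["text"].str.len()
mean_length = df.groupby("label")["length"].mean()

def top_words(texts, n=3):
    """Return the n most frequent lowercase words across the given messages."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts.most_common(n)

spam_texts = df.loc[df["label"] == "spam", "text"]
print(mean_length)
print(top_words(spam_texts))
```

The per-class word counts returned by `top_words` are the kind of data a bar plot or word cloud would be drawn from.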
- Clone the repo
git clone https://github.com/deBurglar/SMS-Spam-Predictor.git
cd SMS-Spam-Predictor

