This project is a Machine Learning-based Email Spam Classifier that leverages Natural Language Processing (NLP) techniques to detect and filter out spam emails from legitimate ones. Using a combination of feature extraction and classification algorithms, this model identifies spam emails with high accuracy, providing a robust solution for email filtering.
π Features
Natural Language Processing (NLP) for text preprocessing
TF-IDF Vectorizer for feature extraction
Naive Bayes & Support Vector Machine (SVM) for classification
Train/Test Split for model evaluation
Accuracy, Precision, Recall, and F1-Score for performance metrics
π οΈ Tech Stack
Python π
Scikit-learn βοΈ
Pandas πΌ
NumPy π’
Matplotlib π
π How It Works
Data Preprocessing: Email text is cleaned and converted into features using TF-IDF.
Model Training: A classification model is trained using labeled email data (spam/ham).
Prediction: The model predicts whether new emails are spam or legitimate.
Evaluation: The model is evaluated using various performance metrics to ensure reliability.
π Results
Achieved 97% accuracy with balanced precision and recall
Handles large datasets efficiently
Easily customizable for other text classification tasks
π Repository Structure
data/: Contains the dataset used for training and testing
notebooks/: Jupyter notebooks for data exploration and model training
src/: Source code for preprocessing, model building, and evaluation
README.md: Project documentation (this file!)