A multi-class text classification project that fine-tunes a transformer model on a custom dataset of user queries from a simulated customer support domain. It demonstrates end-to-end fine-tuning of DistilBERT for intent detection.
This project performs intent classification on a labeled dataset of customer queries. It involves:
- Data cleaning and preprocessing
- Baseline modeling with TF-IDF + Logistic Regression
- Fine-tuning a pre-trained transformer (DistilBERT)
- Model evaluation using accuracy, precision, recall, and F1-score
- Visualization of the confusion matrix
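The baseline step can be sketched with scikit-learn as follows. This is a minimal sketch, not the code in the project's notebooks: the `clean_text` helper and its cleaning rules are hypothetical stand-ins for the project's preprocessing, and the ten example queries are invented placeholders for the real training data.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def clean_text(text: str) -> str:
    """Hypothetical cleaning step: lowercase, replace non-alphanumerics, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Invented toy examples; the real project trains on its own labeled queries.
train_texts = [
    "The blender broke after two days",
    "My package still has not arrived",
    "Why is this so expensive compared to last month?",
    "I want to send this jacket back",
    "What are your opening hours?",
    "The fabric feels cheap and tore quickly",
    "Tracking says delivered but nothing came",
    "Is there a discount on this price?",
    "How do I return an item I bought online?",
    "Do you ship internationally?",
]
train_labels = [
    "product_quality", "delivery_issue", "price_concern", "return_request", "general_inquiry",
    "product_quality", "delivery_issue", "price_concern", "return_request", "general_inquiry",
]

# TF-IDF features (unigrams and bigrams) feeding a logistic regression classifier.
baseline = make_pipeline(
    TfidfVectorizer(preprocessor=clean_text, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_texts, train_labels)

predictions = baseline.predict(["my order never arrived", "can I return this and get a refund?"])
print(predictions)
```

Passing `clean_text` as the vectorizer's `preprocessor` keeps cleaning inside the pipeline, so the same transformation is applied at training and prediction time.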
- Total Samples: 600 (synthetically generated and paraphrased)
- Labels (5 classes):
  - `product_quality`
  - `delivery_issue`
  - `price_concern`
  - `return_request`
  - `general_inquiry`
The dataset is stored in the `data/` folder, with separate train and test splits (`train.csv` and `test.csv`).
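Transformer classification heads work with integer class ids rather than label strings. A minimal sketch of a fixed mapping for the five intents follows; the ordering (the order the labels are listed above) is an assumption, since the project's actual mapping is not documented here. Dictionaries like these can be passed as the `num_labels`, `id2label`, and `label2id` arguments of Hugging Face's `AutoModelForSequenceClassification.from_pretrained`, so a saved checkpoint reports readable label names.

```python
# The five intent labels, in an assumed fixed order.
LABELS = [
    "product_quality",
    "delivery_issue",
    "price_concern",
    "return_request",
    "general_inquiry",
]

label2id = {label: idx for idx, label in enumerate(LABELS)}
id2label = {idx: label for label, idx in label2id.items()}


def encode_labels(labels):
    """Map label strings to integer ids; raises KeyError on an unknown label."""
    return [label2id[label] for label in labels]


def decode_ids(ids):
    """Map integer ids back to label strings."""
    return [id2label[i] for i in ids]


encoded = encode_labels(["delivery_issue", "general_inquiry"])
print(encoded)              # [1, 4]
print(decode_ids(encoded))  # ['delivery_issue', 'general_inquiry']
```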
- Python 3.10+
- PyTorch
- Hugging Face Transformers
- Scikit-learn
- Pandas, NumPy
- Matplotlib, Seaborn
- Clone this repo: `git clone https://github.com/rehan-shafi/intent-classifier.git`, then `cd intent-classifier`
- Install dependencies: `pip install -r requirements.txt`
- Train the model: `python scripts/train_model.py`
- Outputs: the trained model is saved in `./model_output`, and the classification report and confusion matrix are printed after training.
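For the fine-tuning step, the Hugging Face `Trainer` expects a dataset whose items are dictionaries of tensors that include a `labels` key. The class below is a hypothetical minimal sketch of such a PyTorch wrapper, not the contents of `scripts/train_model.py`; the token ids in the demo are fake placeholders rather than real tokenizer output.

```python
import torch
from torch.utils.data import Dataset


class IntentDataset(Dataset):
    """Hypothetical wrapper pairing tokenizer output with integer intent ids."""

    def __init__(self, encodings, labels):
        # encodings: dict of per-example lists, e.g. {"input_ids": [...], "attention_mask": [...]}
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # One example: each tokenizer field as a tensor, plus its integer label.
        item = {key: torch.tensor(values[idx]) for key, values in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item


# Tiny demo with fake token ids (real ones would come from a tokenizer).
demo = IntentDataset(
    {"input_ids": [[101, 7592, 102], [101, 2129, 102]], "attention_mask": [[1, 1, 1], [1, 1, 1]]},
    [1, 4],
)
sample = demo[1]
print(sample["labels"].item())
```

With real data, `encodings` would come from a call such as `AutoTokenizer.from_pretrained(checkpoint)(texts, truncation=True, padding=True)`. The exact DistilBERT checkpoint the project uses is not stated here; `distilbert-base-uncased` is one plausible assumption.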
📊 Sample Evaluation Output
- Accuracy: 0.72
- Macro F1 Score: 0.73
- Confusion Matrix: see the plot in the output section of the notebook or script.
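A sketch of how metrics like these and a confusion-matrix plot can be produced with scikit-learn and seaborn. The labels and predictions are hypothetical (six made-up test queries), not the project's results, and the plotting choices (colormap, figure size, output filename `confusion_matrix.png`) are assumptions.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, f1_score

LABELS = ["product_quality", "delivery_issue", "price_concern", "return_request", "general_inquiry"]

# Hypothetical gold labels and predictions for six test queries (not project results).
y_true = ["product_quality", "delivery_issue", "price_concern",
          "return_request", "general_inquiry", "delivery_issue"]
y_pred = ["product_quality", "delivery_issue", "general_inquiry",
          "return_request", "general_inquiry", "return_request"]

accuracy = accuracy_score(y_true, y_pred)
# Macro F1: unweighted mean of per-class F1 scores, so every class counts equally.
macro_f1 = f1_score(y_true, y_pred, labels=LABELS, average="macro", zero_division=0)
print(f"Accuracy: {accuracy:.2f}  Macro F1 Score: {macro_f1:.2f}")
# → Accuracy: 0.67  Macro F1 Score: 0.60

# Per-class precision, recall, and F1, plus averaged rows.
print(classification_report(y_true, y_pred, labels=LABELS, zero_division=0))

# Rows are true labels, columns are predicted labels, both in LABELS order.
cm = confusion_matrix(y_true, y_pred, labels=LABELS)

fig, ax = plt.subplots(figsize=(7, 6))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=LABELS, yticklabels=LABELS, ax=ax)
ax.set_xlabel("Predicted label")
ax.set_ylabel("True label")
ax.set_title("Confusion matrix")
fig.tight_layout()
out_path = Path("confusion_matrix.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
```

On this toy data macro F1 falls below accuracy because the never-predicted `price_concern` class scores an F1 of 0 and still counts equally in the average. Macro F1 is not bounded by accuracy, though, and can also come out slightly higher, as in the sample output above.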
📁 Project Structure
```
intent-classifier/
├── data/
│   ├── train.csv
│   └── test.csv
├── notebooks/
│   ├── tfidf_baseline.ipynb
│   └── transformer_finetune.ipynb
├── scripts/
│   ├── model_utils.py
│   └── train_model.py
├── model_output/
├── README.md
├── requirements.txt
└── LICENSE
```
📦 Future Improvements
- Hyperparameter tuning with Optuna
- Add model card and Hugging Face deployment
- Export model to ONNX or TorchScript for inference
- Streamlit or Gradio UI for demo
🧑‍💻 Author
Rehan Shafi, Generative AI Developer
GitHub Profile
📜 License
MIT License – feel free to fork, modify, and use for educational purposes.