This repository contains an end-to-end implementation of sentiment analysis using a pre-trained BERT model fine-tuned with the Hugging Face Transformers library. The project demonstrates a complete NLP workflow, from exploratory data analysis through model training and evaluation to saving the trained model for reuse.
The goal of this project is to build a robust sentiment classification model using BERT. The notebook follows best practices for modern NLP pipelines, including stratified data splitting, Hugging Face Dataset integration, metric-based evaluation, and efficient model fine-tuning.
About the dataset:
- The dataset is sourced from GitHub
- It contains labeled text data for sentiment classification
- The data is split into train, validation, and test sets using stratified sampling to preserve class distribution
The project follows the steps below:
- **Library Setup**
  - PyTorch
  - Hugging Face Transformers & Datasets
  - Scikit-learn, evaluate
  - NumPy, Pandas, Matplotlib, Seaborn
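A minimal setup sketch, assuming a Colab-style environment; the install line and exact import set are illustrative, not a verbatim copy of the notebook:

```python
# In Colab: !pip install transformers datasets evaluate scikit-learn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import torch
import evaluate
from datasets import Dataset, DatasetDict
from sklearn.model_selection import train_test_split
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)
```

The snippets under the following steps build on these imports.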
- **Exploratory Data Analysis (EDA)**
  - Basic inspection of text samples
  - Label distribution analysis
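For illustration, the inspection could look like this; the DataFrame `df` and the column names `text`/`label` are assumptions, not confirmed by the notebook:

```python
print(df.head())                   # peek at a few raw text samples
print(df["label"].value_counts())  # how balanced are the classes?

sns.countplot(x="label", data=df)  # visualize the label distribution
plt.title("Label distribution")
plt.show()
```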
- **Stratified Train–Validation–Test Split**
  - Ensures balanced class representation across splits
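One common way to get a stratified three-way split is two chained `train_test_split` calls; the 70/15/15 ratios below are illustrative, not the notebook's actual values:

```python
# Carve out the test set first, then split the remainder into train/validation.
train_val_df, test_df = train_test_split(
    df, test_size=0.15, stratify=df["label"], random_state=42
)
train_df, val_df = train_test_split(
    train_val_df,
    test_size=0.15 / 0.85,  # 15% of the original data
    stratify=train_val_df["label"],
    random_state=42,
)
```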
- **Hugging Face Dataset Conversion**
  - Conversion from Pandas DataFrame to `Dataset` and `DatasetDict`
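A sketch of the conversion, assuming the three DataFrames from the split above:

```python
dataset = DatasetDict({
    "train": Dataset.from_pandas(train_df, preserve_index=False),
    "validation": Dataset.from_pandas(val_df, preserve_index=False),
    "test": Dataset.from_pandas(test_df, preserve_index=False),
})
```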
- **Label Encoding**
  - Creation of `label2id` and `id2label` mappings for model compatibility
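For example, assuming string labels in a `label` column:

```python
labels = sorted(set(dataset["train"]["label"]))
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

# Replace string labels with integer ids so the model can consume them.
dataset = dataset.map(lambda x: {"label": label2id[x["label"]]})
```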
- **Model Selection**
  - Pre-trained BERT model for sequence classification
  - Loaded using Hugging Face Transformers
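A plausible loading step; `bert-base-uncased` is an assumed checkpoint, and the notebook may use a different BERT variant:

```python
checkpoint = "bert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=len(label2id),
    label2id=label2id,  # mappings from the label-encoding step
    id2label=id2label,
)
```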
- **Tokenization**
  - Tokenization using the BERT tokenizer
  - Truncation and padding applied
  - Removal of unnecessary columns to reduce memory usage
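A tokenization sketch along these lines, with column names assumed as before:

```python
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize(batch):
    # Truncate long texts and pad so all sequences share one length.
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)
# Drop the raw text column; the model only needs the token tensors.
tokenized = tokenized.remove_columns(["text"])
```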
- **Training Configuration**
  - `TrainingArguments` configured for:
    - Evaluation during training
    - Logging
    - Model checkpointing
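An illustrative configuration; every hyperparameter value here is an assumption, not the notebook's actual setting:

```python
args = TrainingArguments(
    output_dir="bert-sentiment",  # where checkpoints are written
    eval_strategy="epoch",        # evaluate during training
                                  # ("evaluation_strategy" on older transformers versions)
    save_strategy="epoch",        # model checkpointing
    logging_steps=50,             # training-loss logging
    num_train_epochs=3,
    per_device_train_batch_size=16,
    load_best_model_at_end=True,
)
```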
- **Evaluation Metrics**
  - Accuracy
  - F1-score
  - Precision
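These metrics are typically wired in through a `compute_metrics` callback built on the `evaluate` library; the `average="weighted"` choice below is an assumption:

```python
accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")
precision = evaluate.load("precision")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        **accuracy.compute(predictions=preds, references=labels),
        **f1.compute(predictions=preds, references=labels, average="weighted"),
        **precision.compute(predictions=preds, references=labels, average="weighted"),
    }
```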
- **Model Training**
  - Fine-tuning performed using the Hugging Face `Trainer` API
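Putting the pieces above together with the `Trainer` API, roughly:

```python
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    compute_metrics=compute_metrics,
)
trainer.train()
```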
- **Prediction and Evaluation**
  - Model evaluated on the test dataset
  - Metrics reported using standard NLP evaluation practices
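Test-set evaluation might look like this:

```python
test_metrics = trainer.evaluate(tokenized["test"])
print(test_metrics)  # eval_loss, eval_accuracy, eval_f1, eval_runtime, ...

# Raw predictions, e.g. for a confusion matrix:
preds = trainer.predict(tokenized["test"])
pred_ids = np.argmax(preds.predictions, axis=-1)
```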
- **Model Saving**
  - Fine-tuned model and tokenizer saved for future inference or deployment
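A saving and reloading sketch; the output directory name is arbitrary:

```python
save_dir = "bert-sentiment-final"
trainer.save_model(save_dir)         # model weights + config
tokenizer.save_pretrained(save_dir)

# Later, for inference:
# model = AutoModelForSequenceClassification.from_pretrained(save_dir)
# tokenizer = AutoTokenizer.from_pretrained(save_dir)
```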
The fine-tuned BERT model achieved the following performance on the test set:
- Accuracy: 93.18%
- F1 Score: 93.19%
- Test Loss: 0.1966
- Evaluation Runtime: 15.95 seconds
- Samples per Second: 200.59
These results demonstrate strong generalization and effective fine-tuning of the BERT model for sentiment classification.
The project is built with:
- Python
- PyTorch
- Hugging Face Transformers
- Hugging Face Datasets
- Scikit-learn
- Google Colab (GPU)