This repository contains an end-to-end implementation of sentiment analysis using a pre-trained BERT model fine-tuned with the Hugging Face Transformers library. The project demonstrates a complete NLP workflow, from exploratory data analysis through model training and evaluation to saving the trained model for reuse.
The goal of this project is to build a robust sentiment classification model using BERT. The notebook follows best practices for modern NLP pipelines, including stratified data splitting, Hugging Face Dataset integration, metric-based evaluation, and efficient model fine-tuning.
About the dataset:
- The dataset is sourced from GitHub
- It contains labeled text data for sentiment classification
- The data is split into train, validation, and test sets using stratified sampling to preserve class distribution
The project follows the steps below:
- **Library Setup**
  - PyTorch
  - Hugging Face Transformers & Datasets
  - Scikit-learn, evaluate
  - NumPy, Pandas, Matplotlib, Seaborn
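A minimal setup sketch, assuming a Colab-style environment; the install line and exact import set are illustrative, not a verbatim copy of the notebook:

```python
# In Colab: !pip install transformers datasets evaluate scikit-learn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import torch
import evaluate
from datasets import Dataset, DatasetDict
from sklearn.model_selection import train_test_split
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TrainingArguments,
    Trainer,
)
```

The snippets under the following steps build on these imports.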
- **Exploratory Data Analysis (EDA)**
  - Basic inspection of text samples
  - Label distribution analysis
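For illustration, the inspection could look like this; the DataFrame `df` and the column names `text`/`label` are assumptions, not confirmed by the notebook:

```python
print(df.head())                   # peek at a few raw text samples
print(df["label"].value_counts())  # how balanced are the classes?

sns.countplot(x="label", data=df)  # visualize the label distribution
plt.title("Label distribution")
plt.show()
```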
- **Stratified Train–Validation–Test Split**
  - Ensures balanced class representation across splits
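One common way to get a stratified three-way split is two chained `train_test_split` calls; the 70/15/15 ratios below are illustrative, not the notebook's actual values:

```python
# Carve out the test set first, then split the remainder into train/validation.
train_val_df, test_df = train_test_split(
    df, test_size=0.15, stratify=df["label"], random_state=42
)
train_df, val_df = train_test_split(
    train_val_df,
    test_size=0.15 / 0.85,  # 15% of the original data
    stratify=train_val_df["label"],
    random_state=42,
)
```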
- **Hugging Face Dataset Conversion**
  - Conversion from Pandas DataFrame to `Dataset` and `DatasetDict`
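A sketch of the conversion, assuming the three DataFrames from the split above:

```python
dataset = DatasetDict({
    "train": Dataset.from_pandas(train_df, preserve_index=False),
    "validation": Dataset.from_pandas(val_df, preserve_index=False),
    "test": Dataset.from_pandas(test_df, preserve_index=False),
})
```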
- **Label Encoding**
  - Creation of `label2id` and `id2label` mappings for model compatibility
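For example, assuming string labels in a `label` column:

```python
labels = sorted(set(dataset["train"]["label"]))
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

# Replace string labels with integer ids so the model can consume them.
dataset = dataset.map(lambda x: {"label": label2id[x["label"]]})
```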
- **Model Selection**
  - Pre-trained BERT model for sequence classification
  - Loaded using Hugging Face Transformers
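A plausible loading step; `bert-base-uncased` is an assumed checkpoint, and the notebook may use a different BERT variant:

```python
checkpoint = "bert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=len(label2id),
    label2id=label2id,  # mappings from the label-encoding step
    id2label=id2label,
)
```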
- **Tokenization**
  - Tokenization using the BERT tokenizer
  - Truncation and padding applied
  - Removal of unnecessary columns to reduce memory usage
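A tokenization sketch along these lines, with column names assumed as before:

```python
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize(batch):
    # Truncate long texts and pad so all sequences share one length.
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)
# Drop the raw text column; the model only needs the token tensors.
tokenized = tokenized.remove_columns(["text"])
```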
- **Training Configuration**
  - `TrainingArguments` configured for:
    - Evaluation during training
    - Logging
    - Model checkpointing
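An illustrative configuration; every hyperparameter value here is an assumption, not the notebook's actual setting:

```python
args = TrainingArguments(
    output_dir="bert-sentiment",  # where checkpoints are written
    eval_strategy="epoch",        # evaluate during training
                                  # ("evaluation_strategy" on older transformers versions)
    save_strategy="epoch",        # model checkpointing
    logging_steps=50,             # training-loss logging
    num_train_epochs=3,
    per_device_train_batch_size=16,
    load_best_model_at_end=True,
)
```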
- **Evaluation Metrics**
  - Accuracy
  - F1-score
  - Precision
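These metrics are typically wired in through a `compute_metrics` callback built on the `evaluate` library; the `average="weighted"` choice below is an assumption:

```python
accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")
precision = evaluate.load("precision")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        **accuracy.compute(predictions=preds, references=labels),
        **f1.compute(predictions=preds, references=labels, average="weighted"),
        **precision.compute(predictions=preds, references=labels, average="weighted"),
    }
```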
- **Model Training**
  - Fine-tuning performed using the Hugging Face `Trainer` API
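Putting the pieces above together with the `Trainer` API, roughly:

```python
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    compute_metrics=compute_metrics,
)
trainer.train()
```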
- **Prediction and Evaluation**
  - Model evaluated on the test dataset
  - Metrics reported using standard NLP evaluation practices
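Test-set evaluation might look like this:

```python
test_metrics = trainer.evaluate(tokenized["test"])
print(test_metrics)  # eval_loss, eval_accuracy, eval_f1, eval_runtime, ...

# Raw predictions, e.g. for a confusion matrix:
preds = trainer.predict(tokenized["test"])
pred_ids = np.argmax(preds.predictions, axis=-1)
```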
- **Model Saving**
  - Fine-tuned model and tokenizer saved for future inference or deployment
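A saving and reloading sketch; the output directory name is arbitrary:

```python
save_dir = "bert-sentiment-final"
trainer.save_model(save_dir)         # model weights + config
tokenizer.save_pretrained(save_dir)

# Later, for inference:
# model = AutoModelForSequenceClassification.from_pretrained(save_dir)
# tokenizer = AutoTokenizer.from_pretrained(save_dir)
```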
The fine-tuned BERT model achieved the following performance on the test set:
- Accuracy: 93.18%
- F1 Score: 93.19%
- Test Loss: 0.1966
- Evaluation Runtime: 15.95 seconds
- Samples per Second: 200.59
These results demonstrate strong generalization and effective fine-tuning of the BERT model for sentiment classification.
The project is built with:
- Python
- PyTorch
- Hugging Face Transformers
- Hugging Face Datasets
- Scikit-learn
- Google Colab (GPU)