rallm/sentiment-analysis-roberta

🎭 Emotion Classification with RoBERTa & Gradio

This project is a complete sentiment analysis pipeline that fine-tunes a RoBERTa model to classify text into six emotions. It handles class imbalance in the dataset with a class-weighted loss and includes an interactive web demo built with Gradio and deployed to Hugging Face Spaces.

📊 Dataset

We use the dair-ai/emotion dataset. It contains English Twitter messages labeled with six basic emotions:

| Label ID | Emotion |
|----------|---------|
| 0 | Sadness 😢 |
| 1 | Joy 😂 |
| 2 | Love 🥰 |
| 3 | Anger 😡 |
| 4 | Fear 😱 |
| 5 | Surprise 😲 |
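
As a small illustration, the label IDs listed above can be kept in a pair of lookup dictionaries. The names `ID2LABEL`, `LABEL2ID`, and `decode_prediction` are hypothetical and do not come from this repository; this is only a sketch of one convenient way to map model outputs back to emotion names.

```python
# Label IDs and emotion names as listed in the dataset section.
# The dictionary and function names are illustrative assumptions.
ID2LABEL = {
    0: "sadness",
    1: "joy",
    2: "love",
    3: "anger",
    4: "fear",
    5: "surprise",
}

# Reverse mapping, from emotion name back to label ID.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_prediction(label_id):
    """Translate a predicted label ID into its emotion name."""
    return ID2LABEL[label_id]
```

Mappings like these can also be passed as `id2label` and `label2id` when loading a sequence classification model with `from_pretrained`, so that predictions carry readable names.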

🛠️ Technical Approach

This project goes beyond standard fine-tuning by addressing class imbalance in the training data:

  1. Data Preprocessing: Tokenization using RobertaTokenizer with truncation to a max length of 128.
  2. Class Weights: We compute class weights using sklearn.utils.class_weight to penalize the model more for misclassifying minority classes (like Surprise).
  3. Custom Trainer: A custom WeightedTrainer (subclassing Hugging Face's Trainer) is implemented to override the compute_loss method, injecting the calculated class weights into the CrossEntropyLoss.
  4. Model: Fine-tuning roberta-base for sequence classification.
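
The class-weighting steps above could be sketched roughly as follows. This is not the repository's actual code: the helper name `balanced_class_weights`, the `class_weights` constructor argument, and the choice of sklearn's `"balanced"` mode are assumptions made for illustration. Only the overall idea (a `Trainer` subclass whose `compute_loss` passes class weights to `CrossEntropyLoss`) comes from the description.

```python
import numpy as np
import torch
from torch import nn
from sklearn.utils.class_weight import compute_class_weight
from transformers import Trainer


def balanced_class_weights(labels, num_classes=6):
    """Return one weight per class; rarer classes get larger weights.

    Assumes every class ID in range(num_classes) occurs at least once
    in `labels`, which sklearn requires for the "balanced" mode.
    """
    weights = compute_class_weight(
        class_weight="balanced",
        classes=np.arange(num_classes),
        y=np.asarray(labels),
    )
    return torch.tensor(weights, dtype=torch.float)


class WeightedTrainer(Trainer):
    """Trainer whose loss is cross-entropy scaled by per-class weights."""

    def __init__(self, *args, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        # Hypothetical extra argument: a 1-D tensor with one weight per class.
        self.class_weights = class_weights

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # **kwargs absorbs extra arguments that newer transformers versions pass.
        inputs = dict(inputs)
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        weight = None
        if self.class_weights is not None:
            weight = self.class_weights.to(logits.device)
        loss = nn.CrossEntropyLoss(weight=weight)(logits, labels)
        return (loss, outputs) if return_outputs else loss
```

With this sketch, the weights would be computed once from the training labels and handed to the trainer, for example `WeightedTrainer(model=model, args=training_args, train_dataset=train_ds, class_weights=weights)`, where `model`, `training_args`, and `train_ds` are placeholders for objects created elsewhere in the pipeline.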
