EDA for the emotion_dataset_v1 #367

Open
davchuks wants to merge 5 commits into Gopher-Industries:main from davchuks:main

@davchuks
Collaborator

Description

This PR introduces an exploratory data analysis (EDA) notebook for the emotion tagging project and sets up the foundation for model training.

Key Changes

  • Data ingest & schema validation
      • Loads nurse_emotion.csv and inspects the schema with df.info() and df.head()
  • Data cleaning pipeline
      • Lower-casing, punctuation removal, stop-word filtering, lemmatization
  • Label profiling
      • Class balance of emotionpolarity and emotionTags
      • Plots saved: emotion_polarity_distribution.png, emotion_tags_distribution.png
  • Text statistics & visualizations
      • Token counts, n-grams, word clouds
      • Boxplot of note length vs. emotion tags, saved as emotion_vs_note_length.png
  • Temporal profiling
      • Derived time-of-day buckets (Morning/Afternoon/Evening/Night)
      • Distribution plot saved as emotion_by_time_of_day.png
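
For reviewers who want a quick feel for the steps listed above, here is a minimal, standard-library-only sketch of three of them: text cleaning, class balance, and time-of-day bucketing. It is not the notebook's code. The helper names (`clean_note`, `class_balance`, `time_of_day`), the tiny stop-word list, and the hour boundaries for each bucket are all assumptions made for illustration, and lemmatization is left out because it needs an external resource such as a lemmatizer model.

```python
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption: the notebook
# uses a fuller list from an NLP library).
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "to", "of", "in", "with"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_note(text: str) -> list[str]:
    """Lower-case, strip punctuation, and drop stop words.

    Lemmatization is intentionally omitted in this sketch.
    """
    lowered = text.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]


def class_balance(labels: list[str]) -> dict[str, float]:
    """Return the share of each label, e.g. for a polarity column."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def time_of_day(hour: int) -> str:
    """Map an hour (0-23) to a bucket.

    The boundaries below are assumed, not taken from the notebook.
    """
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    if 5 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 21:
        return "Evening"
    return "Night"


if __name__ == "__main__":
    print(clean_note("The patient was calm, and smiling!"))
    print(class_balance(["positive", "positive", "negative", "neutral"]))
    print(time_of_day(9), time_of_day(23))
```

In the notebook itself these steps would presumably run over DataFrame columns rather than single values; the plain functions here only make the logic easy to check in isolation.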
