Replies: 1 comment 1 reply
Ah, I was anticipating this question :) I just sent you an invite for the G. Drive: https://drive.google.com/drive/u/0/folders/1x9OZl5HSftEaWB5dMwRn9uLhwAy7i9UA Let me know if you have any questions.

First, try to just run with the files that are already there plus Haotian's file. Once everything is in order, we can try training on the GitHub data just to see what we would need to change in it (I presume at most column names). For training, I recommend you use Google Colab notebooks to process the models. I believe the models should fit there, since they are not LLMs.

As for your third question, did you check the .Rmd sentiment notebook? That notebook should do everything for you; you shouldn't need to run any .ipynb (that's why we wanted to consolidate everything in .Rmd! Too many moving parts otherwise). Let me know.
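For the "at most column names" change mentioned above, a minimal sketch of what adjusting the GitHub data could look like. The column names on both sides (`comment_body`, `manual_label` → `text`, `label`) are hypothetical placeholders for illustration, not the project's actual schema:

```python
import csv
import io

# Hypothetical mapping from the GitHub dataset's headers to the names the
# training notebook expects; both sides are assumptions for illustration.
COLUMN_MAP = {
    "comment_body": "text",
    "manual_label": "label",
}

def rename_columns(src, dst, column_map):
    """Copy CSV rows from src to dst, renaming header cells per column_map."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader)
    writer.writerow([column_map.get(name, name) for name in header])
    writer.writerows(reader)  # data rows pass through unchanged

# In-memory demo; with the real files you would pass open file handles.
src = io.StringIO("comment_body,manual_label\ngreat patch,positive\n")
dst = io.StringIO()
rename_columns(src, dst, COLUMN_MAP)
print(dst.getvalue().splitlines()[0])  # renamed header row
```

Headers not listed in the mapping are left untouched, so the same pass works even if only one or two columns need renaming.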
Hi @carlosparadis,
When running the Train.ipynb Python notebook, it expects me to have the following datasets:

- crossplatform_sf_dataset_tokenized.csv: the main dataset used in this study.
- so-dataset_tokenized.csv: originates from the research paper "Sentiment Polarity Detection for Software Development".
- gh-dataset_tokenized.csv: derived from the research paper "GitHub Golden Rule".

Questions: Are the _tokenized.csv files the final datasets ready for training, or do I need to run tokenize_statistics.ipynb on the raw data first to generate them?

Best,
Samantha
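One way to probe the question above empirically is to check whether each _tokenized.csv file already carries the columns Train.ipynb reads. A minimal sketch, assuming hypothetical expected column names (`text`, `label`); the real names would come from the notebook itself:

```python
import csv

# Hypothetical set of columns Train.ipynb is assumed to read; replace with
# the names actually referenced in the notebook.
EXPECTED_COLUMNS = {"text", "label"}

def missing_columns(path, expected=frozenset(EXPECTED_COLUMNS)):
    """Return the expected columns that are absent from the CSV's header."""
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    return set(expected) - set(header)

# Usage (filenames from the post):
# for name in ("crossplatform_sf_dataset_tokenized.csv",
#              "so-dataset_tokenized.csv",
#              "gh-dataset_tokenized.csv"):
#     print(name, missing_columns(name) or "OK")
```

If this returns an empty set for all three files, the tokenized CSVs at least match the expected schema; if not, the missing names show exactly what tokenize_statistics.ipynb (or a rename step) would need to produce.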