Replies: 1 comment 1 reply
Ah, I was anticipating this question :) I just sent you an invite for the G. Drive: https://drive.google.com/drive/u/0/folders/1x9OZl5HSftEaWB5dMwRn9uLhwAy7i9UA Let me know if you have any questions.

First, try to just run with the files that are already there plus Haotian's file. Once everything is in order, we can try training on the GitHub data just to see what we would need to change in it (I presume at most column names). For training, I recommend you use Google Colab notebooks to process the models. I believe the models should fit there, since they are not LLMs.

As for your third question, did you check the .Rmd sentiment notebook? That notebook should do everything for you; you shouldn't need to run any .ipynb (that's why we wanted to consolidate everything in .Rmd! Too many moving parts otherwise). Let me know.
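For the "at most column names" change mentioned above, a minimal sketch of what adjusting the GitHub data could look like. The column names on both sides (`comment_body`, `manual_label` → `text`, `label`) are hypothetical placeholders for illustration, not the project's actual schema:

```python
import csv
import io

# Hypothetical mapping from the GitHub dataset's headers to the names the
# training notebook expects; both sides are assumptions for illustration.
COLUMN_MAP = {
    "comment_body": "text",
    "manual_label": "label",
}

def rename_columns(src, dst, column_map):
    """Copy CSV rows from src to dst, renaming header cells per column_map."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader)
    writer.writerow([column_map.get(name, name) for name in header])
    writer.writerows(reader)  # data rows pass through unchanged

# In-memory demo; with the real files you would pass open file handles.
src = io.StringIO("comment_body,manual_label\ngreat patch,positive\n")
dst = io.StringIO()
rename_columns(src, dst, COLUMN_MAP)
print(dst.getvalue().splitlines()[0])  # renamed header row
```

Headers not listed in the mapping are left untouched, so the same pass works even if only one or two columns need renaming.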
Hi @carlosparadis,
When running the Train.ipynb Python notebook, it expects me to have the following datasets:

- crossplatform_sf_dataset_tokenized.csv: the main dataset used in this study.
- so-dataset_tokenized.csv: originates from the research paper "Sentiment Polarity Detection for Software Development".
- gh-dataset_tokenized.csv: derived from the research paper "GitHub Golden Rule".

Questions: Are the _tokenized.csv files the final datasets ready for training, or do I need to run tokenize_statistics.ipynb on the raw data first to generate them?

Best,
Samantha
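One way to probe the question above empirically is to check whether each _tokenized.csv file already carries the columns Train.ipynb reads. A minimal sketch, assuming hypothetical expected column names (`text`, `label`); the real names would come from the notebook itself:

```python
import csv

# Hypothetical set of columns Train.ipynb is assumed to read; replace with
# the names actually referenced in the notebook.
EXPECTED_COLUMNS = {"text", "label"}

def missing_columns(path, expected=frozenset(EXPECTED_COLUMNS)):
    """Return the expected columns that are absent from the CSV's header."""
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    return set(expected) - set(header)

# Usage (filenames from the post):
# for name in ("crossplatform_sf_dataset_tokenized.csv",
#              "so-dataset_tokenized.csv",
#              "gh-dataset_tokenized.csv"):
#     print(name, missing_columns(name) or "OK")
```

If this returns an empty set for all three files, the tokenized CSVs at least match the expected schema; if not, the missing names show exactly what tokenize_statistics.ipynb (or a rename step) would need to produce.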