The aim of this project is to investigate, quantify and model the impact of Covid-19 on the expressed sentiment of British Columbians on Twitter, in relation to provincial politics.
The data analyzed in this project was collected from Twitter over a period of several months, from August 14th, 2020 to November 19th, 2020.
Using two popular third-party Twitter API tools, Tweepy and Twarc, more than four hundred thousand tweet IDs related to #bcpoli were collected and later rehydrated for analysis. The Twitter data used in this project was collected in accordance with the Twitter developer terms and conditions.
If you have a Twitter developer account, this entire project is reproducible. In accordance with Twitter's data redistribution policy, only the tweet IDs have been published. Using these IDs and your own Twitter API keys, you can rehydrate the tweets with a third-party tool such as Twarc.
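The rehydration step could look roughly like the sketch below. It is not the code used in this project: it assumes the ID file holds one numeric tweet ID per line, that the Twarc version 1 client (`twarc.Twarc` and its `hydrate` method) is installed, and that your API access still permits tweet lookups. The helper names `read_tweet_ids` and `hydrate_tweets` are made up for illustration.

```python
def read_tweet_ids(path):
    """Yield tweet IDs as strings from a text file.

    Assumption: one numeric ID per line; blank or malformed lines are skipped.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            tweet_id = line.strip()
            if tweet_id.isdigit():
                yield tweet_id


def hydrate_tweets(tweet_ids, consumer_key, consumer_secret,
                   access_token, access_token_secret):
    """Yield full tweet objects (dicts) for the given IDs using Twarc.

    Hypothetical wrapper around the Twarc v1 client. Twarc is imported here
    so that read_tweet_ids() works without the package installed.
    """
    from twarc import Twarc  # third-party: pip install twarc

    client = Twarc(consumer_key, consumer_secret,
                   access_token, access_token_secret)
    # Deleted or protected tweets are not returned, so the number of
    # hydrated tweets can be smaller than the number of IDs.
    yield from client.hydrate(tweet_ids)


# Example usage (replace the placeholder strings with your own API keys):
#
# ids = read_tweet_ids("data/bcpoli_tweet_id_400k.txt")
# for tweet in hydrate_tweets(ids, "KEY", "SECRET", "TOKEN", "TOKEN_SECRET"):
#     print(tweet["id_str"])
```

Both functions are generators, so the roughly 400k IDs are streamed rather than loaded into memory at once.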
data-science-final-project/
├── LICENSE
├── README.md
├── data
│ ├── bc_covid_data.sav
│ ├── bcpoli_tweet_id_400k.txt
│ └── bcpoli_vader_labelled_tweets.sav
├── deploy
│ ├── readme.txt
│ └── streamlit.zip
├── models
│ ├── LogReg_GridCV_3C_87p_40kfeats.sav
│ ├── LogReg_model_3C_86p__40kfeats.sav
│ └── NBMultinomial_model_3C_83p_40kfeats.sav
├── notebooks
│ ├── preliminary_analysis
│ │ ├── preliminary_classification_unlabelled_tweets.ipynb
│ │ ├── preliminary_twitter_data_exploration.ipynb
│ │ └── readme.txt
│ ├── bc_covid_data.ipynb
│ ├── covid_sentiment_impact.ipynb
│ ├── sentiment_scoring_400K_tweets.ipynb
│ ├── training_baseline_classifier.ipynb
│ └── twitter_data_exploration_400k_tweets.ipynb
└── scripts
  ├── extract_tweet_ids.R
  └── functions.py
Data Processing
Sentiment Analysis
Topic Modelling
App Deployment
Tweet Scraping