elizzhou/weather-stocks-analysis


405 Final Project: Cloudy with a Chance of Trades

This project aims to identify the relationship between weather (specifically, precipitation and snow levels) and Nasdaq trading volumes and closing prices.

tl;dr:

To run the pipeline, clone this repo to your local machine and download the 3 data files here. Then run bash run_script.sh in your local terminal to execute the entire pipeline in one command.

The Tableau visualization can be accessed through this link.


Full README:

Pre-Processing

We pre-processed our data before loading it into our pipeline. Using the spark_preprocessing.ipynb notebook, we filtered the data to include only data points from 2016, which yielded two datasets in Parquet format (weather.snappy.parquet and nasdaq.snappy.parquet). A third dataset was added later.
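
The year filter can be illustrated with a minimal sketch. This is not the code in spark_preprocessing.ipynb, which uses Spark; it is plain Python on a few made-up rows, and the helper name in_year and the sample values are assumptions chosen only to show the logic.

```python
from datetime import date


def in_year(rows, year=2016, date_key="DATE"):
    """Keep only rows whose ISO date (YYYY-MM-DD) falls in the given year."""
    return [r for r in rows if date.fromisoformat(r[date_key]).year == year]


# Made-up sample rows; the real data comes from the weather dataset.
rows = [
    {"DATE": "2015-12-31", "PRCP": 12},
    {"DATE": "2016-01-04", "PRCP": 0},
    {"DATE": "2016-12-30", "PRCP": 35},
    {"DATE": "2017-01-03", "PRCP": 5},
]

kept = in_year(rows)
print([r["DATE"] for r in kept])
```

The same predicate would apply to the Nasdaq rows by passing date_key="date".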

Data

You can skip the pre-processing step and directly download the data files through this link.
The original unprocessed data is available further down.

Step 1: Data Manipulation

The pyspark_script.py file is used to run the data manipulation code, which includes some column removals, transformations, and joins.
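
As a rough illustration of the kind of join this step performs, the sketch below matches stock rows to weather rows by date. It is plain Python rather than PySpark, and it assumes one weather row per date (for example, a single station); the function name join_on_date and the sample rows are hypothetical and do not come from pyspark_script.py.

```python
def join_on_date(stocks, weather):
    """Inner-join stock rows to weather rows on their date columns.

    Assumes at most one weather row per date (hypothetical simplification).
    """
    weather_by_date = {w["DATE"]: w for w in weather}
    joined = []
    for s in stocks:
        w = weather_by_date.get(s["date"])
        if w is not None:  # inner join: skip trading days with no weather row
            joined.append({**s, "PRCP": w["PRCP"], "SNOW": w["SNOW"]})
    return joined


# Made-up sample rows using column names from the two datasets.
stocks = [
    {"date": "2016-01-04", "ticker": "AAPL", "volume": 100},
    {"date": "2016-01-05", "ticker": "AAPL", "volume": 300},
]
weather = [{"DATE": "2016-01-04", "PRCP": 12, "SNOW": 0}]

joined = join_on_date(stocks, weather)
print(joined)
```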

Step 2: Aggregations

The queries_script.py file is used to run the data aggregation code, which includes DuckDB queries that output 6 CSV files to be used for our Tableau visualizations.
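
To show the general shape of an aggregation that ends in a CSV, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in for DuckDB so that it runs without extra installs; the table name joined, the wet/dry split, and the sample rows are assumptions and are not taken from queries_script.py.

```python
import csv
import io
import sqlite3

# In-memory database standing in for the joined weather and stock data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE joined (date TEXT, ticker TEXT, volume REAL, prcp REAL)")
conn.executemany(
    "INSERT INTO joined VALUES (?, ?, ?, ?)",
    [
        ("2016-01-04", "AAPL", 100.0, 0.0),
        ("2016-01-05", "AAPL", 300.0, 12.0),
        ("2016-01-06", "AAPL", 200.0, 0.0),
        ("2016-01-07", "AAPL", 500.0, 40.0),
    ],
)

# Hypothetical aggregation: average volume on days with and without precipitation.
query = """
SELECT CASE WHEN prcp > 0 THEN 'wet' ELSE 'dry' END AS day_type,
       COUNT(*) AS n_days,
       AVG(volume) AS avg_volume
FROM joined
GROUP BY day_type
ORDER BY day_type
"""
cursor = conn.execute(query)
header = [col[0] for col in cursor.description]
rows = cursor.fetchall()

# Write the result as CSV text; a real script would write to a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```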

Step 3: The One Line to Run Them All

Once all the required data files are downloaded, you can skip running Steps 1 & 2 manually and instead run the whole pipeline with the run_script.sh file, using the following command on your local command line: bash run_script.sh

Step 4: Visualization

Step 3 produces 6 CSV files. We visualized these files to draw our conclusions.

We used Tableau Public for our visualizations. Our dashboard can be accessed through this link.


Original Data

You can find the original sources of our data at the links below.

Global Daily Weather Data
Same columns as in weather.snappy.parquet.
GHCN_DIN: Global Historical Climatology Network Daily Identification Number
DATE (year-month-day)
PRCP: precipitation (tenths of mm)
SNOW: snowfall (mm)
TMAX: daily maximum temperature (°C)
TMIN: daily minimum temperature (°C)
NAME: weather station name
ELEVATION: elevation (meters)
COUNTRY_CODE: two-letter country code
COORD: latitude and longitude of the station, in decimal degrees
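
Because PRCP is recorded in tenths of a millimeter while SNOW is in millimeters, it can help to convert PRCP before comparing the two. A minimal sketch, assuming missing readings are represented as None (the helper name prcp_to_mm is hypothetical):

```python
def prcp_to_mm(prcp_tenths):
    """Convert PRCP from tenths of a millimeter to millimeters.

    Missing readings (None) are passed through unchanged.
    """
    return None if prcp_tenths is None else prcp_tenths / 10


print(prcp_to_mm(125))
print(prcp_to_mm(None))
```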

Nasdaq data
To download this, you will need to sign up for an API key.
Same columns as in nasdaq.snappy.parquet.
ticker: stock ticker
date (year-month-day)
open: first price at which the stock was traded on that day
high: highest price that the stock reached on that day
low: lowest price that the stock reached on that day
close: last price at which the stock was traded on that day
volume: total number of shares traded on that day
ex-dividend: cash dividend per share paid by the company on that day, adjusted for stock splits
split_ratio: ratio of a stock split that occurred on that day
adj_open: open, adjusted for stock splits and dividends
adj_high: high, adjusted for stock splits and dividends
adj_low: low, adjusted for stock splits and dividends
adj_close: close, adjusted for stock splits and dividends
adj_volume: volume, adjusted for stock splits

Nasdaq Industries Data

Same columns as in nasdaq_industries.csv.
Symbol: stock ticker
Name: company name
Last Sale: most recent price at which the company's stock was traded
Net Change: difference between the last sale price and the previous day's closing price
% Change: percentage change in the stock's price compared to the previous day's closing price
Market Cap: total market value of a company's outstanding shares
Country: country where the company is headquartered
IPO Year: year the company first offered its shares to the public through an Initial Public Offering
Sector: company sector
Industry: company industry
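
The Last Sale, Net Change, and % Change columns are related by the definitions above: the previous close is Last Sale minus Net Change, and % Change is Net Change divided by that previous close. A small sketch of this arithmetic follows; the function name pct_change and the sample values are assumptions for illustration.

```python
def pct_change(last_sale, net_change):
    """Recompute % Change from Last Sale and Net Change.

    previous close = last sale - net change
    % change       = net change / previous close * 100
    """
    prev_close = last_sale - net_change
    if prev_close == 0:
        raise ValueError("previous close is zero; % change is undefined")
    return net_change / prev_close * 100


# Made-up values: a stock that closed at 100.0 yesterday and last traded at 102.0.
print(pct_change(102.0, 2.0))
```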


Further Work / To Do

  • Testing GCP integration

Credits

This project was made by:
Alyssa Fontaine
Eliz Zhou
Jane Lee
Joshua Bastin
Megan Bennett
Yash Laddha
