This project builds a data warehouse on AWS, running an ETL pipeline against a database hosted on Amazon Redshift. The pipeline extracts data from AWS S3, stages it in Redshift, and transforms it into a set of dimensional tables for analytical purposes.
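The staging step relies on Redshift's `COPY` command, which bulk-loads files straight from S3 into a staging table. A minimal sketch of such a statement, kept as a Python string the way this project stores its queries (the table name, bucket path, role ARN, and region below are placeholders, not this project's actual values):

```python
# Placeholder bucket, role ARN, and region -- not the project's real values
staging_events_copy = """
    COPY staging_events
    FROM 's3://XXXX/log_data'
    IAM_ROLE 'arn:aws:iam::XXXX:role/dwhRole'
    FORMAT AS JSON 'auto'
    REGION 'us-west-2';
"""
```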
- sql_queries.py is a collection of SQL queries. It includes the statements for dropping, creating, staging (COPY), and populating the tables (sketch below).
- create_table.py is a Python script that connects to Redshift and runs the queries from sql_queries.py that drop old tables and create the frames of the new ones (sketch below).
- etl.py is the ETL pipeline: it connects to Redshift and runs the queries from sql_queries.py that load data into the staging tables and insert it into the star-schema tables (sketch below).
- IaC_create_IAM_Cluster is a notebook that creates an IAM role and a Redshift cluster using Infrastructure as Code (IaC) (sketch below).
- test_run is a Python script for inspecting the various tables (sketch below).
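To make the file descriptions concrete, here is an illustrative sketch of how sql_queries.py is typically organized in a project like this; all table names, column names, and paths are made up and may not match the real file:

```python
# sql_queries.py -- illustrative excerpt; table and column names are made up
staging_events_table_drop = "DROP TABLE IF EXISTS staging_events;"

staging_events_table_create = """
    CREATE TABLE IF NOT EXISTS staging_events (
        artist   VARCHAR,
        song     VARCHAR,
        user_id  INT,
        song_id  VARCHAR,
        ts       BIGINT
    );
"""

staging_events_copy = """
    COPY staging_events
    FROM 's3://XXXX/log_data'
    IAM_ROLE 'arn:aws:iam::XXXX:role/dwhRole'
    FORMAT AS JSON 'auto';
"""

songplay_table_insert = """
    INSERT INTO songplays (start_time, user_id, song_id)
    SELECT TIMESTAMP 'epoch' + ts / 1000 * INTERVAL '1 second',
           user_id,
           song_id
    FROM staging_events;
"""

# Grouped into lists so create_table.py and etl.py can simply iterate over them
drop_table_queries   = [staging_events_table_drop]
create_table_queries = [staging_events_table_create]
copy_table_queries   = [staging_events_copy]
insert_table_queries = [songplay_table_insert]
```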
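create_table.py then just executes the drop and create lists in order. A sketch of the typical flow, assuming dwh.cfg uses a `[CLUSTER]` section with the key names shown (an assumption, not confirmed by the repo):

```python
# create_table.py -- sketch of the typical flow; config section/key names are assumed
import configparser
import psycopg2
from sql_queries import drop_table_queries, create_table_queries

def main():
    config = configparser.ConfigParser()
    config.read('dwh.cfg')

    # Connect to the Redshift cluster described in dwh.cfg
    conn = psycopg2.connect(
        host=config['CLUSTER']['HOST'],
        dbname=config['CLUSTER']['DB_NAME'],
        user=config['CLUSTER']['DB_USER'],
        password=config['CLUSTER']['DB_PASSWORD'],
        port=config['CLUSTER']['DB_PORT'],
    )
    cur = conn.cursor()

    # Drop any old tables, then create the empty frames of the new ones
    for query in drop_table_queries + create_table_queries:
        cur.execute(query)
        conn.commit()

    conn.close()

if __name__ == '__main__':
    main()
```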
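etl.py follows the same pattern, but runs the COPY and INSERT lists instead, so the staging tables are filled from S3 before the star-schema tables are populated. Again a sketch under the same config assumptions:

```python
# etl.py -- sketch; the connection setup mirrors create_table.py above
import configparser
import psycopg2
from sql_queries import copy_table_queries, insert_table_queries

def main():
    config = configparser.ConfigParser()
    config.read('dwh.cfg')
    conn = psycopg2.connect(
        host=config['CLUSTER']['HOST'],
        dbname=config['CLUSTER']['DB_NAME'],
        user=config['CLUSTER']['DB_USER'],
        password=config['CLUSTER']['DB_PASSWORD'],
        port=config['CLUSTER']['DB_PORT'],
    )
    cur = conn.cursor()

    # First COPY the raw S3 data into the staging tables,
    # then transform it into the star-schema tables
    for query in copy_table_queries + insert_table_queries:
        cur.execute(query)
        conn.commit()

    conn.close()

if __name__ == '__main__':
    main()
```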
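The IaC notebook typically provisions everything through boto3. The sketch below shows the general shape; the role name, cluster settings, and `[AWS]` config keys are assumptions, and the password is deliberately a placeholder:

```python
# IaC_create_IAM_Cluster -- sketch; role name, cluster settings, and config keys are assumed
import configparser
import json
import boto3

config = configparser.ConfigParser()
config.read('dwh.cfg')
KEY, SECRET = config['AWS']['KEY'], config['AWS']['SECRET']

iam = boto3.client('iam', region_name='us-west-2',
                   aws_access_key_id=KEY, aws_secret_access_key=SECRET)
redshift = boto3.client('redshift', region_name='us-west-2',
                        aws_access_key_id=KEY, aws_secret_access_key=SECRET)

# IAM role that lets Redshift read the S3 data
iam.create_role(
    RoleName='dwhRole',
    AssumeRolePolicyDocument=json.dumps({
        'Version': '2012-10-17',
        'Statement': [{'Effect': 'Allow',
                       'Action': 'sts:AssumeRole',
                       'Principal': {'Service': 'redshift.amazonaws.com'}}],
    }))
iam.attach_role_policy(RoleName='dwhRole',
                       PolicyArn='arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess')
role_arn = iam.get_role(RoleName='dwhRole')['Role']['Arn']

# Provision the cluster itself, attaching the role created above
redshift.create_cluster(
    ClusterType='multi-node',
    NodeType='dc2.large',
    NumberOfNodes=4,
    DBName='dwh',
    ClusterIdentifier='dwhCluster',
    MasterUsername='dwhuser',
    MasterUserPassword='XXXX',
    IamRoles=[role_arn],
)
```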
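Finally, a table inspection script like test_run usually just prints row counts as a sanity check that every load step succeeded. A sketch, with the star-schema table names assumed rather than taken from the repo:

```python
# test_run -- sketch; the star-schema table names are assumed
import configparser
import psycopg2

config = configparser.ConfigParser()
config.read('dwh.cfg')
conn = psycopg2.connect(
    host=config['CLUSTER']['HOST'],
    dbname=config['CLUSTER']['DB_NAME'],
    user=config['CLUSTER']['DB_USER'],
    password=config['CLUSTER']['DB_PASSWORD'],
    port=config['CLUSTER']['DB_PORT'],
)
cur = conn.cursor()

# Row counts are a quick sanity check that every load step succeeded
for table in ['staging_events', 'staging_songs',
              'songplays', 'users', 'songs', 'artists', 'time']:
    cur.execute(f'SELECT COUNT(*) FROM {table};')
    print(table, cur.fetchone()[0])

conn.close()
```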
Note that the files refer to a private dwh.cfg (not on GitHub), which points them to the AWS S3 data and the Redshift cluster. XX_dwh.cfg is almost identical to the original dwh.cfg, except that sensitive information is redacted.
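For reference, a dwh.cfg for a project like this usually looks something like the following; the section and key names match the sketches above but are assumptions, and every value is a placeholder:

```ini
[AWS]
KEY=XXXX
SECRET=XXXX

[CLUSTER]
HOST=XXXX.redshift.amazonaws.com
DB_NAME=dwh
DB_USER=XXXX
DB_PASSWORD=XXXX
DB_PORT=5439

[IAM_ROLE]
ARN=arn:aws:iam::XXXX:role/dwhRole

[S3]
LOG_DATA=s3://XXXX/log_data
SONG_DATA=s3://XXXX/song_data
```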
Cheers!