
Basic concepts of the ETL (Extract-Transform-Load) process will be applied. We will extract the required information using Web Scraping and API techniques, developing a functional ETL Pipeline for the acquisition and processing stages of data ingested in multiple formats from a public domain website.


ignaciopieroni/data-engineering-project


Data Engineering Project Using Python

In this project, we work with real-world data using Python: we obtain it
directly from a webpage using web scraping techniques, transform it
to meet the given requirements, and save it both as a local file and as a table in a database.
We also run basic queries on that database using Python.
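The load and query steps described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the project's actual code: the table name `largest_banks`, the column names, the helper names `load_to_db` and `run_query`, and the placeholder rows are all assumptions made for the example.

```python
import sqlite3

# Placeholder rows: names and values are illustrative, not real figures.
rows = [
    ("Bank A", 300.0),
    ("Bank B", 200.0),
    ("Bank C", 100.0),
]


def load_to_db(conn, table_name, rows):
    """Create the table (replacing any earlier copy) and insert the rows.

    table_name is interpolated into the SQL, so it must be a trusted identifier.
    """
    conn.execute(f"DROP TABLE IF EXISTS {table_name}")
    conn.execute(f"CREATE TABLE {table_name} (Name TEXT, MC_USD_Billion REAL)")
    conn.executemany(f"INSERT INTO {table_name} VALUES (?, ?)", rows)
    conn.commit()


def run_query(conn, query):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(query).fetchall()


# An in-memory database keeps the sketch self-contained;
# pass a file path instead to persist the table locally.
conn = sqlite3.connect(":memory:")
load_to_db(conn, "largest_banks", rows)

top = run_query(
    conn,
    "SELECT Name FROM largest_banks ORDER BY MC_USD_Billion DESC LIMIT 2",
)
average = run_query(conn, "SELECT AVG(MC_USD_Billion) FROM largest_banks")
print(top)
print(average)
```

Dropping and recreating the table on each run keeps the load step repeatable, which is convenient while the pipeline is still being developed.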

Basic concepts of the ETL (Extract-Transform-Load) process are applied: we extract
the required information using web scraping and APIs, and develop a functional
ETL pipeline for acquiring and processing data ingested in multiple formats
from a public domain website.
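As a rough sketch of the extract stage, the snippet below pulls rows out of an HTML table using only the standard library's `html.parser`. The class `TableRowParser`, the function `extract`, and the inline `SAMPLE_HTML` (with placeholder bank names and values) are hypothetical; the real project may well fetch a live page and use a dedicated scraping library instead.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text is kept only while inside a cell (nested tags such as <a> included).
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Inline stand-in for a scraped page; names and values are placeholders.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Bank name</th><th>Market cap (US$ billion)</th></tr>
  <tr><td>1</td><td><a href="#">Bank A</a></td><td>300.00</td></tr>
  <tr><td>2</td><td><a href="#">Bank B</a></td><td>200.00</td></tr>
</table>
"""


def extract(html):
    """Return (name, market_cap_usd_billion) tuples from the table body."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    _header, *body = parser.rows  # the first row holds the column headings
    return [(name, float(cap)) for _rank, name, cap in body]


banks = extract(SAMPLE_HTML)
print(banks)
```

Parsing an inline string keeps the example runnable offline; swapping in the text of a downloaded page would leave `extract` unchanged.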

Finally, we will create modules, run unit tests, package the application, and perform static code
analysis, also using Python.
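To give a flavour of the unit-testing step, here is a small sketch built on the standard `unittest` module. The function `convert` and the test case `ConvertTests` are hypothetical stand-ins for whatever modules the project actually defines.

```python
import unittest


def convert(cap_usd, rate):
    """Convert a USD amount with the given rate, rounded to two decimals."""
    return round(cap_usd * rate, 2)


class ConvertTests(unittest.TestCase):
    def test_scales_by_rate(self):
        self.assertEqual(convert(100.0, 0.5), 50.0)

    def test_rounds_to_two_decimals(self):
        self.assertEqual(convert(1.0, 0.3333), 0.33)


# Run the suite programmatically so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ConvertTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a packaged project, tests like these would more typically live in their own module and be run with `python -m unittest`.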

Project Scenario

Our job is to access and process data as per requirements.

We are asked to compile a list of the top 10 largest banks in the world, ranked by market capitalization in billion USD. We then need to transform the data and store it in USD, GBP, EUR, and INR, using the exchange rate information made available to us as a CSV file. The processed information is saved locally, both as a CSV file and as a database table. The data should be ready for queries that extract the list and report each market capitalization value in the respective currency.
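The transformation described in this scenario could look roughly like the sketch below, which uses only the standard `csv` module. The exchange-rate column names (`Currency`, `Rate`), the rates themselves, the output column names, and the helpers `read_rates`, `transform`, and `write_csv` are assumptions for illustration; the actual CSV layout is not specified here.

```python
import csv
import io

# Stand-in for the exchange-rate CSV file.
# Column names and rates are assumptions, not real exchange rates.
RATES_CSV = """Currency,Rate
GBP,0.8
EUR,0.9
INR,80.0
"""


def read_rates(csv_file):
    """Read currency -> rate (units of that currency per 1 USD)."""
    return {row["Currency"]: float(row["Rate"]) for row in csv.DictReader(csv_file)}


def transform(banks, rates):
    """Add one market-cap column per currency, rounded to two decimals."""
    records = []
    for name, cap_usd in banks:
        record = {"Name": name, "MC_USD_Billion": cap_usd}
        for currency, rate in rates.items():
            record[f"MC_{currency}_Billion"] = round(cap_usd * rate, 2)
        records.append(record)
    return records


def write_csv(records, csv_file):
    """Write the records with a header row taken from the first record."""
    writer = csv.DictWriter(csv_file, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)


rates = read_rates(io.StringIO(RATES_CSV))
# Placeholder input rows; in the pipeline these come from the extract stage.
records = transform([("Bank A", 300.0), ("Bank B", 200.0)], rates)

buffer = io.StringIO()  # replace with open("banks.csv", "w", newline="") to save locally
write_csv(records, buffer)
print(buffer.getvalue())
```

Keeping the rates in a dictionary means that adding another currency only requires one more row in the exchange-rate file, with no change to the code.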
