A data engineering project simulating an e-commerce analytics platform with end-to-end integration of OLTP, NoSQL, data warehousing, ETL pipelines, big data analytics, and BI dashboards.
This project demonstrates the design and implementation of a modern data platform for an e-commerce company whose operations depend on two primary data sources:
- Sales transactional data stored in MySQL
- Product catalog data stored in MongoDB
To enable analytics and business intelligence:
- Data is periodically extracted from these databases into a staging data warehouse
- ETL pipelines orchestrated by Apache Airflow extract, transform, and load the data
- Apache Spark is used for big data analytics and sales forecasting
- Tableau dashboards provide business insights for BI teams
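The periodic extraction into the staging warehouse can be sketched as a watermark-based incremental export: each run copies only rows newer than the last exported key and then advances that key. This is a minimal sketch, assuming an auto-incrementing `sale_id` column; Python's built-in `sqlite3` stands in for MySQL, and the `sales` table, its columns, and the `export_new_rows` helper are hypothetical, not the project's actual schema.

```python
import csv
import io
import sqlite3


def export_new_rows(conn: sqlite3.Connection, last_id: int) -> tuple[str, int]:
    """Export rows with sale_id > last_id as CSV text (hypothetical helper).

    Returns the CSV text and the new high-water mark.
    """
    rows = conn.execute(
        "SELECT sale_id, product_id, quantity, sold_at FROM sales "
        "WHERE sale_id > ? ORDER BY sale_id",
        (last_id,),
    ).fetchall()
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["sale_id", "product_id", "quantity", "sold_at"])
    writer.writerows(rows)
    # Advance the watermark only if new rows were found.
    new_mark = rows[-1][0] if rows else last_id
    return buf.getvalue(), new_mark


if __name__ == "__main__":
    # In-memory SQLite stands in for the MySQL sales database (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, "
        "quantity INTEGER, sold_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [(1, 101, 2, "2024-01-01"), (2, 102, 1, "2024-01-02")],
    )
    csv_text, mark = export_new_rows(conn, last_id=0)  # exports rows 1 and 2
    conn.execute("INSERT INTO sales VALUES (3, 101, 5, '2024-01-03')")
    csv_text, mark = export_new_rows(conn, last_id=mark)  # exports only row 3
    print(csv_text)
```

Persisting `mark` between runs (for example in a small state table or file) is what makes each export incremental; a last-modified timestamp column is a common alternative when rows can be updated after insertion.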
Design and implement a robust data platform to integrate and analyze e-commerce data from multiple sources for operational reporting, business intelligence, and machine learning use cases.
- Design data repositories using MySQL (OLTP) and MongoDB (NoSQL) for transactional and catalog data
- Build a PostgreSQL data warehouse, create fact and dimension tables, and perform cube and rollup operations
- Develop Tableau dashboards to visualize key business metrics
- Create ETL pipelines with Apache Airflow to extract, transform, and load data into the warehouse
- Perform big data analytics using Apache Spark, deploying a machine learning model for sales forecasting
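In PostgreSQL, `GROUP BY ROLLUP (year, category)` returns totals for each (year, category) pair, subtotals per year, and a grand total. As a self-contained sketch, the snippet below builds a tiny hypothetical star schema (one fact table and two dimension tables) in SQLite, which does not support `ROLLUP`, and emulates the same three grouping levels with `UNION ALL`. All table and column names are illustrative assumptions, not the project's warehouse schema.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing two dimensions.
SCHEMA = """
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY,
    date_id INTEGER REFERENCES dim_date(date_id),
    category_id INTEGER REFERENCES dim_category(category_id),
    amount REAL
);
"""

# Emulates PostgreSQL: ... GROUP BY ROLLUP (d.year, c.category)
ROLLUP_SQL = """
WITH joined AS (
    SELECT d.year, c.category, f.amount
    FROM fact_sales f
    JOIN dim_date d ON f.date_id = d.date_id
    JOIN dim_category c ON f.category_id = c.category_id
)
SELECT year, category, SUM(amount) FROM joined GROUP BY year, category
UNION ALL
SELECT year, NULL, SUM(amount) FROM joined GROUP BY year
UNION ALL
SELECT NULL, NULL, SUM(amount) FROM joined
ORDER BY 1, 2
"""


def rollup(conn: sqlite3.Connection) -> list[tuple]:
    """Return (year, category, total) rows; NULL marks a rolled-up level."""
    return conn.execute(ROLLUP_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2023, 12), (2, 2024, 1)])
conn.executemany("INSERT INTO dim_category VALUES (?, ?)", [(1, "Books"), (2, "Toys")])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 10.0), (2, 1, 2, 5.0), (3, 2, 1, 7.5)],
)
for row in rollup(conn):
    print(row)
```

A `CUBE (year, category)` would add one more level, grouping by `category` alone; in SQLite that is a fourth `UNION ALL` branch.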
- OLTP database (MySQL):
  - Design and populate the OLTP schema for sales data
  - Automate periodic data exports
- NoSQL database (MongoDB):
  - Load e-commerce catalog data
  - Query and manage product information in MongoDB
- Data warehouse (PostgreSQL):
  - Design and implement the data warehouse schema
  - Create fact and dimension tables for analytical queries
  - Load data into the data warehouse
  - Build cubes and rollups
- BI dashboards (Tableau):
  - Design dashboards to analyze sales performance across time, product categories, and geographies
- ETL pipelines (Apache Airflow):
  - Extract data from the e-commerce web server log
  - Transform the data to exclude records from a specific IP address
  - Load the transformed data into a tar archive
  - Automate incremental data loads using Airflow DAGs
- Big data analytics (Apache Spark):
  - Analyze e-commerce search terms using Spark
  - Deploy a pretrained sales forecasting model with Spark MLlib
  - Predict future sales trends for business planning
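The extract, transform, and load steps for the web server log can be sketched as three small functions. This minimal sketch assumes the log uses Common Log Format, where the client IP is the first whitespace-separated field; the excluded address, the sample log lines, and the archive member name are hypothetical (the IP comes from a documentation-only range), and the archive is built in memory rather than on disk.

```python
import io
import tarfile

# Hypothetical excluded address (documentation range), not the project's value.
EXCLUDED_IP = "203.0.113.7"


def extract(raw_log: str) -> list[str]:
    """Extract: split the raw web server log into non-empty lines."""
    return [line for line in raw_log.splitlines() if line.strip()]


def transform(lines: list[str], excluded: str = EXCLUDED_IP) -> list[str]:
    """Transform: drop lines whose first field (the client IP) matches `excluded`."""
    return [line for line in lines if line.split()[0] != excluded]


def load_to_tar(lines: list[str], member_name: str = "transformed_log.txt") -> bytes:
    """Load: pack the transformed lines into a single-file, in-memory tar archive."""
    payload = ("\n".join(lines) + "\n").encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


# Hypothetical sample log in Common Log Format.
sample_log = (
    '198.51.100.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512\n'
    '203.0.113.7 - - [10/Oct/2023:13:55:40 +0000] "GET /cart HTTP/1.1" 200 128\n'
    '192.0.2.44 - - [10/Oct/2023:13:56:02 +0000] "GET /search?q=laptop HTTP/1.1" 200 1024\n'
)
archive = load_to_tar(transform(extract(sample_log)))
```

In an Airflow DAG, each function could become its own task (for example with `PythonOperator`), passing intermediate results through files rather than return values; writing to disk would only change the `tarfile.open` call to take a file path.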
| Purpose | Tool |
|---|---|
| OLTP database | MySQL |
| NoSQL database | MongoDB |
| Data warehouse | PostgreSQL |
| Data pipelines | Apache Airflow |
| Big data analytics | Apache Spark |
| Business intelligence | Tableau |
The datasets used in this project are synthetic and were programmatically generated as part of the IBM Data Engineering Capstone Project within the IBM Data Engineering Professional Certificate on Coursera.
```
.
├── 01_oltp/       # MySQL OLTP setup
├── 02_nosql/      # MongoDB NoSQL setup
├── 03_dwh/        # PostgreSQL data warehouse
├── 04_analytics/  # Tableau dashboards
├── 05_etl/        # Apache Airflow ETL pipelines
├── 06_spark/      # Apache Spark big data analytics
└── README.md      # Project README file
```