  • Open for Remote Opportunities
  • Remote / GMT+7
KMoex-HZ/README.md

Caelan Zhou

Data Engineer | Cloud Infrastructure & ETL Automation

Specialized in building robust, containerized data pipelines and data warehousing solutions. Focusing on scalability, clean architecture, and automated orchestration.


πŸ› οΈ Core Engineering Stack

Domain | Technologies
Orchestration & Containerization | Docker, Airflow, Dagster
Data Processing & Analytics | Spark, dbt, DuckDB, Python, Soda
Storage, Warehouse & Cloud | PostgreSQL, Azure, MinIO
Data Quality & Web Development | Great Expectations, Next.js, Bash, SSIS

🚀 Featured Engineering Projects

Dagster | dbt Core | DuckDB | Soda Core | GitHub Actions CI/CD Pipeline

  • Architecture: Built a production-grade Modern Data Stack (MDS) simulation that enforces SCD Type 2 history tracking and strict data contracts, using DuckDB for high-performance columnar processing.
  • Key Tech: Implemented a fully automated CI/CD pipeline via GitHub Actions (with robust environment-agnostic path injection), integrated Soda Core for quality guardrails, and orchestrated modular transformations using Dagster Assets.
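As a rough illustration of the SCD Type 2 idea mentioned above, the sketch below keeps history by closing the old row and appending a new one whenever a tracked attribute changes. It is plain Python, not the project's dbt or Dagster code; the function name `apply_scd2` and the columns `customer_id`, `city`, `valid_from`, `valid_to`, and `is_current` are hypothetical names chosen for the example.

```python
from datetime import date


def apply_scd2(dimension, incoming, key, tracked, as_of):
    """Return a new dimension list with SCD Type 2 history applied.

    dimension: list of dicts that carry valid_from, valid_to and is_current.
    incoming:  list of dicts holding the latest source snapshot.
    """
    result = [dict(row) for row in dimension]  # copy so the input stays untouched
    current = {row[key]: row for row in result if row["is_current"]}

    for record in incoming:
        existing = current.get(record[key])
        if existing and all(existing[col] == record[col] for col in tracked):
            continue  # nothing changed, so the current version stays open
        if existing:
            # Close the old version instead of overwriting it.
            existing["valid_to"] = as_of
            existing["is_current"] = False
        new_row = {key: record[key], **{col: record[col] for col in tracked},
                   "valid_from": as_of, "valid_to": None, "is_current": True}
        result.append(new_row)
        current[record[key]] = new_row
    return result


# Hypothetical data: customer 1 moves, customer 2 is new.
dim = [{"customer_id": 1, "city": "Jakarta", "valid_from": date(2024, 1, 1),
        "valid_to": None, "is_current": True}]
snapshot = [{"customer_id": 1, "city": "Bandung"},
            {"customer_id": 2, "city": "Medan"}]
history = apply_scd2(dim, snapshot, "customer_id", ["city"], date(2024, 6, 1))
```

After the call, `history` holds three rows: the closed Jakarta version, the open Bandung version, and the new row for customer 2.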

Apache Spark | Apache Airflow | MinIO (S3) | PostgreSQL | Great Expectations

  • Architecture: Engineered a scalable Lakehouse-style pipeline to process 2.9M+ raw records from NYC Open Data.
  • Key Tech: Implemented distributed data processing using a Spark Master-Worker cluster and integrated automated data quality guardrails with Great Expectations to ensure 99% data integrity before warehouse loading.
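One way to read the quality guardrail above is as a gate that blocks the warehouse load when too few records pass validation. The sketch below shows that idea in plain Python rather than with the Great Expectations API; the 0.99 threshold is an assumed interpretation of the 99% figure, and `quality_gate`, `fare_amount`, and `passenger_count` are hypothetical names used only for illustration.

```python
def quality_gate(records, checks, min_pass_rate=0.99):
    """Return the pass rate, or raise ValueError when it is below the threshold."""
    if not records:
        raise ValueError("no records to validate")
    # A record passes only if every check accepts it.
    passed = sum(1 for rec in records if all(check(rec) for check in checks))
    pass_rate = passed / len(records)
    if pass_rate < min_pass_rate:
        raise ValueError(
            f"pass rate {pass_rate:.2%} is below the required {min_pass_rate:.2%}"
        )
    return pass_rate


# Hypothetical row-level checks for trip records.
checks = [
    lambda r: r["fare_amount"] is not None and r["fare_amount"] >= 0,
    lambda r: r["passenger_count"] is not None and r["passenger_count"] > 0,
]

trips = [{"fare_amount": 12.5, "passenger_count": 1} for _ in range(199)]
trips.append({"fare_amount": -3.0, "passenger_count": 1})  # one invalid record

rate = quality_gate(trips, checks)  # 199 of 200 records pass, so the gate opens
```

Raising an exception rather than returning a flag means an orchestrator task wrapping this call would fail and stop downstream loading.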

dbt Core | PostgreSQL | Docker | Dimensional Modeling

  • Architecture: Built an end-to-end ELT pipeline transforming raw CSV data into a business-ready Star Schema Data Warehouse.
  • Key Tech: Implemented dbt (data build tool) for modular SQL transformations, automated testing (schema & referential integrity), and data lineage documentation.
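A referential integrity test on a star schema asks whether every foreign key in a fact table points at an existing dimension row. The sketch below expresses that check in plain Python, similar in spirit to dbt's `relationships` test but not the project's actual test definitions; `orphaned_keys`, `fact_sales`, `dim_product`, and the column names are hypothetical.

```python
def orphaned_keys(fact_rows, dim_rows, fact_fk, dim_pk):
    """Return sorted foreign-key values in the fact rows with no matching dimension row.

    Null foreign keys are skipped here; whether nulls are allowed is a
    separate not-null check.
    """
    known = {row[dim_pk] for row in dim_rows}
    return sorted({row[fact_fk] for row in fact_rows
                   if row[fact_fk] is not None and row[fact_fk] not in known})


# Hypothetical tables: product 3 is referenced by a sale but missing from the dimension.
dim_product = [{"product_id": 1}, {"product_id": 2}]
fact_sales = [
    {"sale_id": 10, "product_id": 1},
    {"sale_id": 11, "product_id": 3},
    {"sale_id": 12, "product_id": None},
]

orphans = orphaned_keys(fact_sales, dim_product, "product_id", "product_id")
```

An empty result means the relationship holds; any returned value identifies fact rows that would drop out of an inner join to the dimension.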

Docker | Apache Airflow | PostgreSQL | Python

  • Architecture: Designed a fault-tolerant ETL pipeline to ingest real-time financial data.
  • Key Tech: Implemented custom Airflow DAGs for hourly scheduling with automated retries and containerized the entire environment using Docker Compose for portability.
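The retry behaviour described above can be sketched without Airflow as a small wrapper that re-runs a failing task a fixed number of times. The parameter names `retries` and `retry_delay` mirror Airflow's task arguments of the same names, but `run_with_retries` and `flaky_fetch` are hypothetical helpers, and this is a simplified stand-in rather than how the scheduler itself works.

```python
import time


def run_with_retries(task, retries=3, retry_delay=0.0, sleep=time.sleep):
    """Call task(); on failure, retry up to `retries` more times before giving up."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted, surface the last error
            attempt += 1
            sleep(retry_delay)  # wait before the next attempt


calls = {"count": 0}


def flaky_fetch():
    """Hypothetical ingestion step that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary API failure")
    return {"status": "ok"}


result = run_with_retries(flaky_fetch, retries=3, retry_delay=0.0)
```

Passing `sleep` as a parameter keeps the wrapper easy to test, since a test can substitute a function that does not actually wait.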

Azure VM | SSIS | SQL Server | Kimball Dimensional Modeling

  • Role: Principal Data Engineer & Team Lead. Led a team of 4 to build an end-to-end Data Warehouse for ITERA's Quality Assurance Institute.
  • Key Tech: Designed a Star Schema for Intellectual Property tracking and engineered complex SSIS packages for ETL orchestration on Azure cloud infrastructure.

Pinned

  1. modern-data-platform-dagster (Public)

    A production-grade Modern Data Stack (MDS) implementation featuring automated ELT, SCD Type 2 history tracking, and CI/CD quality guardrails using Dagster, dbt Core, DuckDB, and Soda.

    Python

  2. nyc-taxi-pipeline-spark-airflow (Public)

    An automated end-to-end data pipeline using Apache Airflow, Spark, and MinIO for processing NYC Taxi datasets. Features containerized infrastructure (Docker), distributed transformations, and data …

    Python

  3. market-ingestion-pipeline (Public)

    A production-ready ETL pipeline automating cryptocurrency market data ingestion using Apache Airflow, Docker, and PostgreSQL.

    Python

  4. Emotion_Multimodal_BDC (Public)

    Big Data Challenge (BDC) Satria Data 2025 – Emotion Classification

    Jupyter Notebook

  5. LPMPP-Data-Warehouse-Project (Public)

    LPMPP Data Mart: Institutional Quality Assurance Analytics

    TSQL

  6. IDSC2026-Mathematics-for-Hope-Glaucoma (Public)

    Quality-aware deep learning pipeline for glaucoma detection | IDSC 2026 | ROC-AUC 0.9801

    Python