KavinKishore1111/event-driven-saas-analytics-platform

🚀 Event-Driven SaaS Analytics Platform

Python PostgreSQL dbt Streamlit Architecture

A production-style Data Engineering project that simulates a real subscription-based SaaS product and builds a complete modern analytics pipeline, from large-scale event generation to an executive business dashboard.

This project models realistic SaaS behavior and implements a layered warehouse architecture (Bronze → Silver → Gold) using industry-standard tools.


📌 Project Overview

This system simulates a subscription-based SaaS platform (Spotify / Swiggy-style behavioral modeling) and builds an end-to-end analytics stack including:

  • Large-scale synthetic event generation (~900K+ events)
  • Subscription lifecycle simulation (6 months)
  • Recurring billing system
  • Churn probability modeling
  • Retention cohort modeling
  • KPI data marts
  • Executive analytics dashboard

The goal is to demonstrate real-world Data Engineering practices such as:

  • Event-driven architecture
  • Star schema modeling
  • Warehouse layering
  • Analytical KPI marts
  • Production-style project organization

🏗 System Architecture

Python Event Generator
        ↓
PostgreSQL (Bronze Layer - Raw Events)
        ↓
dbt Transformations (Silver Layer - Star Schema)
        ↓
Gold Layer KPI Data Marts
        ↓
Streamlit Business Dashboard

📊 Project Scale & Complexity

  • ~900,000+ simulated user events
  • 6-month subscription lifecycle modeling
  • Multi-cohort retention tracking
  • Recurring monthly & yearly billing simulation
  • Payment failure probability
  • Reactivation behavior modeling
  • Star schema warehouse with separate fact and dimension tables

🛠 Tech Stack

| Layer | Tool | Purpose |
| --- | --- | --- |
| Event Generation | Python | Synthetic event simulation |
| Data Warehouse | PostgreSQL | Central analytics warehouse |
| Transformations | dbt | Staging, star schema, KPI marts |
| Visualization | Streamlit | Business dashboard |

🧠 Data Warehouse Design

🥉 Bronze Layer (Raw Events)

Append-only immutable event store.

Table:

  • raw_user_events
    • UUID primary key
    • Event timestamp
    • Event type
    • JSONB event_properties
    • User identifier

Design Principles:

  • No transformations
  • Fully replayable
  • Audit-friendly
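
As a rough illustration of the Bronze design above, the sketch below builds one immutable event record carrying the five listed attributes. The class name `RawUserEvent`, the column order in `to_row`, and the sample event type and properties are assumptions made for this example; the README does not show the actual schema or writer code.

```python
import json
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a raw event is never modified after creation
class RawUserEvent:
    """Hypothetical in-memory shape of one raw_user_events row."""
    user_id: str
    event_type: str
    event_properties: dict
    event_timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    event_id: uuid.UUID = field(default_factory=uuid.uuid4)

    def to_row(self) -> tuple:
        """Return values in an assumed column order, ready for an INSERT."""
        return (
            str(self.event_id),
            self.user_id,
            self.event_type,
            self.event_timestamp.isoformat(),
            # Serialized to JSON text; PostgreSQL can cast this to JSONB.
            json.dumps(self.event_properties, sort_keys=True),
        )


# Illustrative event; the event type and property names are made up.
event = RawUserEvent(
    user_id="user_42",
    event_type="subscription_started",
    event_properties={"plan": "monthly", "price": 9.99},
)
row = event.to_row()
```

Keeping the payload as an opaque JSON document means new event properties can be added without altering the Bronze table, which is one reason raw layers often use JSONB.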

🥈 Silver Layer (Star Schema)

Staging

  • stg_raw_user_events

Dimension Tables

  • dim_users
  • dim_plan
  • dim_date
  • dim_event_type

Fact Tables

  • fact_user_activity
  • fact_payments
  • fact_subscriptions

Design Principles:

  • Surrogate keys
  • Clear fact-dimension separation
  • Optimized for analytical joins
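
The README does not say how surrogate keys are produced. One common approach, assumed here purely for illustration, is a deterministic hash of the natural key columns. The helper `surrogate_key` and the sample `dim_plan` and `fact_payments` rows below are hypothetical.

```python
import hashlib


def surrogate_key(*natural_key_parts: str) -> str:
    """Deterministic surrogate key: MD5 of the natural key parts joined by '|'."""
    joined = "|".join(natural_key_parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Hypothetical dimension row, keyed by its surrogate key.
plan_key = surrogate_key("premium", "monthly")
dim_plan = {plan_key: {"plan_name": "premium", "billing_cycle": "monthly"}}

# Hypothetical fact row that stores only the key, not the plan attributes.
fact_payments = [{"payment_id": "p1", "plan_key": plan_key, "amount": 9.99}]

# A fact row resolves its dimension attributes through the key alone.
joined = [{**fact, **dim_plan[fact["plan_key"]]} for fact in fact_payments]
```

Because the key is derived from the data rather than from a sequence, rebuilding the warehouse from the raw events yields the same keys each time.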

🥇 Gold Layer (Business KPI Marts)

Pre-aggregated business-ready models:

  • kpi_dau
  • kpi_mau
  • kpi_mrr
  • kpi_churn
  • kpi_ltv
  • kpi_retention_cohort

Optimized for dashboard consumption.


📊 KPIs Implemented

👥 User Metrics

  • Daily Active Users (DAU)
  • Monthly Active Users (MAU)
  • Stickiness Ratio (DAU / MAU)
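
A minimal sketch of how these three metrics relate, using a handful of made-up activity rows. The stickiness definition used here (average DAU over the days that have activity, divided by MAU) is one common convention and an assumption; the project's `kpi_dau` and `kpi_mau` models may compute it differently.

```python
from collections import defaultdict
from datetime import date

# Illustrative (user_id, activity_date) pairs.
activity = [
    ("u1", date(2024, 1, 1)),
    ("u2", date(2024, 1, 1)),
    ("u1", date(2024, 1, 2)),
    ("u3", date(2024, 1, 2)),
    ("u1", date(2024, 1, 2)),  # second event on the same day, counted once
]

daily_users = defaultdict(set)
monthly_users = defaultdict(set)
for user_id, day in activity:
    daily_users[day].add(user_id)
    monthly_users[(day.year, day.month)].add(user_id)

# Distinct users per day and per calendar month.
dau = {day: len(users) for day, users in daily_users.items()}
mau = {month: len(users) for month, users in monthly_users.items()}

# Stickiness for January: average DAU (over days with activity) / MAU.
jan_days = [d for d in dau if (d.year, d.month) == (2024, 1)]
avg_dau = sum(dau[d] for d in jan_days) / len(jan_days)
stickiness = avg_dau / mau[(2024, 1)]
```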

💰 Revenue Metrics

  • Monthly Recurring Revenue (MRR)
  • Customer Lifetime Value (LTV)
  • Subscription distribution
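
The README does not give the formulas behind these metrics. The sketch below uses two widely used definitions as assumptions: MRR as the sum of monthly-equivalent subscription prices (yearly plans divided by 12), and LTV as average revenue per user divided by the monthly churn rate. All prices and the churn rate are illustrative values.

```python
# Illustrative active subscriptions.
subscriptions = [
    {"user_id": "u1", "billing_cycle": "monthly", "price": 10.0},
    {"user_id": "u2", "billing_cycle": "yearly", "price": 96.0},
    {"user_id": "u3", "billing_cycle": "monthly", "price": 10.0},
]


def monthly_value(sub: dict) -> float:
    """Normalize a subscription price to its monthly equivalent."""
    if sub["billing_cycle"] == "yearly":
        return sub["price"] / 12
    return sub["price"]


# MRR: total monthly-equivalent revenue across active subscriptions.
mrr = sum(monthly_value(s) for s in subscriptions)

# Simple LTV estimate: average revenue per user / monthly churn rate.
arpu = mrr / len(subscriptions)
monthly_churn_rate = 0.05  # assumed value for illustration
ltv = arpu / monthly_churn_rate
```

Normalizing yearly plans keeps MRR comparable month to month instead of spiking whenever an annual payment lands.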

🔁 Retention & Churn

  • Monthly churn rate
  • Cohort retention matrix
  • Reactivation analysis
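
A cohort retention matrix can be sketched as follows, assuming cohorts are defined by signup month and retention is the share of a cohort active N months after signup. The input tuples are made-up data, and the layout of the project's `kpi_retention_cohort` model is not shown in the README.

```python
from collections import defaultdict

# Illustrative rows: (user_id, signup_month_index, active_month_index).
activity = [
    ("u1", 0, 0), ("u1", 0, 1), ("u1", 0, 2),
    ("u2", 0, 0), ("u2", 0, 1),
    ("u3", 1, 1), ("u3", 1, 2),
    ("u4", 1, 1),
]

cohort_users = defaultdict(set)  # cohort -> every user who signed up in it
active = defaultdict(set)        # (cohort, months_since_signup) -> active users
for user_id, cohort, month in activity:
    cohort_users[cohort].add(user_id)
    active[(cohort, month - cohort)].add(user_id)

# Retention: active users in a period divided by the cohort's size.
retention = {
    key: len(users) / len(cohort_users[key[0]])
    for key, users in active.items()
}

# Period-over-period churn for cohort 0 between months 1 and 2.
churn_cohort0_month2 = 1 - retention[(0, 2)] / retention[(0, 1)]
```

Each `(cohort, months_since_signup)` pair corresponds to one cell of the heatmap: rows are cohorts and columns are months since signup.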

📊 Dashboard Capabilities

The Streamlit dashboard includes:

  • Executive KPI summary cards
  • DAU & MAU growth trends
  • Revenue trend (MRR over time)
  • Churn breakdown (absolute + rate)
  • Retention cohort heatmap
  • LTV distribution histogram
  • Architecture overview section

🚀 How To Run Locally

1️⃣ Generate Events

python generator/event_generator.py

2️⃣ Run dbt Transformations

cd dbt
dbt run

3️⃣ Launch Dashboard

streamlit run dashboard/app.py

📁 Project Structure

event-driven-saas-analytics/

├── generator/
│   ├── behaviour_engine.py
│   ├── config.py
│   ├── enums.py
│   ├── main.py
│   ├── postgres_writer.py
│   ├── revenue_engine.py
│   └── models.py
│
├── dbt/
│   └── models/
│       ├── staging/
│       ├── marts/
│       ├── dimensions/
│       ├── facts/
│       └── intermediate/
│
├── db/
│   ├── indexes.sql
│   └── raw_schema.sql
│
├── dashboard/
│   ├── app.py
│   ├── database.py
│   └── queries.py
│
└── README.md

🧪 Business Logic Simulation

The synthetic generator models:

  • User acquisition growth curve
  • Signup → feature usage funnel
  • Upgrade probability modeling
  • Monthly vs yearly subscription logic
  • Recurring billing cycles
  • Churn after minimum subscription tenure
  • Reactivation probability
  • Payment failure scenarios

Designed to mirror realistic SaaS growth dynamics.
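
To make the lifecycle rules above concrete, here is a minimal month-by-month simulation of a single subscriber. It covers only churn after a minimum tenure, payment failure, and reactivation. Every probability, the tenure threshold, and the event type names are hypothetical values chosen for this sketch; they are not taken from the project's `behaviour_engine.py` or `config.py`.

```python
import random

# Hypothetical parameters; the real generator's values are not documented here.
MIN_TENURE_MONTHS = 2         # no churn until the user has been active this long
CHURN_PROB = 0.10
PAYMENT_FAILURE_PROB = 0.05
REACTIVATION_PROB = 0.20


def simulate_user(months: int, rng: random.Random) -> list[str]:
    """Return one event type per simulated month for a single subscriber."""
    events, active, tenure = [], True, 0
    for _ in range(months):
        if active:
            tenure += 1
            if tenure > MIN_TENURE_MONTHS and rng.random() < CHURN_PROB:
                events.append("subscription_cancelled")
                active, tenure = False, 0
            elif rng.random() < PAYMENT_FAILURE_PROB:
                events.append("payment_failed")
            else:
                events.append("payment_succeeded")
        elif rng.random() < REACTIVATION_PROB:
            # A churned user may come back; tenure restarts from zero.
            events.append("subscription_reactivated")
            active = True
        else:
            events.append("inactive")
    return events


rng = random.Random(42)  # fixed seed so a run is reproducible
timelines = [simulate_user(6, rng) for _ in range(1000)]
```

Passing an explicit `random.Random` instance rather than using the module-level functions keeps runs reproducible, which helps when comparing KPI outputs across pipeline changes.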


🎯 Engineering Concepts Demonstrated

  • Event-driven system modeling
  • Star schema warehouse design
  • Layered architecture (Bronze / Silver / Gold)
  • Analytical SQL modeling
  • KPI mart construction
  • Large-scale synthetic data handling
  • Production-style project structure

📈 Business Impact Simulation

This architecture enables:

  • KPI visibility refreshed on each pipeline run
  • Subscription revenue tracking
  • Cohort-based retention analysis
  • Data-driven churn reduction strategy
  • Executive-level reporting readiness

🔮 Future Enhancements

Planned upgrades:

  • Apache Airflow orchestration
  • Docker containerization
  • Kafka streaming ingestion
  • Data quality tests in dbt
  • Incremental model optimization

📊 Dataset

  • ~900K+ events
  • 6-month lifecycle coverage
  • Subscription events
  • Payment transactions
  • Feature usage activity
  • Upgrade & churn transitions

Fully synthetic data generated using Python.


⭐ Project Goal

To design and implement a production-style, end-to-end analytics system that simulates real SaaS business operations and demonstrates modern Data Engineering practices at scale.

This project showcases:

  • Event-driven system design
  • Layered warehouse architecture (Bronze → Silver → Gold)
  • Star schema data modeling
  • KPI data mart construction
  • Large-scale synthetic data handling (~900K+ events)
  • Business-ready analytical reporting

👨‍💻 Author

Kavin Kishore
Delhi Technological University (DTU)

Built as a production-style Data Engineering project.

About

End-to-end event-driven SaaS analytics platform with ~900K simulated events. Implements Bronze–Silver–Gold warehouse architecture, star schema modeling, KPI data marts, and business dashboard using Python, PostgreSQL, dbt, and Streamlit.
