A production-style Data Engineering project that simulates a real subscription-based SaaS product and builds a complete modern analytics pipeline — from large-scale event generation to an executive business dashboard.
This project models realistic SaaS behavior and implements a layered warehouse architecture (Bronze → Silver → Gold) using industry-standard tools.
This system simulates a subscription-based SaaS platform (Spotify / Swiggy-style behavioral modeling) and builds an end-to-end analytics stack including:
- Large-scale synthetic event generation (~900K+ events)
- Subscription lifecycle simulation (6 months)
- Recurring billing system
- Churn probability modeling
- Retention cohort modeling
- KPI data marts
- Executive analytics dashboard
The goal is to demonstrate real-world Data Engineering practices such as:
- Event-driven architecture
- Star schema modeling
- Warehouse layering
- Analytical KPI marts
- Production-style project organization
Python Event Generator
↓
PostgreSQL (Bronze Layer - Raw Events)
↓
dbt Transformations (Silver Layer - Star Schema)
↓
Gold Layer KPI Data Marts
↓
Streamlit Business Dashboard
- ~900,000+ simulated user events
- 6-month subscription lifecycle modeling
- Multi-cohort retention tracking
- Recurring monthly & yearly billing simulation
- Payment failure probability
- Reactivation behavior modeling
- Fully normalized star schema warehouse
| Layer | Tool | Purpose |
|---|---|---|
| Event Generation | Python | Synthetic event simulation |
| Data Warehouse | PostgreSQL | Central analytics warehouse |
| Transformations | dbt | Staging, star schema, KPI marts |
| Visualization | Streamlit | Business dashboard |
Append-only immutable event store.
Table: `raw_user_events`
- UUID primary key
- Event timestamp
- Event type
- JSONB `event_properties`
- User identifier
Design Principles:
- No transformations
- Fully replayable
- Audit-friendly
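The Bronze contract above (append-only, replayable, no transformations) can be sketched as a minimal insert-only writer. This is a hypothetical illustration: `sqlite3` stands in for PostgreSQL, the JSONB column becomes JSON-encoded text, and the real schema lives in `db/raw_schema.sql`.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# sqlite3 stands in for PostgreSQL here; JSONB becomes a JSON-encoded TEXT column.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE raw_user_events (
        event_id         TEXT PRIMARY KEY,  -- UUID
        event_ts         TEXT NOT NULL,     -- ISO-8601 event timestamp
        event_type       TEXT NOT NULL,
        event_properties TEXT NOT NULL,     -- JSONB in PostgreSQL
        user_id          TEXT NOT NULL
    )
""")

def append_event(user_id: str, event_type: str, properties: dict) -> str:
    """Insert-only writer: no updates, no deletes, fully replayable."""
    event_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO raw_user_events VALUES (?, ?, ?, ?, ?)",
        (
            event_id,
            datetime.now(timezone.utc).isoformat(),
            event_type,
            json.dumps(properties),
            user_id,
        ),
    )
    return event_id

append_event("user-001", "signup", {"plan": "free", "channel": "organic"})
append_event("user-001", "feature_used", {"feature": "search"})
count = conn.execute("SELECT COUNT(*) FROM raw_user_events").fetchone()[0]
print(count)  # 2
```

Because rows are only ever appended, the downstream Silver layer can be rebuilt from scratch at any time by replaying the table in timestamp order.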
Staging: `stg_raw_user_events`
Dimensions: `dim_users`, `dim_plan`, `dim_date`, `dim_event_type`
Facts: `fact_user_activity`, `fact_payments`, `fact_subscriptions`
Design Principles:
- Surrogate keys
- Clear fact-dimension separation
- Optimized for analytical joins
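The surrogate-key principle above can be shown with a small, hypothetical Python sketch: natural keys from raw events are mapped to compact integer surrogate keys, and fact rows store only those surrogates. (In the actual project this mapping is done in dbt SQL models, not Python.)

```python
from itertools import count

class Dimension:
    """Assigns a stable surrogate key to each natural key on first sight."""
    def __init__(self):
        self._keys = {}
        self._next = count(1)

    def surrogate_key(self, natural_key):
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next)
        return self._keys[natural_key]

dim_users = Dimension()
dim_event_type = Dimension()

# Toy raw events; the real pipeline reads these from raw_user_events
raw_events = [
    {"user_id": "u-42", "event_type": "signup"},
    {"user_id": "u-42", "event_type": "feature_used"},
    {"user_id": "u-99", "event_type": "signup"},
]

# Fact rows reference dimensions only through surrogate keys,
# keeping analytical joins narrow and fast.
fact_user_activity = [
    {
        "user_sk": dim_users.surrogate_key(e["user_id"]),
        "event_type_sk": dim_event_type.surrogate_key(e["event_type"]),
    }
    for e in raw_events
]
print(fact_user_activity)
```

The same user always resolves to the same surrogate key, so facts from different sources join consistently against the dimensions.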
Pre-aggregated business-ready models:
`kpi_dau`, `kpi_mau`, `kpi_mrr`, `kpi_churn`, `kpi_ltv`, `kpi_retention_cohort`
Optimized for dashboard consumption.
- Daily Active Users (DAU)
- Monthly Active Users (MAU)
- Stickiness Ratio (DAU / MAU)
- Monthly Recurring Revenue (MRR)
- Customer Lifetime Value (LTV)
- Subscription distribution
- Monthly churn rate
- Cohort retention matrix
- Reactivation analysis
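The core KPI formulas listed above can be sketched in plain Python over toy data (the real marts are dbt SQL models; the sample values here are made up for illustration):

```python
from datetime import date

# (user_id, activity_date) pairs for one month
activity = [
    ("u1", date(2024, 1, 1)), ("u2", date(2024, 1, 1)),
    ("u1", date(2024, 1, 2)),
    ("u3", date(2024, 1, 15)),
]

def dau(day):
    """DAU: distinct users active on a given day."""
    return len({u for u, d in activity if d == day})

# MAU: distinct users active at any point in the month
mau = len({u for u, _ in activity})

# Stickiness ratio: DAU / MAU
stickiness = dau(date(2024, 1, 1)) / mau

# MRR: active subscription prices normalized to a monthly amount
subscriptions = [
    {"user": "u1", "price": 120.0, "cycle": "yearly"},
    {"user": "u2", "price": 15.0, "cycle": "monthly"},
]
mrr = sum(s["price"] / 12 if s["cycle"] == "yearly" else s["price"]
          for s in subscriptions)

# Monthly churn rate: churned subscribers / subscribers at start of month
churn_rate = 1 / 4  # e.g. 1 churn out of 4 starting subscribers

print(dau(date(2024, 1, 1)), mau, round(stickiness, 2), mrr)
```

Note the yearly-plan normalization (price / 12) when computing MRR: mixing billing cycles without normalizing is a common source of inflated recurring-revenue numbers.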
The Streamlit dashboard includes:
- Executive KPI summary cards
- DAU & MAU growth trends
- Revenue trend (MRR over time)
- Churn breakdown (absolute + rate)
- Retention cohort heatmap
- LTV distribution histogram
- Architecture overview section
```bash
python generator/event_generator.py
cd dbt
dbt run
streamlit run dashboard/app.py
```

event-driven-saas-analytics/
├── generator/
│   ├── behaviour_engine.py
│ ├── config.py
│ ├── enums.py
│ ├── main.py
│ ├── postgres_writer.py
│ ├── revenue_engine.py
│ └── models.py
│
├── dbt/
│ ├── models/
│ │ ├── staging/
│ │ ├── marts/
│ │ ├── dimensions/
│ │ ├── facts/
│ │ └── intermediate/
│
├── db/
│   ├── indexes.sql
│   └── raw_schema.sql
│
├── dashboard/
│   ├── app.py
│   ├── database.py
│   └── queries.py
│
└── README.md
The synthetic generator models:
- User acquisition growth curve
- Signup → feature usage funnel
- Upgrade probability modeling
- Monthly vs yearly subscription logic
- Recurring billing cycles
- Churn after minimum subscription tenure
- Reactivation probability
- Payment failure scenarios
Designed to mirror realistic SaaS growth dynamics.
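The lifecycle dynamics above (minimum tenure before churn, payment failures, reactivation) can be sketched as a per-user monthly state machine. The probability values here are made-up assumptions for illustration, not the ones configured in `generator/config.py`.

```python
import random

# Illustrative probabilities (assumptions, not the project's real config)
CHURN_PROB = 0.05           # monthly churn chance after minimum tenure
MIN_TENURE_MONTHS = 2       # no churn before this many billing cycles
PAYMENT_FAILURE_PROB = 0.03
REACTIVATION_PROB = 0.10    # monthly chance a churned user returns

def simulate_user(months: int, rng: random.Random) -> list:
    """Return one user's monthly subscription states over the horizon."""
    states, active, tenure = [], True, 0
    for _ in range(months):
        if active:
            tenure += 1
            if rng.random() < PAYMENT_FAILURE_PROB:
                states.append("payment_failed")
            elif tenure > MIN_TENURE_MONTHS and rng.random() < CHURN_PROB:
                active, tenure = False, 0
                states.append("churned")
            else:
                states.append("active")
        else:
            if rng.random() < REACTIVATION_PROB:
                active = True
                tenure = 1
                states.append("reactivated")
            else:
                states.append("inactive")
    return states

rng = random.Random(7)  # seeded for reproducibility
history = simulate_user(6, rng)
print(history)
```

Running this over many users with a growing signup curve produces the kind of cohort-shaped churn and reactivation patterns the downstream retention marts are built to surface.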
- Event-driven system modeling
- Star schema warehouse design
- Layered architecture (Bronze / Silver / Gold)
- Analytical SQL modeling
- KPI mart construction
- Large-scale synthetic data handling
- Production-style project structure
This architecture enables:
- Real-time KPI visibility
- Subscription revenue tracking
- Cohort-based retention analysis
- Data-driven churn reduction strategy
- Executive-level reporting readiness
Planned upgrades:
- Apache Airflow orchestration
- Docker containerization
- Kafka streaming ingestion
- Data quality tests in dbt
- Incremental model optimization
- ~900K+ events
- 6-month lifecycle coverage
- Subscription events
- Payment transactions
- Feature usage activity
- Upgrade & churn transitions
Fully synthetic data generated using Python.
To design and implement a production-style, end-to-end analytics system that simulates real SaaS business operations and demonstrates modern Data Engineering practices at scale.
This project showcases:
- Event-driven system design
- Layered warehouse architecture (Bronze → Silver → Gold)
- Star schema data modeling
- KPI data mart construction
- Large-scale synthetic data handling (~900K+ events)
- Business-ready analytical reporting
Kavin Kishore
Delhi Technological University (DTU)
Built as a production-style Data Engineering project.