Unified Data Platform Solution Accelerator

A production-ready, enterprise-grade data platform built on Microsoft Fabric with optional integrations for Microsoft Purview governance and Azure Databricks advanced analytics.

License: MIT | Azure | Microsoft Fabric


Overview

The Unified Data Platform Solution Accelerator provides a complete, configurable data foundation using the medallion architecture pattern (Bronze, Silver, Gold). It enables organizations to rapidly deploy a governed, analytics-ready data platform with minimal configuration.

Key Capabilities

  • Medallion architecture with Bronze, Silver, and Gold lakehouses
  • 48 PySpark notebooks for automated data transformations
  • Pre-built Power BI semantic models and dashboards
  • Optional Microsoft Purview integration for enterprise governance
  • Optional Azure Databricks integration for advanced analytics
  • Multi-domain data models (Sales, Finance, Customer, Product)
  • Automated deployment via Azure Developer CLI (azd)

Architecture Options

The solution offers four deployment configurations based on organizational requirements:

| Option | Components | Description |
|--------|------------|-------------|
| Option 1 | Fabric + Power BI | Core medallion architecture with analytics dashboards |
| Option 2 | Option 1 + Purview | Adds data governance, lineage tracking, and classification |
| Option 3 | Option 1 + Databricks | Adds advanced analytics and hybrid processing |
| Option 4 | All components | Complete enterprise data platform |

Option 1: Core Medallion Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     Microsoft Fabric Workspace                   │
├─────────────────┬─────────────────┬─────────────────────────────┤
│  Bronze         │  Silver         │  Gold                       │
│  Lakehouse      │  Lakehouse      │  Lakehouse                  │
│  (Raw Data)     │  (Validated)    │  (Enriched)                 │
├─────────────────┴─────────────────┴─────────────────────────────┤
│  48 PySpark Notebooks  │  Semantic Models  │  Power BI Reports  │
└─────────────────────────────────────────────────────────────────┘
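
The three layers above can be pictured with a plain-Python sketch (no Spark; the column names and sample rows are hypothetical): raw rows land in Bronze as-is, validation filters them into Silver, and Gold aggregates them for reporting.

```python
import csv
import io
from collections import defaultdict

# Bronze: raw CSV exactly as landed (hypothetical columns and values).
raw_csv = """order_id,customer,amount
1001,Contoso,250.00
1002,,75.50
1001,Contoso,250.00
1003,Fabrikam,invalid
"""

# Silver: validated rows -- drop duplicate keys, missing customers,
# and unparsable amounts.
seen, silver = set(), []
for row in csv.DictReader(io.StringIO(raw_csv)):
    key = row["order_id"]
    if not row["customer"] or key in seen:
        continue
    try:
        amount = float(row["amount"])
    except ValueError:
        continue
    seen.add(key)
    silver.append({"order_id": key, "customer": row["customer"], "amount": amount})

# Gold: business-level aggregate, ready for a semantic model or report.
gold = defaultdict(float)
for row in silver:
    gold[row["customer"]] += row["amount"]

print(dict(gold))  # {'Contoso': 250.0}
```

In the actual solution each hop is a PySpark notebook writing Delta tables, but the filtering-then-aggregating shape is the same.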

Option 4: Full Enterprise Architecture

┌──────────────────────────────────────────────────────────────────────────┐
│                         Microsoft Purview                                 │
│  (Unified Catalog, Data Map, Governance Domains, Lineage)                │
└────────────────────────────────┬─────────────────────────────────────────┘
                                 │
┌────────────────────────────────┴─────────────────────────────────────────┐
│                       Microsoft Fabric Workspace                          │
├─────────────────┬─────────────────┬──────────────────────────────────────┤
│  Bronze         │  Silver         │  Gold                                │
│  Lakehouse      │  Lakehouse      │  Lakehouse ←── Shortcuts/Mirroring   │
└─────────────────┴─────────────────┴──────────────┬───────────────────────┘
                                                   │
                                    ┌──────────────┴───────────────┐
                                    │    Azure Databricks          │
                                    │    Unity Catalog             │
                                    │    (Advanced Analytics)      │
                                    └──────────────────────────────┘

Repository Structure

unified-data-platform-solution-accelerator/
├── docs/
│   ├── DeploymentGuide.md              # Main deployment overview
│   ├── DeploymentGuideFabric.md        # Automated Fabric deployment
│   ├── DeploymentGuideFabricManual.md  # Manual Fabric deployment
│   ├── DeploymentGuidePowerBI.md       # Power BI configuration
│   ├── DeploymentGuidePurview.md       # Purview integration
│   ├── DeploymentGuideDatabricks.md    # Databricks deployment
│   ├── NotebooksGuideFabric.md         # Fabric notebooks reference
│   ├── NotebooksGuideDatabricks.md     # Databricks notebooks reference
│   ├── SetupPurview.md                 # Purview provisioning
│   ├── SetupDatabricks.md              # Databricks provisioning
│   ├── TechnicalArchitecture.md        # Architecture details
│   ├── LocalDevelopmentSetup.md        # Local dev environment
│   ├── QuotaCheck.md                   # Azure quota verification
│   └── SampleWorkflow.md               # Validation workflow
├── infra/
│   ├── scripts/
│   │   ├── fabric/                     # Fabric deployment scripts
│   │   ├── databricks/                 # Databricks deployment scripts
│   │   └── utils/                      # Utility scripts
│   └── bicep/                          # Infrastructure as Code
├── src/
│   ├── fabric/
│   │   └── notebooks/
│   │       ├── bronze_to_silver/       # 16 transformation notebooks
│   │       ├── silver_to_gold/         # 16 aggregation notebooks
│   │       ├── schema/                 # 8 schema definition notebooks
│   │       ├── data_management/        # 5 utility notebooks
│   │       ├── run_bronze_to_silver.ipynb
│   │       └── run_silver_to_gold.ipynb
│   └── databricks/
│       └── notebooks/
│           ├── bronze_to_adb_silver/   # 3 data loading notebooks
│           ├── schema/                 # 2 schema notebooks
│           ├── data_management/        # 2 cleanup notebooks
│           └── run_bronze_to_adb.ipynb
├── reports/
│   └── UDPLZ_SalesDashboard.pbix        # Power BI dashboard
├── data/
│   └── samples/                        # Sample CSV data files
├── azure.yaml                          # Azure Developer CLI config
└── README.md

Prerequisites

Required

  • Azure Subscription with Owner or Contributor access
  • Microsoft Fabric Capacity (F2 or higher recommended)
  • Azure CLI (v2.50+)
  • Python 3.9+
  • Git

Optional (Based on Deployment Option)

  • Microsoft Purview Account (Option 2, 4)
  • Azure Databricks Workspace - Premium tier (Option 3, 4)
  • Power BI Pro or Premium Per User license

Permissions

| Component | Required Permission |
|-----------|---------------------|
| Fabric | Workspace Admin on target capacity |
| Purview | Data Curator, Collection Admin |
| Databricks | Workspace Admin, Unity Catalog privileges |
| Azure Resource Group | Contributor |

Quick Start

Step 1: Check Azure Quota

Before deployment, verify quota availability:

# Clone the repository
git clone https://github.com/PatrickGallucci/unified-data-platform.git
cd unified-data-platform-solution-accelerator

# Run quota check
cd infra/scripts
chmod +x quota_check_params.sh
./quota_check_params.sh --models gpt-4o-mini:150 --regions eastus,westus

Step 2: Authenticate

# Azure CLI
az login
az account set --subscription "<your-subscription-id>"

# Azure Developer CLI (for automated deployment)
azd auth login

Step 3: Deploy Option 1 (Fabric + Power BI)

Automated Deployment (Recommended)

# Set environment variables
export AZURE_FABRIC_CAPACITY_NAME="your-capacity-name"

# Deploy using azd
azd up

Manual Deployment

# Navigate to scripts directory
cd infra/scripts/utils

# Set environment variables
export AZURE_FABRIC_CAPACITY_NAME="your-capacity-name"
export AZURE_FABRIC_WORKSPACE_NAME="UDPLZ Data Platform Workspace"

# Run deployment script
pwsh ./run-python-script-fabric.ps1

Step 4: Configure Power BI (Post-Deployment)

  1. Open the udplz_gold lakehouse in Fabric
  2. Navigate to Lakehouse settings and copy the SQL analytics endpoint
  3. Open the deployed Power BI report and verify the connection

For detailed instructions, see Power BI Deployment Guide.


Deployment Guides

| Guide | Description |
|-------|-------------|
| Deployment Overview | Main deployment guide with all options |
| Fabric Automated | One-command deployment using azd |
| Fabric Manual | Step-by-step manual deployment |
| Power BI Setup | Dashboard configuration |
| Purview Integration | Data governance setup |
| Databricks Integration | Advanced analytics setup |
| Local Development | Development environment |

Data Model

The solution includes pre-built data models across multiple business domains:

Schemas

| Schema | Domain | Tables |
|--------|--------|--------|
| shared | Master Data | customer, product |
| sales | Sales Operations | order, orderline, orderpayment |
| salesfabric | Sales (Gold) | order, orderline, orderpayment |
| salesadb | Sales (Databricks) | order, orderline, orderpayment |
| finance | Financial Data | (extensible) |

Notebook Inventory

Fabric Notebooks (48 total)

| Category | Count | Purpose |
|----------|-------|---------|
| Bronze to Silver | 16 | Load CSV to validated tables |
| Silver to Gold | 16 | Transform and enrich data |
| Schema | 8 | Define data models |
| Data Management | 5 | Truncate, drop, utilities |
| Runners | 2 | Orchestration |
| Test/Sample | 1 | Analysis examples |

Databricks Notebooks (8 total)

| Category | Count | Purpose |
|----------|-------|---------|
| Bronze to ADB Silver | 3 | Load data to Unity Catalog |
| Schema | 2 | Model and permissions |
| Data Management | 2 | Cleanup utilities |
| Runner | 1 | Orchestration |

Power BI Dashboard

The included Power BI dashboard provides immediate business insights:

  • YOY Net Sales Comparison - Trend analysis across years
  • Revenue by Customer Segment - Individual, Business, Government breakdown
  • Top Products by Revenue - Product performance ranking
  • Top Products by Quantity - Volume analysis
  • Sales by Gender - Demographic distribution

Connection Configuration

The dashboard connects to the Gold lakehouse via SQL analytics endpoint:

Server: <workspace-name>.datawarehouse.fabric.microsoft.com
Database: udplz_gold
Authentication: Microsoft Entra ID
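
As an illustration, the settings above can be assembled into an ODBC connection string. This is a sketch, not the solution's own tooling: the workspace name below is a placeholder, and with pyodbc plus the Microsoft ODBC Driver 18 for SQL Server installed, the resulting string could be passed to pyodbc.connect().

```python
# Placeholder server name -- substitute the SQL analytics endpoint copied
# from the Gold lakehouse settings in Fabric.
server = "udplz-workspace.datawarehouse.fabric.microsoft.com"
database = "udplz_gold"

# Entra ID interactive authentication via the ODBC driver's
# "Authentication" keyword; the string only needs pyodbc to be used.
conn_str = (
    "Driver={ODBC Driver 18 for SQL Server};"
    f"Server={server};"
    f"Database={database};"
    "Authentication=ActiveDirectoryInteractive;"
    "Encrypt=yes;"
)
print(conn_str)
```

For unattended scenarios, a managed identity or service principal authentication mode would replace the interactive option.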

Extending the Solution

Adding New Domains

  1. Create schema notebook in src/fabric/notebooks/schema/:
# model_<domain>_gold.ipynb
spark.sql("CREATE SCHEMA IF NOT EXISTS <domain>")
spark.sql("""
    CREATE TABLE IF NOT EXISTS <domain>.<table> (
        id STRING,
        name STRING,
        created_at TIMESTAMP
    )
""")
  2. Create a bronze-to-silver notebook in src/fabric/notebooks/bronze_to_silver/
  3. Create a silver-to-gold notebook in src/fabric/notebooks/silver_to_gold/
  4. Register the new notebooks in the runner notebooks
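
The final registration step can be pictured with a small, hypothetical registry. In Fabric, runner notebooks invoke child notebooks through notebook utilities, which this stdlib-only sketch does not model; it only shows the ordering concern.

```python
# Hypothetical registry of notebooks per pipeline stage; the names mirror
# the folder convention used by this repository but are illustrative.
PIPELINE = {
    "bronze_to_silver": ["bronze_to_silver_customer", "bronze_to_silver_order"],
    "silver_to_gold": ["silver_to_gold_order"],
}

def register(stage: str, notebook: str) -> None:
    """Append a new domain's notebook to the end of a stage's run list."""
    PIPELINE.setdefault(stage, []).append(notebook)

def run_order(stage: str) -> list[str]:
    """Return the notebooks a runner would execute, in order."""
    return list(PIPELINE[stage])

# Registering a hypothetical new "inventory" domain:
register("bronze_to_silver", "bronze_to_silver_inventory")
print(run_order("bronze_to_silver"))
# ['bronze_to_silver_customer', 'bronze_to_silver_order', 'bronze_to_silver_inventory']
```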

Adding New Data Sources

  1. Upload source files to Bronze lakehouse Files/samples_fabric/<domain>/
  2. Create corresponding transformation notebooks
  3. Update runner notebooks to include new transformations
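
For step 1's upload location, a small helper (hypothetical, stdlib only) can build the Bronze Files path for a new domain, keeping the folder convention consistent:

```python
from pathlib import PurePosixPath

def sample_upload_path(domain: str, filename: str) -> str:
    """Target location in the Bronze lakehouse Files area for a new domain."""
    return str(PurePosixPath("Files/samples_fabric") / domain / filename)

# Hypothetical "inventory" domain:
print(sample_upload_path("inventory", "inventory.csv"))
# Files/samples_fabric/inventory/inventory.csv
```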

Validation

After deployment, validate the solution using the Sample Workflow:

Quick Validation Checklist

  • Fabric workspace contains 4 folders (lakehouses, notebooks, reports, databricks)
  • Three lakehouses created: udplz_bronze, udplz_silver, udplz_gold
  • 48 notebooks deployed in organized folder structure
  • Sample CSV data loaded in Bronze lakehouse
  • Run run_bronze_to_silver notebook successfully
  • Run run_silver_to_gold notebook successfully
  • Power BI report displays data correctly
  • (Option 2) Purview scan discovers Fabric assets
  • (Option 3) Databricks mirrored catalog accessible in Fabric

SQL Validation

-- Verify table counts in Gold lakehouse
SELECT COUNT(*) FROM [udplz_gold].[salesfabric].[order];
SELECT COUNT(*) FROM [udplz_gold].[salesfabric].[orderline];
SELECT COUNT(*) FROM [udplz_gold].[salesfabric].[orderpayment];
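
Beyond raw counts, a quick cross-layer sanity check can be scripted. The sketch below assumes each hop only filters or aggregates rows, so counts never grow from Bronze to Gold; if a Gold transformation joins or expands rows, this invariant would need adjusting. The counts are placeholders you would fetch yourself (for example, via the SQL endpoint).

```python
# Placeholder counts for one table across the three lakehouses.
counts = {"bronze": 1_000, "silver": 987, "gold": 987}

def medallion_counts_ok(c: dict[str, int]) -> bool:
    """Rows may only be filtered out moving bronze -> silver -> gold."""
    return c["bronze"] >= c["silver"] >= c["gold"] > 0

print(medallion_counts_ok(counts))  # True
```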

Troubleshooting

| Issue | Likely Cause | Resolution |
|-------|--------------|------------|
| Deployment script fails | Missing prerequisites | Verify Azure CLI, Python, and permissions |
| Capacity not found | Wrong capacity name | Run az fabric capacity list to verify |
| Notebook execution fails | Lakehouse not attached | Attach a default lakehouse in the notebook |
| Power BI connection error | Wrong SQL endpoint | Verify the endpoint in lakehouse settings |
| Purview scan fails | Missing API permissions | Enable admin APIs in Fabric tenant settings |
| Databricks mirror fails | External data access disabled | Enable it in Databricks metastore settings |

For detailed troubleshooting, see individual deployment guides.


Security Considerations

Data Protection

  • Enable sensitivity labels in Microsoft Purview
  • Configure row-level security in Power BI semantic models
  • Use managed identities for service authentication
  • Implement network isolation with private endpoints

Access Control

  • Use Microsoft Entra ID for authentication
  • Implement least-privilege access via Fabric workspace roles
  • Configure Unity Catalog permissions for Databricks
  • Enable audit logging in all services

Compliance

  • Microsoft Purview DLP policies for Power BI semantic models
  • Data classification and sensitivity labeling
  • Lineage tracking for regulatory requirements
  • Retention policies for data lifecycle management

Contributing

We welcome contributions to the Unified Data Platform Solution Accelerator.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/new-capability)
  3. Commit changes (git commit -m 'Add new capability')
  4. Push to branch (git push origin feature/new-capability)
  5. Open a Pull Request

Please ensure:

  • Code follows existing patterns and naming conventions
  • Notebooks include documentation cells
  • Updates to architecture require corresponding documentation updates
  • All secrets and credentials are parameterized, never hardcoded

Resources

Microsoft Documentation

Community


License

This project is licensed under the MIT License - see the LICENSE file for details.


Support

For issues and feature requests, please use GitHub Issues.


Maintained by Microsoft | Report an Issue | Request a Feature
