Version 0.7.1 (Latest)
A data modelling language that bridges human business intent, AI reasoning, and technical implementation.
MD-DDL is a Markdown-native standard for defining not just what data is, but what it means, where it comes from, and how it is governed — in a format that humans, AI agents, and compilers all work with directly.
The specification starts at 1-Foundation.md, or point your AI at the single-file version: MD-DDL-Complete.md.
MD-DDL is: AI‑native · Human‑friendly · Version‑controlled · Semantically rich · Ready for automation
---
config:
  layout: elk
---
flowchart TD
    SME[Subject Matter Experts]
    Stewards[Data Stewards & Architects]
    subgraph Sources["Source Layer"]
        SM[Source Manifests]
        TF[Transform Files]
        SM --> TF
    end
    subgraph Model["Domain Layer"]
        D[Domain Files<br/>Summary Tables + Diagrams]
        E[Detail Files<br/>Entities · Relationships · Events]
        D --> E
    end
    subgraph Agents["AI Agents"]
        AO["Agent Ontology<br/>Discover · Design · Author"]
        AR["Agent Regulation<br/>Audit · Monitor · Remediate"]
    end
    subgraph Outputs["Generated Artefacts"]
        KG[Knowledge Graph & Lineage]
        SC[Schemas<br/>3NF · Dimensional · Messaging]
        GV[ETL/ELT Logic<br/>& Governance Rules]
    end
    Sources --> Model
    Model --> Outputs
    AO --> Model
    AO --> Sources
    AR --> Model
    SME --> Agents
    SME --> Model
    SME --> Sources
    Stewards --> Agents
    Stewards --> Model
    Outputs --> Agents
This creates a data ecosystem that is business‑friendly, steward‑friendly, tech‑friendly, and AI‑friendly.
MD-DDL ships with two purpose-built AI agents covering the full data management lifecycle.
Agent Ontology is the primary authoring interface. Describe a business process and Agent Ontology drives the conversation — interviewing subject matter experts, checking applicable industry standards, reasoning through modelling trade-offs, and producing a draft domain model with summary tables first and detail files only after human review. It also guides source mapping, helping source system SMEs author manifests and transform files that feed the canonical model.
Agent Regulation is the ongoing compliance layer. It audits domain and entity files against loaded regulatory frameworks (APRA, GDPR, Basel, FATF, and more), monitors for regulatory change, and produces structured gap reports with prioritised remediation across all three governance levels — domain metadata, entity governance blocks, and attribute-level PII flags.
MD-DDL's plain Markdown and structured YAML act as a shared language for these agents and for any LLM you point at the spec: agents can reason over the model, walk relationships to find logic gaps, and generate technical artefacts from the same files your business stakeholders review.
Don't model from scratch. MD-DDL aligns with global industry standards out of the box — native patterns for BIAN and ISO 20022, built-in compliance guidance for Basel (BCBS), APRA, RBNZ, and GDPR, and direct traceability from every entity and attribute to the standard or regulatory requirement that defines it.
Regulatory requirements and business rules are embedded in the data definition itself, not bolted on after the fact. PII classification, retention obligations, and breach notification requirements sit directly on the entities they govern. Business rules like "balance can never be less than zero" become visible Constraints that compile into automated data quality checks. The domain and relationship structure automatically maps how sensitive data flows, making impact analysis and breach notification (e.g., CPS 234) traceable by default.
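As a rough illustration of how a rule such as "balance can never be less than zero" might end up as an automated data quality check, the Python sketch below models a constraint as a named predicate and runs it over a few rows. The `Constraint` class, the `check_rows` helper, and the field names are hypothetical; this is neither MD-DDL syntax nor the output of any MD-DDL compiler.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class Constraint:
    """A business rule expressed as a named predicate over one row (hypothetical shape)."""
    name: str
    description: str
    predicate: Callable[[Row], bool]


def check_rows(constraint: Constraint, rows: List[Row]) -> List[Row]:
    """Return the rows that violate the constraint."""
    return [row for row in rows if not constraint.predicate(row)]


# Assumed rule, taken from the prose: an account balance is never negative.
non_negative_balance = Constraint(
    name="Balance Non-Negative",
    description="Balance can never be less than zero",
    predicate=lambda row: row["balance"] >= 0,
)

# Illustrative data only.
accounts = [
    {"account_id": "A-1", "balance": 120.0},
    {"account_id": "A-2", "balance": -5.0},
]

violations = check_rows(non_negative_balance, accounts)
print([row["account_id"] for row in violations])
```

Keeping the rule's name and description next to the predicate means a failed check can be reported in the same business language the constraint was written in.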
MD-DDL separates the operational reality of source systems from the governed meaning of canonical data. Source Manifests declare what each source system produces and how it generates change. Transform Files map source fields to canonical attributes using a typed transformation vocabulary — direct maps, derivations, lookups, multi-source reconciliation, and conditional logic — encoding source idiosyncrasies where they belong, in the source layer, not the canonical model. The result is end-to-end lineage from raw source field to governed domain attribute, compilable into ETL/ELT logic without custom tooling.
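To make the idea of a typed transformation vocabulary more concrete, here is a minimal Python sketch of four of the transform kinds named above (direct, derived, lookup, conditional) applied to one source row. The helper functions, the Salesforce-style field names, and the lookup table are all assumptions made for illustration; the actual Transform File syntax is defined in the specification, not here.

```python
from typing import Any, Callable, Dict

SourceRow = Dict[str, Any]
Transform = Callable[[SourceRow], Any]

# Hypothetical reference table used by the lookup transform below.
COUNTRY_CODES = {"Australia": "AU", "New Zealand": "NZ"}


def direct(field: str) -> Transform:
    """Direct map: copy the source field unchanged."""
    return lambda row: row[field]


def derived(fn: Transform) -> Transform:
    """Derivation: compute the value from one or more source fields."""
    return fn


def lookup(field: str, table: Dict[str, str]) -> Transform:
    """Lookup: translate a source value through a reference table (None if unknown)."""
    return lambda row: table.get(row[field])


def conditional(test: Callable[[SourceRow], bool], if_true: Transform, if_false: Transform) -> Transform:
    """Conditional: choose between two transforms for each row."""
    return lambda row: if_true(row) if test(row) else if_false(row)


# Canonical attribute name -> transform. All names are illustrative.
customer_transform: Dict[str, Transform] = {
    "customer_id": direct("Id"),
    "full_name": derived(lambda r: f"{r['FirstName']} {r['LastName']}".strip()),
    "country_code": lookup("Country", COUNTRY_CODES),
    "status": conditional(lambda r: r["IsDeleted"], lambda r: "Closed", lambda r: "Active"),
}


def apply_transform(mapping: Dict[str, Transform], row: SourceRow) -> Dict[str, Any]:
    """Build one canonical record by running every transform against the source row."""
    return {attribute: fn(row) for attribute, fn in mapping.items()}


source_row = {"Id": "001", "FirstName": "Ada", "LastName": "Lovelace",
              "Country": "Australia", "IsDeleted": False}
canonical = apply_transform(customer_transform, source_row)
print(canonical)
```

Because every canonical attribute is produced by exactly one named transform, the mapping itself doubles as field-level lineage.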
An MD-DDL domain is a complete data product definition. The effort of modelling a domain is the effort of defining the data product — one document, not two. When you apply a Canonical modelling strategy, the domain becomes a Foundational Data Product — a governed, reusable asset that other teams and systems consume without redefining.
| MD-DDL | Data Product concept |
|---|---|
| Domain | Data product definition |
| Canonical domain | Foundational / platform data product |
| Bounded context domain | Team-owned data product |
| Entities + relationships | Data product schema / semantic model |
| Events | Output port change events |
| Source manifests | Source-aligned input contracts |
| Transform files | Integration logic — source fields to canonical attributes |
| Governance metadata | Data product SLA — classification, retention, PII, residency |
| Owners + stewards | Data product owner and domain team |
| Generated schemas | Data product output ports (3NF, dimensional, messaging) |
Looking ahead to v0.8: The spec will formalise inter-domain consumption contracts and explicit output port declarations, making domain-to-domain data product relationships as governed as source-to-canonical ones.
Agent Ontology interviews your subject matter experts and proposes candidate entities, relationships, and events — checking applicable industry standards and surfacing modelling trade-offs before writing a line of MD-DDL.
Agent Ontology drafts domain summary tables first — a compact index of every concept in the domain. Detail files follow after human review, containing entity definitions, constraints, governance metadata, and diagrams.
Source system SMEs author Source Manifests declaring what their system produces and how it generates change. Transform Files map source fields to canonical attributes — encoding source-specific logic (type casts, null handling, derivations, lookups, multi-source reconciliation) in the source layer where it belongs. The canonical model stays pure.
Point any LLM at your MD-DDL files and instruct it to generate artefacts — no custom tooling required:
- Knowledge Graph — a queryable semantic web with end-to-end lineage from source field to canonical attribute
- Schemas — 3rd Normal Form, dimensional models, columnar layouts, and messaging schemas
- ETL/ELT logic — source-to-canonical pipelines derived directly from transform files
- Governance artefacts — data quality rules, lineage maps, and regulatory reports
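One way to "point an LLM at your files" is simply to concatenate the spec, the model files, and an instruction into a single prompt. The Python sketch below shows that assembly step only; `build_prompt`, the section headers it emits, and the placeholder file contents are assumptions, and the call to an actual LLM is deliberately left out.

```python
from typing import Dict


def build_prompt(spec_text: str, model_files: Dict[str, str], instruction: str) -> str:
    """Join the spec, each model file, and the task into one prompt string."""
    sections = [f"# Specification\n\n{spec_text.strip()}"]
    # Sort by path so the same inputs always yield the same prompt.
    for path in sorted(model_files):
        sections.append(f"# File: {path}\n\n{model_files[path].strip()}")
    sections.append(f"# Task\n\n{instruction.strip()}")
    return "\n\n".join(sections)


# Placeholder contents; in practice these would be read from disk.
prompt = build_prompt(
    spec_text="(contents of MD-DDL-Complete.md)",
    model_files={
        "sources/salesforce/manifest.md": "(manifest text)",
        "domains/customer/domain.md": "(domain text)",
    },
    instruction="Generate a 3NF schema for the Customer domain.",
)
print(prompt)
```

Labelling each file with its relative path lets the model cite where a definition came from when it generates lineage or governance output.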
Agent Regulation audits your model against applicable regulatory frameworks, monitors for regulatory change, and produces gap reports with specific remediation steps — running continuously against the living model.
MD-DDL is a dependency of your modelling project, not an artefact of it. Your domain and source files are the artefacts; the spec and agents are the tools you use to create and govern them.
The recommended approach is a git submodule — the closest equivalent to pip install for a Markdown-based standard: pinned to a version, updated independently of your model files, never duplicated.
git submodule add https://github.com/[org]/md-ddl .md-ddl
git submodule update --init

To update to a new version later:

git submodule update --remote .md-ddl
Create .github/copilot-instructions.md in your project root:
## MD-DDL Standard
This project uses MD-DDL for data modelling. The standard and agents are in `.md-ddl/`.
- Full specification: `.md-ddl/md-ddl-specification/MD-DDL-Complete.md`
- Agent Ontology (discovery, design, source mapping): `.md-ddl/agents/agent-ontology/AGENT.md`
- Agent Regulation (compliance and audit): `.md-ddl/agents/agent-regulation/AGENT.md`
When working on domain, entity, or source files, read the relevant agent prompt and
spec sections before making changes. Draft domain summary tables before detail files.
Canonical entity files contain no source references; source mappings live in sources/.

Create CLAUDE.md in your project root:
## MD-DDL Standard
This project uses MD-DDL for data modelling. The standard and agents are in `.md-ddl/`.
- Full specification: `.md-ddl/md-ddl-specification/MD-DDL-Complete.md`
- Agent Ontology (discovery, design, source mapping): `.md-ddl/agents/agent-ontology/AGENT.md`
- Agent Regulation (compliance and audit): `.md-ddl/agents/agent-regulation/AGENT.md`
When working on domain, entity, or source files, load the relevant agent prompt first.
Canonical entity files contain no source references; source mappings live in sources/.

Add to your Project Knowledge:
- MD-DDL-Complete.md: the full specification
- agents/agent-ontology/AGENT.md: for modelling and source mapping
- agents/agent-regulation/AGENT.md: for compliance auditing
No submodule needed, but re-upload when moving to a newer version of the standard.
your-project/
  .md-ddl/                      ← submodule: the standard (not yours to edit)
  .github/
    copilot-instructions.md     ← or CLAUDE.md
  domains/
    customer/
      domain.md                 ← canonical model
      entities/
        customer.md
        account.md
    payments/
      domain.md
  sources/
    salesforce/
      manifest.md               ← what Salesforce produces + change model
      transforms/
        customer.md             ← Salesforce → Customer field mappings
    sap/
      manifest.md
      transforms/
        customer.md             ← SAP's contribution to Customer
The .md-ddl/ directory is a read-only dependency. Your modelling work lives entirely outside it.
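If you want tooling (a pre-commit hook, say) to respect this layout, a small path classifier is enough to start with. The sketch below is hypothetical: the category names and the `classify` and `is_editable` helpers are invented for illustration, and it assumes only the directory names shown in the example layout above.

```python
from pathlib import PurePosixPath


def classify(path: str) -> str:
    """Map a project-relative path to the layer it belongs to."""
    parts = PurePosixPath(path).parts
    if not parts:
        return "other"
    top = parts[0]
    if top == ".md-ddl":
        return "standard"  # the read-only submodule
    if top == "domains":
        if parts[-1] == "domain.md":
            return "domain"
        if "entities" in parts[:-1]:
            return "entity"
    if top == "sources":
        if parts[-1] == "manifest.md":
            return "source-manifest"
        if "transforms" in parts[:-1]:
            return "transform"
    return "other"


def is_editable(path: str) -> bool:
    """Everything outside the .md-ddl/ dependency is yours to edit."""
    return classify(path) != "standard"


for example in [
    ".md-ddl/agents/agent-ontology/AGENT.md",
    "domains/customer/entities/account.md",
    "sources/sap/transforms/customer.md",
]:
    print(example, "->", classify(example), "| editable:", is_editable(example))
```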
Domain layer — what the business means
- Domains — the highest level of organisation and the unit of a data product. A canonical domain is a foundational data product.
- Entities — the persistent nouns of your business (Customer, Account, Product)
- Relationships — semantic connections between entities (Customer Holds Account)
- Events — point-in-time business occurrences (Customer Onboarded, Transaction Executed)
- Enumerations — controlled vocabularies (Country Code, Loyalty Tier)
- Attributes — field definitions: data types, identifiers, PII flags
- Semantic Inheritance — specialised concepts inherit logic and governance from parents
- Constraints — formalised business rules that compile into data quality checks
- Temporal Tracking — how each entity changes over time: immutable, append-only, slowly changing, or bitemporal
- Existence & Mutability — entity-level declarations that drive compiler output for dimensional modelling
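For readers unfamiliar with the temporal tracking options listed above, the sketch below illustrates one common reading of "slowly changing": each change closes the current version of a record and appends a new one. The `Version` shape, the `valid_from` and `valid_to` fields, and the Loyalty Tier example are assumptions made for illustration; how MD-DDL declares or compiles temporal tracking is defined in the specification.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    """One historical version of a customer's loyalty tier (hypothetical shape)."""
    key: str
    tier: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(history: List[Version], key: str, tier: str, as_of: date) -> List[Version]:
    """Close the current version for key (if any) and append the new one."""
    updated: List[Version] = []
    for version in history:
        if version.key == key and version.valid_to is None:
            if version.tier == tier:
                return list(history)  # nothing changed, keep history as is
            updated.append(replace(version, valid_to=as_of))
        else:
            updated.append(version)
    updated.append(Version(key, tier, as_of))
    return updated


history = [Version("C-1", "Silver", date(2024, 1, 1))]
history = apply_change(history, "C-1", "Gold", date(2024, 6, 1))
for version in history:
    print(version)
```

An append-only entity would skip the closing step and only ever add rows, while an immutable one would reject the change outright.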
Source layer — where data comes from
- Source Manifests — declare what a source system produces, how it generates change, and which canonical entities it contributes to
- Transform Files — map source fields to canonical attributes using a typed vocabulary: direct, derived, lookup, reconciliation, conditional, and aggregation. Source idiosyncrasies stay here, away from the canonical model.
Governance layer — how data is protected
- Data Governance — PII, classification, retention, residency, and breach notification embedded on the entities they govern
- Regulatory Scope — every domain and entity declares which frameworks apply
- Ownership & Lineage — data owners, stewards, and the full lineage graph from source field to canonical attribute
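The lineage graph mentioned above lends itself to simple impact analysis: start at a source field and walk downstream to see which governed attributes it feeds. The Python sketch below does this with a breadth-first traversal. The edge list, node names, and PII set are invented for illustration and do not come from any MD-DDL example.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage edges: upstream node -> downstream nodes.
LINEAGE: Dict[str, List[str]] = {
    "salesforce.Contact.Email": ["Customer.email_address"],
    "sap.KNA1.NAME1": ["Customer.legal_name"],
    "Customer.email_address": ["CustomerContactView.email"],
}

# Hypothetical set of attributes flagged as PII.
PII: Set[str] = {"Customer.email_address", "CustomerContactView.email"}


def downstream(start: str, edges: Dict[str, List[str]]) -> List[str]:
    """Return every node reachable from start, in breadth-first order."""
    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:  # guards against cycles and repeated visits
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


affected = downstream("salesforce.Contact.Email", LINEAGE)
pii_affected = [node for node in affected if node in PII]
print("affected:", affected)
print("PII affected:", pii_affected)
```

Reversing the edges gives the opposite question: for a given canonical attribute, which source fields contribute to it.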
Native diagramming via embedded Mermaid — domain overview graphs and entity class diagrams live directly alongside the definitions they represent, not in a separate tool.
md-ddl-specification/       The normative standard
  1-Foundation.md           Core principles and document structure
  MD-DDL-Complete.md        Single-file version for AI context loading
  ...                       Individual section files (2–8)
agents/
  agent-ontology/           Discovery, design, and source mapping agent
  agent-regulation/         Regulatory compliance and audit agent
examples/
  Financial Crime/          Reference-quality domain with full entity detail files
  Simple Customer/          Minimal example: single detail file, good starting point
This work is licensed under a Creative Commons Attribution 4.0 International License.
