Description
Hi maintainers 👋,
I’m working with dbt on Microsoft Fabric and ran into a recurring gap when working with file-based data using dbt-core.
Problem
In Fabric Warehouse, reading files from OneLake via OPENROWSET is fast, and is arguably the fastest way to onboard file data into warehouse tables without needing shortcuts to reach files across workspaces.
Today, this requires either:
- embedding raw OPENROWSET SQL in models, or
- bypassing dbt source() semantics entirely.
This makes it hard to:
- keep models clean
- centralize file metadata (paths, CSV options, column types)
- treat file-based inputs as first-class dbt sources
Goal
Explore whether the Fabric adapter could support file-based sources in OneLake in a dbt-native way, while still compiling to valid Fabric SQL.
Concretely:
- allow defining OneLake file metadata in sources.yml
- generate OPENROWSET statements automatically
- keep model SQL clean and declarative
Prototype approach
I’ve built a small prototype using macros (no adapter changes yet) that:
- defines OneLake paths and CSV options in sources.yml (meta)
- uses the column definitions from the source to generate the WITH (...) clause
- compiles to Fabric-compatible SQL using OPENROWSET
- supports reuse across years/files with minimal model code
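For illustration, here is a hypothetical sketch of what such a macro could look like. This is not the actual prototype code; it assumes dbt's compile-time graph context (graph.sources, source_meta, meta, columns) and the metadata keys from the sources.yml example below:

```sql
{% macro source_onelake(source_name, table_name) %}
{#- Sketch only: look up the source table node in dbt's graph context. -#}
{%- set node = (graph.sources.values()
        | selectattr('source_name', 'equalto', source_name)
        | selectattr('name', 'equalto', table_name)
        | list | first) -%}
{%- set src_meta = node.source_meta -%}  {#- source-level meta (onelake_path) -#}
{%- set tbl_meta = node.meta -%}         {#- table-level meta (csv options) -#}
OPENROWSET(
    BULK '{{ src_meta.onelake_path }}{{ tbl_meta.csv_folder }}/{{ tbl_meta.csv_file }}',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    FIRSTROW = {{ tbl_meta.first_row | default(1) }},
    FIELDTERMINATOR = '{{ tbl_meta.field_terminator | default(",") }}'
)
WITH (
    {%- for col in node.columns.values() %}
    {{ col.name }} {{ col.data_type }}{{ "," if not loop.last }}
    {%- endfor %}
) as csv_data
{% endmacro %}
```

The macro resolves everything from source metadata, so model SQL stays a plain select over the macro call.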
Example model usage:
select *
from {{ source_onelake('bab', 'percelen_2016') }}
Example source metadata (simplified):
sources:
  - name: bab
    meta:
      onelake_path: https://onelake.dfs.fabric.microsoft.com///Files
    tables:
      - name: percelen_2016
        meta:
          csv_file: PERCELEN_2016.csv
          csv_folder: /BAB2016
          field_terminator: ';'
          first_row: 2
        columns:
          - name: wkt
            data_type: varchar(max)
          - name: objectid
            data_type: int
This compiles to:
select *
from OPENROWSET(
BULK '<onelake_path>/4_Productie/BAB2016/PERCELEN_2016.csv',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
FIRSTROW = 2,
FIELDTERMINATOR = ';'
)
WITH (
wkt varchar(max),
objectid int
) as csv_data
Questions for maintainers
Before investing further, I’d love guidance on:
Is this a direction that would be acceptable for the Fabric adapter?
Would you prefer:
- adapter-level support (treating file sources as real source() relations), or
- a separate macro/package approach first?
Are there design constraints in the adapter I should be aware of (e.g. relation typing, source parsing, SQL generation)?
Happy to:
- share the prototype macro code
- reduce this to a minimal reproducible example
- open a draft PR or separate package if that’s preferred
Thanks for your time and for maintaining the adapter 🙏