Skip to content

Proposal: Native support for OneLake file-based sources via OPENROWSET (Fabric SQL) #347

@martjan

Description

@martjan

Hi maintainers 👋,

I’m working with dbt on Microsoft Fabric and ran into a recurring gap when working with file-based data using dbt-core.

Problem

In Fabric Warehouse, reading files from OneLake via OPENROWSET is fast, maybe even fastest way to onboard file data into warehouse tables without the need for shortcuts to reach files cross-workspace.

Today, this requires either:

  • embedding raw OPENROWSET SQL in models, or
  • bypassing dbt source() semantics entirely.

This makes it hard to:

  • keep models clean
  • centralize file metadata (paths, CSV options, column types)
  • treat file-based inputs as first-class dbt sources

Goal

Explore whether the Fabric adapter could support file-based sources in OneLake in a dbt-native way, while still compiling to valid Fabric SQL.

Concretely:

  • allow defining OneLake file metadata in sources.yml
  • generate OPENROWSET statements automatically
  • keep model SQL clean and declarative

Prototype approach

I’ve built a small prototype using macros (no adapter changes yet) that:

  • defines OneLake paths and CSV options in sources.yml (meta)
  • uses column definitions from the source to generate the WITH (...) clause to include.
  • compiles to Fabric-compatible SQL using OPENROWSET
  • supports reuse across years/files with minimal model code

Example model usage:

select *
from {{ source_onelake('bab', 'percelen_2016') }}

Example source metadata (simplified):

sources:

  • name: bab
    meta:
    onelake_path: https://onelake.dfs.fabric.microsoft.com///Files
    tables:
    • name: percelen_2016
      meta:
      csv_file: PERCELEN_2016.csv
      csv_folder: /BAB2016
      field_terminator: ';'
      first_row: 2
      columns:
      • name: wkt
        data_type: varchar(max)
      • name: objectid
        data_type: int

This compiles to:

select *
from OPENROWSET(
BULK '<onelake_path>/4_Productie/BAB2016/PERCELEN_2016.csv',
FORMAT = 'CSV',
PARSER_VERSION = '2.0',
FIRSTROW = 2,
FIELDTERMINATOR = ';'
)
WITH (
wkt varchar(max),
objectid int
) as csv_data

Questions for maintainers

Before investing further, I’d love guidance on:
Is this a direction that would be acceptable for the Fabric adapter?

Would you prefer:

  • adapter-level support (treating file sources as real source() relations), or
  • a separate macro/package approach first?
  • Are there design constraints in the adapter I should be aware of (e.g. relation typing, source parsing, SQL generation)?

Happy to:

  • share the prototype macro code
  • reduce this to a minimal reproducible example
  • open a draft PR or separate package if that’s preferred
  • Thanks for your time and for maintaining the adapter 🙏

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions