Proposal: Native support for OneLake file-based sources via OPENROWSET (Fabric SQL)

Hi maintainers 👋,

I’m working with dbt on Microsoft Fabric and ran into a recurring gap when working with file-based data using dbt-core.

**Problem**

In Fabric Warehouse, reading files from OneLake via OPENROWSET is fast, maybe even fastest way to onboard file data into warehouse tables without the need for shortcuts to reach files cross-workspace. 

Today, this requires either:

- embedding raw OPENROWSET SQL in models, or
- bypassing dbt source() semantics entirely.

This makes it hard to:

- keep models clean
- centralize file metadata (paths, CSV options, column types)
- treat file-based inputs as first-class dbt sources

**Goal**

Explore whether the Fabric adapter could support file-based sources in OneLake in a dbt-native way, while still compiling to valid Fabric SQL.

**Concretely:**

- allow defining OneLake file metadata in sources.yml
- generate OPENROWSET statements automatically
- keep model SQL clean and declarative

**Prototype approach**

I’ve built a small prototype using macros (no adapter changes yet) that:

- defines OneLake paths and CSV options in sources.yml (meta)
- uses column definitions from the source to generate the WITH (...) clause to include.
- compiles to Fabric-compatible SQL using OPENROWSET
- supports reuse across years/files with minimal model code

Example model usage:

select *
from {{ source_onelake('bab', 'percelen_2016') }}

Example source metadata (simplified):

sources:
  - name: bab
    meta:
      onelake_path: https://onelake.dfs.fabric.microsoft.com/<workspace>/<lakehouse>/Files
    tables:
      - name: percelen_2016
        meta:
          csv_file: PERCELEN_2016.csv
          csv_folder: /BAB2016
          field_terminator: ';'
          first_row: 2
        columns:
          - name: wkt
            data_type: varchar(max)
          - name: objectid
            data_type: int


This compiles to:

select *
from OPENROWSET(
    BULK '<onelake_path>/4_Productie/BAB2016/PERCELEN_2016.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    FIRSTROW = 2,
    FIELDTERMINATOR = ';'
)
WITH (
    wkt varchar(max),
    objectid int
) as csv_data

**Questions for maintainers**

Before investing further, I’d love guidance on:
Is this a direction that would be acceptable for the Fabric adapter?

**Would you prefer:**

- adapter-level support (treating file sources as real source() relations), or
- a separate macro/package approach first?
- Are there design constraints in the adapter I should be aware of (e.g. relation typing, source parsing, SQL generation)?

**Happy to:**

- share the prototype macro code
- reduce this to a minimal reproducible example
- open a draft PR or separate package if that’s preferred
- Thanks for your time and for maintaining the adapter 🙏

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: Native support for OneLake file-based sources via OPENROWSET (Fabric SQL) #347

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Proposal: Native support for OneLake file-based sources via OPENROWSET (Fabric SQL) #347

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions