forked from siklosid/dagger
DATA-2637 Implement DLT creator for Databricks Delta Live Tables #65
Required for the new DLT creator, which uses `DatabricksRunNowOperator` and `DatabricksHook` from `apache-airflow-providers-databricks`.
Required for the DLT creator to work in production.
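Since the provider package is an optional runtime dependency, a common way to guard the import is the pattern below. This is only a sketch of the idea; the actual creator may structure its error handling differently, and `require_databricks_provider` is an illustrative name, not the real API.

```python
# Guarded import of the optional Databricks provider. The module paths
# are the real apache-airflow-providers-databricks locations; everything
# else here is an illustrative sketch.
try:
    from airflow.providers.databricks.hooks.databricks import DatabricksHook
    from airflow.providers.databricks.operators.databricks import (
        DatabricksRunNowOperator,
    )

    HAS_DATABRICKS_PROVIDER = True
except ImportError:
    # Provider not installed: defer the failure until a DLT task is built.
    DatabricksHook = None
    DatabricksRunNowOperator = None
    HAS_DATABRICKS_PROVIDER = False


def require_databricks_provider() -> None:
    """Fail fast with an actionable message when the provider is missing."""
    if not HAS_DATABRICKS_PROVIDER:
        raise ImportError(
            "databricks_dlt tasks need apache-airflow-providers-databricks; "
            "install it with: pip install apache-airflow-providers-databricks"
        )
```

Deferring the error keeps DAG parsing working for pipelines that never use the Databricks task type.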
siklosid reviewed Jan 9, 2026 (`dagger/dag_creator/airflow/operator_creators/databricks_dlt_creator.py`)
- Add type hints and docstrings to `DatabricksIO`
- Improve error handling in `DatabricksDLTCreator` with `ImportError` support
- Add validation for empty `job_name` in `DatabricksDLTCreator`
- Add comprehensive test coverage for `DatabricksIO`, `DatabricksDLTTask`, and `DatabricksDLTCreator`
- All Databricks components now have 100% test coverage
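The empty-`job_name` validation mentioned in the commit message might look like the sketch below. The function name and exact behavior are hypothetical; only the requirement (reject missing or blank job names) comes from the PR.

```python
# Hypothetical sketch of the job_name validation added to
# DatabricksDLTCreator; validate_job_name is an illustrative name.
def validate_job_name(job_name) -> str:
    """Reject missing or blank job names before the operator is built."""
    if not isinstance(job_name, str) or not job_name.strip():
        raise ValueError(
            "databricks_dlt task requires a non-empty job_name"
        )
    return job_name.strip()
```

Validating at DAG-build time surfaces a misconfigured task immediately instead of failing at run time inside Databricks.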
Prefer explicit properties over getattr for type safety and better IDE support.
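The reviewer's suggestion can be illustrated with a minimal sketch; the class and field names below are hypothetical, chosen to match the task config fields described in this PR.

```python
# Explicit properties instead of getattr(config, "job_name", None)-style
# access: typos become AttributeError/KeyError at the call site, and IDEs
# can autocomplete and type-check the fields.
class DatabricksDLTTaskExample:
    """Illustrative task wrapper over a parsed config dict."""

    def __init__(self, config: dict):
        self._config = config

    @property
    def job_name(self) -> str:
        # Required field: raise KeyError early if absent.
        return self._config["job_name"]

    @property
    def databricks_conn_id(self) -> str:
        return self._config.get("databricks_conn_id", "databricks_default")

    @property
    def wait_for_completion(self) -> bool:
        return self._config.get("wait_for_completion", True)
```

Compared with `getattr` lookups scattered through the creator, the defaults live in one place and the public surface of the task is explicit.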
siklosid approved these changes Jan 13, 2026
Summary
- `databricks_dlt` task type for triggering Databricks DLT pipelines via Jobs
- `DatabricksDLTCreator` operator creator using `DatabricksRunNowOperator`
- `DLTTaskGenerator` plugin to auto-generate Dagger task configs from Databricks Asset Bundles
- `DatabricksDBTConfigParser` extended to support Unity Catalog source dependencies
- `CLAUDE.md` with project documentation for AI assistants

Implementation Details
New Task Type:
`databricks_dlt` (`dagger/pipeline/tasks/databricks_dlt_task.py`)
- `job_name`: Databricks Job name that triggers the DLT pipeline
- `databricks_conn_id`: Airflow connection ID (default: `databricks_default`)
- `wait_for_completion`: Whether to wait for job completion (default: `true`)
- `poll_interval_seconds`: Polling interval (default: `30`)
- `timeout_seconds`: Timeout (default: `3600`)
- `cancel_on_kill`: Cancel job if Airflow task is killed (default: `true`)

Operator Creator:
`DatabricksDLTCreator` (`dagger/dag_creator/airflow/operator_creators/databricks_dlt_creator.py`) uses `DatabricksRunNowOperator` to trigger Databricks Jobs that wrap DLT pipelines.

DLT Task Generator Plugin
`dagger/plugins/dlt_task_generator/`
- `DatabricksBundleParser`: Parses `databricks.yml` and `tables.yml` from Databricks Asset Bundles
- `DLTTaskGenerator`: Generates complete Dagger task and pipeline configurations

DatabricksDBTConfigParser Changes
The dbt config parser was extended to support Unity Catalog source dependencies. This is required because dbt models can now depend on DLT output tables that reside in Unity Catalog, not just legacy Hive metastore tables accessible via Athena.
Problem: Previously, all dbt sources were treated as Athena tables. With DLT pipelines writing to Unity Catalog, dbt models that read from DLT outputs need to create Databricks input tasks (not Athena tasks) to properly track lineage and dependencies.
Solution: The parser now distinguishes between:
- Legacy Hive sources (`database = hive_metastore`): Continue using Athena input tasks
- Unity Catalog sources (e.g. `${ENV_MARTS}`): Create Databricks input tasks

Key changes:
- `LEGACY_HIVE_DATABASES` constant to identify legacy sources
- `_is_databricks_source()` method to detect Unity Catalog tables
- `_get_databricks_source_task()` to generate Databricks input task configs
- `_generate_dagger_tasks()` updated to route sources to the appropriate task type

Test plan
- `databricks_dlt` task type is registered and discoverable via `dagger list-tasks`
- `DLTTaskGenerator` correctly parses Databricks Asset Bundles
- `make test` passes
- `make lint` passes
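The source-routing logic described in the `DatabricksDBTConfigParser` changes can be sketched as follows. The constant and the `hive_metastore` value mirror the PR summary, but the function names and routing details here are illustrative assumptions, not the parser's actual code.

```python
# Hedged sketch of routing dbt sources to the right Dagger input-task
# type: legacy Hive metastore sources stay on Athena, everything else
# is treated as a Unity Catalog table backed by a Databricks input task.
LEGACY_HIVE_DATABASES = {"hive_metastore"}


def is_databricks_source(database: str) -> bool:
    """Unity Catalog sources are anything outside the legacy Hive set."""
    return database not in LEGACY_HIVE_DATABASES


def route_source(database: str) -> str:
    """Pick the Dagger input-task type for a dbt source's database."""
    return "databricks" if is_databricks_source(database) else "athena"
```

A templated database like `${ENV_MARTS}` therefore routes to a Databricks input task, so dbt models reading DLT outputs get correct lineage instead of being treated as Athena tables.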