@mrhallak mrhallak commented Jan 9, 2026

Summary

  • Add new databricks_dlt task type for triggering Databricks DLT pipelines via Jobs
  • Implement DatabricksDLTCreator operator creator using DatabricksRunNowOperator
  • Add DLTTaskGenerator plugin to auto-generate Dagger task configs from Databricks Asset Bundles
  • Extend DatabricksDBTConfigParser to support Unity Catalog source dependencies
  • Add CLAUDE.md with project documentation for AI assistants

Implementation Details

New Task Type: databricks_dlt

  • Location: dagger/pipeline/tasks/databricks_dlt_task.py
  • Configurable parameters:
    • job_name: Databricks Job name that triggers the DLT pipeline
    • databricks_conn_id: Airflow connection ID (default: databricks_default)
    • wait_for_completion: Whether to wait for job completion (default: true)
    • poll_interval_seconds: Polling interval (default: 30)
    • timeout_seconds: Timeout (default: 3600)
    • cancel_on_kill: Cancel job if Airflow task is killed (default: true)
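
The parameters and defaults listed above could be modelled roughly as follows. This is a minimal sketch, not the code in databricks_dlt_task.py: the class name `DLTTaskConfig`, its `from_dict` helper, and the empty-name check are hypothetical; only the field names and default values come from the list.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DLTTaskConfig:
    """Hypothetical container for the databricks_dlt task parameters."""

    job_name: str
    databricks_conn_id: str = "databricks_default"
    wait_for_completion: bool = True
    poll_interval_seconds: int = 30
    timeout_seconds: int = 3600
    cancel_on_kill: bool = True

    def __post_init__(self):
        # Assumption: reject a blank job_name as early as possible.
        if not self.job_name or not self.job_name.strip():
            raise ValueError("job_name must be a non-empty string")

    @classmethod
    def from_dict(cls, raw: dict) -> "DLTTaskConfig":
        # Keep only keys that correspond to declared fields, so unrelated
        # entries in a parsed task YAML (e.g. "type") do not break construction.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in raw.items() if k in known})


# Usage: only job_name is required; everything else falls back to its default.
config = DLTTaskConfig.from_dict({"type": "databricks_dlt", "job_name": "dlt_orders"})
print(config.poll_interval_seconds, config.timeout_seconds)  # 30 3600
```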

Operator Creator: DatabricksDLTCreator

  • Location: dagger/dag_creator/airflow/operator_creators/databricks_dlt_creator.py
  • Uses DatabricksRunNowOperator to trigger Databricks Jobs that wrap DLT pipelines
  • Retrieves job ID dynamically by job name using the Databricks Jobs API
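
Resolving a job ID from its name amounts to filtering job entries on `settings.name`. The sketch below works on an already fetched list of entries so that it runs without Airflow or network access. The function `resolve_job_id` and the sample data are hypothetical, and the actual creator may instead rely on a helper from apache-airflow-providers-databricks.

```python
from typing import Any, Dict, List


def resolve_job_id(jobs: List[Dict[str, Any]], job_name: str) -> int:
    """Return the job_id of the single job whose settings.name equals job_name."""
    if not job_name:
        raise ValueError("job_name must not be empty")

    matches = [
        job["job_id"]
        for job in jobs
        if job.get("settings", {}).get("name") == job_name
    ]
    if not matches:
        raise LookupError(f"No Databricks job named {job_name!r}")
    if len(matches) > 1:
        # Databricks does not enforce unique job names, so surface ambiguity
        # instead of silently picking one of the candidates.
        raise LookupError(f"{len(matches)} Databricks jobs named {job_name!r}")
    return matches[0]


# Sample entries shaped like the "jobs" array of a Jobs API list response.
sample_jobs = [
    {"job_id": 101, "settings": {"name": "dlt_orders"}},
    {"job_id": 202, "settings": {"name": "dlt_customers"}},
]
print(resolve_job_id(sample_jobs, "dlt_customers"))  # 202
```

Failing loudly on zero or multiple matches keeps a renamed or duplicated job from triggering the wrong pipeline.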

DLT Task Generator Plugin

  • Location: dagger/plugins/dlt_task_generator/
  • DatabricksBundleParser: Parses databricks.yml and tables.yml from Databricks Asset Bundles
  • DLTTaskGenerator: Generates complete Dagger task and pipeline configurations
  • Automatically extracts inputs (Athena changelog tables) and outputs (Databricks silver tables)

DatabricksDBTConfigParser Changes

The dbt config parser was extended to support Unity Catalog source dependencies. This is required because dbt models can now depend on DLT output tables that reside in Unity Catalog, not just legacy Hive metastore tables accessible via Athena.

Problem: Previously, all dbt sources were treated as Athena tables. With DLT pipelines writing to Unity Catalog, dbt models that read from DLT outputs need to create Databricks input tasks (not Athena tasks) to properly track lineage and dependencies.

Solution: The parser now distinguishes between:

  • Legacy Hive metastore sources (database = hive_metastore): Continue using Athena input tasks
  • Unity Catalog sources (other databases like ${ENV_MARTS}): Create Databricks input tasks

Key changes:

  • Added LEGACY_HIVE_DATABASES constant to identify legacy sources
  • Added _is_databricks_source() method to detect Unity Catalog tables
  • Added _get_databricks_source_task() to generate Databricks input task configs
  • Overrode _generate_dagger_tasks() to route sources to the appropriate task type
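
The routing rule can be sketched as a membership check against the legacy database set. The names `LEGACY_HIVE_DATABASES` and `_is_databricks_source` come from the change list above; the function bodies, the standalone-function form, the shape of the source and task dictionaries, and the type strings `athena` and `databricks` are assumptions made for illustration.

```python
LEGACY_HIVE_DATABASES = frozenset({"hive_metastore"})


def is_databricks_source(source: dict) -> bool:
    """True when the source lives outside the legacy Hive metastore."""
    database = source.get("database")
    # Assumption: a source without a database is treated as legacy (Athena).
    return bool(database) and database not in LEGACY_HIVE_DATABASES


def build_input_task(source: dict) -> dict:
    """Route a dbt source to a hypothetical Athena or Databricks input task config."""
    task_type = "databricks" if is_databricks_source(source) else "athena"
    return {
        "type": task_type,
        "schema": source.get("schema"),
        "table": source.get("name"),
    }


legacy = {"database": "hive_metastore", "schema": "raw", "name": "orders_changelog"}
unity = {"database": "${ENV_MARTS}", "schema": "silver", "name": "orders"}
print(build_input_task(legacy)["type"])  # athena
print(build_input_task(unity)["type"])   # databricks
```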

Test plan

  • Verify databricks_dlt task type is registered and discoverable via dagger list-tasks
  • Test DLT task YAML parsing with sample configurations
  • Validate DLTTaskGenerator correctly parses Databricks Asset Bundles
  • Run unit tests: make test
  • Run linting: make lint

Required for the new DLT creator, which uses DatabricksRunNowOperator and DatabricksHook from apache-airflow-providers-databricks.
Required for DLT creator to work in production.
@mrhallak
Author

@claude

@mrhallak mrhallak changed the title from "Implement DLT creator for Databricks Delta Live Tables" to "DATA-2637 Implement DLT creator for Databricks Delta Live Tables" Jan 12, 2026
- Add type hints and docstrings to DatabricksIO
- Improve error handling in DatabricksDLTCreator with ImportError support
- Add validation for empty job_name in DatabricksDLTCreator
- Add comprehensive test coverage for DatabricksIO, DatabricksDLTTask, and DatabricksDLTCreator
- All Databricks components now have 100% test coverage
Prefer explicit properties over getattr for type safety and better IDE support.
@mrhallak
Author

@claude

@mrhallak mrhallak requested a review from siklosid January 13, 2026 16:25
@mrhallak mrhallak marked this pull request as ready for review January 13, 2026 17:56
@mrhallak mrhallak requested a review from a team as a code owner January 13, 2026 17:56
@mrhallak mrhallak merged commit 596a1fb into master Jan 13, 2026
2 checks passed
3 participants