Description
Describe the issue
When developing a Databricks Job, it is good practice to only execute the tasks that got updated. Oftentimes upstream or downstream tasks have to be executed as well to reproduce the pipeline reliably.
Currently, the Databricks CLI supports executing specific tasks in a Job by means of the --only flag, but there is no native way to automatically include the upstream or downstream tasks of the selected tasks. The UI, however, does offer this option.
It'd be really great to have --include-upstream and --include-downstream flags to do just that. Ideally, it would be possible to include upstream/downstream tasks on a per-task basis, as dbt does with its + operator (dbt run --select +model_name+), but a general flag that applies to all of the tasks selected with --only would already be a great addition.
Some examples with expected behaviors:
databricks bundle run job_name --only task1,task8 --include-upstream
Would run task1 and its upstream tasks plus task8 and its upstream tasks
databricks bundle run job_name --only task1,task8 --include-downstream
Would run task1 and its downstream tasks plus task8 and its downstream tasks
databricks bundle run job_name --only task1,task8 --include-upstream --include-downstream
Would run task1 with both its upstream and downstream tasks, plus task8 with both its upstream and downstream tasks
Alternatively, mimicking dbt's + operator would also work, but I guess it would require far more effort to implement.
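To make the expected behavior concrete, here is a minimal Python sketch of how the selection could be expanded. It is not how the CLI works today: the function name expand_selection, the job graph, and the extra task names are all hypothetical, and the graph is modeled as a plain mapping from each task to the tasks it depends on.

```python
from collections import deque


def expand_selection(depends_on, selected, include_upstream=False, include_downstream=False):
    """Return the selected tasks plus their transitive upstream/downstream tasks.

    depends_on maps each task name to the list of tasks it depends on.
    This is an illustrative sketch, not the Databricks CLI implementation.
    """
    # Invert the dependency map: task -> tasks that depend on it.
    dependents = {task: [] for task in depends_on}
    for task, parents in depends_on.items():
        for parent in parents:
            dependents.setdefault(parent, []).append(task)

    def reachable(start, edges):
        # Breadth-first walk that collects every task reachable from `start`.
        seen = set(start)
        queue = deque(start)
        while queue:
            current = queue.popleft()
            for nxt in edges.get(current, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    result = set(selected)
    if include_upstream:
        result |= reachable(selected, depends_on)
    if include_downstream:
        result |= reachable(selected, dependents)
    return result


# Hypothetical job graph: task0 -> task1 -> task2, task0 -> task3, task7 -> task8 -> task9
job = {
    "task0": [],
    "task1": ["task0"],
    "task2": ["task1"],
    "task3": ["task0"],
    "task7": [],
    "task8": ["task7"],
    "task9": ["task8"],
}

print(sorted(expand_selection(job, ["task1", "task8"], include_upstream=True)))
print(sorted(expand_selection(job, ["task1", "task8"], include_downstream=True)))
```

With both flags set, this sketch takes the union of the upstream and downstream closures of the selected tasks only, so task3 (a sibling that merely shares an upstream task with task1) would not be pulled in. Whether that is the right semantics is open for discussion.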
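For the per-task variant, a rough Python sketch of parsing a dbt-style selector could look like the following. The selector syntax for --only shown here (+task1, task1+, +task1+) is purely an assumption modeled on the dbt example above, and parse_selector is a hypothetical helper, not an existing CLI function.

```python
def parse_selector(selector):
    """Split a dbt-style selector into (task_name, include_upstream, include_downstream).

    A leading "+" requests upstream tasks and a trailing "+" requests downstream tasks.
    Hypothetical syntax for illustration only.
    """
    include_upstream = selector.startswith("+")
    include_downstream = selector.endswith("+")
    name = selector.strip("+")
    if not name:
        raise ValueError(f"selector {selector!r} does not name a task")
    return name, include_upstream, include_downstream


# Example: a hypothetical value for --only with per-task markers.
for raw in "+task1,task8+".split(","):
    print(parse_selector(raw))
```

Each parsed selector could then be expanded independently, so that one task pulls in only its upstream tasks while another pulls in only its downstream tasks.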
OS and CLI version
Databricks CLI v0.282.0
Is this a regression?
No
Debug Logs
N/A