Scale Dask worker group from Cluster spec#720
Conversation
jacobtomlinson
left a comment
There was a problem hiding this comment.
This is a really nicely written PR! Thanks for picking this up.
| async with kubernetes.client.api_client.ApiClient() as api_client: | ||
| custom_objects_api = kubernetes.client.CustomObjectsApi(api_client) | ||
| custom_objects_api.api_client.set_default_header( | ||
| "content-type", "application/merge-patch+json" | ||
| ) | ||
| await custom_objects_api.patch_namespaced_custom_object_scale( | ||
| group="kubernetes.dask.org", | ||
| version="v1", | ||
| plural="daskworkergroups", | ||
| namespace=namespace, | ||
| name=worker_group_name, | ||
| body={"spec": {"replicas": new}}, | ||
| ) |
There was a problem hiding this comment.
I'm excited about the kr8s changes in #696 because this will be hugely simplified.
| async with kubernetes.client.api_client.ApiClient() as api_client: | |
| custom_objects_api = kubernetes.client.CustomObjectsApi(api_client) | |
| custom_objects_api.api_client.set_default_header( | |
| "content-type", "application/merge-patch+json" | |
| ) | |
| await custom_objects_api.patch_namespaced_custom_object_scale( | |
| group="kubernetes.dask.org", | |
| version="v1", | |
| plural="daskworkergroups", | |
| namespace=namespace, | |
| name=worker_group_name, | |
| body={"spec": {"replicas": new}}, | |
| ) | |
| worker_group = await DaskWorkerGroup.get(worker_group_name) | |
| await worker_group.scale(new) |
Happy to merge this as-is though and I'll update it in a follow-up once kr8s is integrated nicely.
There was a problem hiding this comment.
Oh, wow! Definitely looking forward to that!
| k8s_cluster.kubectl( | ||
| "scale", | ||
| "--replicas=3", | ||
| "daskcluster.kubernetes.dask.org", | ||
| cluster_name, | ||
| ) | ||
| await client.wait_for_workers(3) |
There was a problem hiding this comment.
This wont actually test anything because client.wait_for_workers(3) will just pass when there are 5 workers.
I opened dask/distributed#6377 a while ago to allow you to modify the behaviour for cases like this. I should nudge that PR along.
Maybe we should have a TODO comment here so that we come back and fix it up later.
|
Looks like CI is failing. Looking into it. |
Closes #636. This PR adds another handler that scales the default worker group when the
worker.replicasfield of the cluster spec is changed.