
Make pods wait for user code containers to be actually ready #491

Draft
simeoncarstens wants to merge 10 commits into main from user-code-startup-probe


@simeoncarstens
Member

This is an attempt (and WIP) to solve #386. The current strategy is to add a readiness or startup probe to the user code containers (currently a startup probe, though a readiness probe is probably the appropriate one) to make sure they are ready, meaning that their gRPC services for log-prob / gradient evaluation respond in a timely manner.
Once the probe succeeds, the container is deemed started / ready, and once all of its containers are ready, the pod can be considered ready.
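A startup or readiness probe of the `exec` kind runs a command inside the container and treats exit code 0 as success. The following is a minimal sketch of such a command, assuming the user code gRPC server listens on a TCP port reachable from inside the container; the script name `probe.py`, its arguments and the timeout are hypothetical. It only checks that a TCP connection can be opened within a timeout, which is weaker than what this PR is after: confirming that log-prob / gradient requests are answered promptly would require issuing an actual gRPC request (or, on recent Kubernetes versions, using the native `grpc` probe, provided the server implements the gRPC health checking protocol).

```python
import socket
import sys

# Hypothetical probe timeout; the real value would have to be tuned to how
# long the user code takes to start serving requests.
TIMEOUT_SECONDS = 2.0


def is_ready(host: str, port: int, timeout: float) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only shows that the server socket is listening; it does not prove
    that a log-prob / gradient request would be answered quickly.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host and timeouts.
        return False


def main(argv: list[str]) -> int:
    """Exec-probe entry point: python probe.py HOST PORT.

    Kubernetes treats exit code 0 as success and anything else as failure.
    """
    if len(argv) != 3:
        print("usage: probe.py HOST PORT", file=sys.stderr)
        return 2
    host, port = argv[1], int(argv[2])
    return 0 if is_ready(host, port, TIMEOUT_SECONDS) else 1


# Only act as a probe when invoked with arguments, so running this file
# without arguments has no side effects.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

In a pod spec this would be wired up as, e.g., `python probe.py 127.0.0.1 50052` in the probe's `exec.command`; host and port are placeholders.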

One pitfall is that the controller pod may itself run a user code container that is actually used in the calculation. So the controller container should only start sending out sampling requests once all user code containers in the other pods are ready and, in addition, the user code container in the controller pod is ready.
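The gating condition described above can be sketched as a small polling helper on the controller side. This is a minimal sketch, not the controller's actual code: the function name, the idea of passing one readiness check per user code container, and the timeout/interval values are all assumptions for illustration.

```python
import time
from typing import Callable, Sequence

# A readiness check is any zero-argument callable returning True when the
# corresponding user code container can serve requests (hypothetical interface).
ReadinessCheck = Callable[[], bool]


def wait_until_all_ready(
    checks: Sequence[ReadinessCheck],
    timeout: float,
    interval: float = 1.0,
) -> bool:
    """Poll all checks until every one reports ready, or give up after timeout seconds.

    `checks` is assumed to contain one callable per user code container,
    including the one running inside the controller pod itself.
    """
    deadline = time.monotonic() + timeout
    while True:
        if all(check() for check in checks):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The controller would call this before sending its first sampling request, passing one check per remote user code pod plus one for its own local user code container; if it returns `False`, failing loudly is probably preferable to starting sampling against containers that are not ready.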

I'm not sure whether a startup probe is enough, or whether we need an init container on the controller pod that delays the start of all controller pod containers until all other pods are ready.
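If the init container route is taken, the controller pod could look roughly like the sketch below, written as a Python dict mirroring the manifest. Only the Kubernetes field names are real; pod and container names, images, commands and probe timings are hypothetical, and the wait command is assumed to have some way (e.g. API access with suitable RBAC permissions) to check the readiness of the other pods. One consequence worth noting: init containers must complete before any regular container in the pod starts, so an init container can only wait for the *other* pods; the controller pod's own user code container still needs its own probe. A readiness probe also differs from a startup probe in that it keeps running after the first success and can mark the container un-ready again later.

```python
from typing import Any, Dict, List


def controller_pod_spec(
    controller_image: str,
    user_code_image: str,
    wait_command: List[str],
    probe_command: List[str],
) -> Dict[str, Any]:
    """Build a hypothetical controller pod manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "controller"},  # placeholder name
        "spec": {
            # Init containers run to completion, one after another, before any
            # regular container starts; a failing init container is restarted
            # unless the pod's restartPolicy is Never.
            "initContainers": [
                {
                    "name": "wait-for-user-code-pods",
                    "image": controller_image,
                    "command": wait_command,
                }
            ],
            "containers": [
                {"name": "controller", "image": controller_image},
                {
                    "name": "user-code",
                    "image": user_code_image,
                    # Keeps running for the container's lifetime, unlike a
                    # startupProbe, which stops after its first success.
                    "readinessProbe": {
                        "exec": {"command": probe_command},
                        "periodSeconds": 5,  # placeholder timings
                        "timeoutSeconds": 2,
                        "failureThreshold": 3,
                    },
                },
            ],
        },
    }
```

Serializing the result with `json.dumps` (or a YAML library) gives a manifest that can be reviewed or applied; the controller container itself would still have to wait for the local user code container before sampling.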
