KEP-5007: DRA Device Binding Conditions beta in 1.36 #5846
k8s-ci-robot merged 1 commit into kubernetes:master
Hi @ttsuuubasa. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test.
kannon92 left a comment
Please make sure to request a PRR review for the beta promotion.
/ok-to-test
Force-pushed from cbc56b6 to 55e918d.
@kannon92
- Pods that are not yet bound (in the API server) and not marked unschedulable (in the API server) are not visible to the cluster autoscaler, so there is a risk that the node will be turned down
- Additional tests are in Testgrid and linked in the KEP
- The scheduler supports timeout configuration via a command-line argument
- In this use case, the attachment scenario for moving devices between different pools is achieved through rescheduling triggered by BindingFailureConditions. However, an open issue remains: device migration still needs to be implemented with BindingConditions as a happy-path flow. This will be addressed in a separate KEP and is out of scope for the beta graduation criteria.
+1 for decoupling Binding Conditions from the attachment, but I'm a bit skeptical that the problem can be fixed easily (see the discussion in kubernetes/kubernetes#135473 (comment)). So the question is whether the happy path is really good enough and proven to work.
@wojtek-t @sanposhiho @macsko WDYT?
Personally, I consider having at least one user of the happy path at the prototype stage (i.e., a PR that is fully implemented and reviewed, but perhaps not merged because of release timing) sufficient for beta. But we should have one.
By "happy path" I meant the one we have right now, i.e., without updating the allocation.
The happy path is planned to be implemented in NVIDIA's ComputeDomain case.
My team members are currently working on the implementation. We plan to submit a pull request to NVIDIA's DRA GitHub within the next two or three days.
I am on vacation through the end of this week. If you have a PR ready this week I will make a point to review it on Monday, Tuesday, and Wednesday before the feature freeze. Please be ready to respond to review comments daily so we can get it in good shape by your EOD on Wednesday.
Thank you very much!
All members of my team (the implementation team) will make sure to address your review comments promptly.
Ack. Will start looking on Monday.
@johnbelamaric
dom4ha left a comment
Looks good to me; waiting for @johnbelamaric before I approve.
  - "@macsko"
  - "@sanposhiho"
approvers:
  - "@alculquicondor"
@dom4ha
Thank you for the review.
I’ve added you as an approver and pushed the latest changes.
johnbelamaric left a comment
/lgtm
Also approved for PRR. I will add the Prow command after SIG approval.
- Updated the Production Readiness Review questionnaire and introduced metrics for troubleshooting and operations.
- Addressed review comments from the v1.35 PR kubernetes#5487.
- Added Graduation Criteria for beta.
- Clarified that happy-path device migration is out of scope for the beta criteria.

Signed-off-by: Tsubasa Watanabe <w.tsubasa@fujitsu.com>
Force-pushed from 55e918d to a89021d.
@johnbelamaric
/approve
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: dom4ha, johnbelamaric, ttsuuubasa.
@johnbelamaric @dom4ha
Other comments:
This PR promotes the DRA Device Binding Conditions feature from alpha to beta for Kubernetes v1.36, with enhancements based on feedback from DRA driver developers and new metrics.
1. Stage Promotion to Beta
   - `stage: alpha` → `stage: beta` in kep.yaml
   - `latest-milestone: "v1.35"` → `"v1.36"`
2. Enhancements Based on DRA Driver Developer Feedback
3. Improved Monitoring & Observability
   - `scheduler_dra_bindingconditions_allocations_total`: tracks scheduling attempts with success/failure/timeout status
   - `scheduler_dra_bindingconditions_prebind_duration_seconds`: measures PreBind phase duration with detailed labels
4. Clarified Feature Scope
NOTE:
I addressed comments and suggestions from @johnbelamaric during the v1.35 review cycle:
KEP-5007: DRA Device Binding Conditions alpha in 1.35 #5487
/wg device-management
/sig scheduling
/cc @pohly @johnbelamaric @dom4ha