fix(trainer): fallback to cluster-scoped runtime on 404 in get_runtime #341

Open
prabindersinghh wants to merge 1 commit into kubeflow:main from prabindersinghh:fix-cluster-runtime-fallback

@prabindersinghh

What this PR does

This PR fixes the get_runtime() fallback behavior in the Kubernetes backend.

Previously, when the namespaced TrainingRuntime lookup returned a 404,
get_runtime() could raise an error instead of falling back to the
cluster-scoped ClusterTrainingRuntime.

This change:

  • Allows fallback to cluster-scoped runtime only when the namespaced call returns 404
  • Preserves existing error handling for other API exceptions
  • Adds a unit test to validate the fallback behavior
  • Ensures all existing backend tests pass
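
The fallback rule in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `ApiError`, `get_runtime_with_fallback`, `fetch_namespaced`, and `fetch_cluster` are hypothetical stand-ins for the Kubernetes client exception and the two runtime lookups.

```python
from typing import Callable


class ApiError(Exception):
    """Hypothetical stand-in for a Kubernetes API exception carrying an HTTP status."""

    def __init__(self, status: int, reason: str = "") -> None:
        super().__init__(f"{status} {reason}".strip())
        self.status = status


def get_runtime_with_fallback(
    name: str,
    fetch_namespaced: Callable[[str], dict],
    fetch_cluster: Callable[[str], dict],
) -> dict:
    """Try the namespaced lookup first; fall back to cluster scope only on 404."""
    try:
        return fetch_namespaced(name)
    except ApiError as e:
        if e.status != 404:
            # Any other API failure (403, 500, ...) is surfaced unchanged.
            raise
    # Reached only when the namespaced lookup reported 404 Not Found.
    return fetch_cluster(name)
```

Keeping the cluster-scoped call outside the `except` block means an error raised by the fallback itself is not chained to the original 404, which may make tracebacks easier to read.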

Fixes

Fixes #335

Checklist

  • Unit tests added
  • Existing tests pass
  • No breaking changes

Copilot AI review requested due to automatic review settings February 28, 2026 11:22
@github-actions
Contributor

🎉 Welcome to the Kubeflow SDK! 🎉

Thanks for opening your first PR! We're happy to have you as part of our community 🚀

Here's what happens next:

  • If you haven't already, please check out our Contributing Guide for repo-specific guidelines and the Kubeflow Contributor Guide for general community standards
  • Our team will review your PR soon! cc @kubeflow/kubeflow-sdk-team

Feel free to ask questions in the comments if you need any help or clarification!
Thanks again for contributing to Kubeflow! 🙏

@google-oss-prow
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kramaranya for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…t test

Signed-off-by: Prabinder Singh <prabindersinghh@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Fixes Kubernetes backend get_runtime() so that a 404 from the namespaced TrainingRuntime lookup properly falls back to the cluster-scoped ClusterTrainingRuntime, aligning behavior with issue #335.

Changes:

  • Update get_runtime() to fall back to cluster-scoped runtime only on namespaced 404s.
  • Add a unit test to validate the 404 fallback behavior.
  • Apply various formatting-only adjustments (line wrapping) in the Kubernetes backend module.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| kubeflow/trainer/backends/kubernetes/backend.py | Adjusts get_runtime() exception handling to allow cluster fallback on namespaced 404. |
| kubeflow/trainer/backends/kubernetes/backend_test.py | Adds a unit test intended to assert the 404 fallback behavior. |
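
For readers who want a feel for what a test of the 404 fallback could check, here is a hedged sketch using `unittest.mock`. It does not reproduce the test added in backend_test.py: `ApiError`, the `get_runtime` function below, the `api` object with its `get_namespaced` and `get_cluster` methods, and the runtime name are all hypothetical and exist only for this example.

```python
from unittest.mock import Mock


class ApiError(Exception):
    """Hypothetical stand-in for a Kubernetes API exception with an HTTP status."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def get_runtime(api, name: str, namespace: str):
    """Hypothetical function under test: namespaced lookup, cluster fallback on 404."""
    try:
        return api.get_namespaced(namespace, name)
    except ApiError as e:
        if e.status != 404:
            raise
    return api.get_cluster(name)


def test_falls_back_to_cluster_runtime_on_404() -> None:
    api = Mock()
    api.get_namespaced.side_effect = ApiError(404)
    api.get_cluster.return_value = {"kind": "ClusterTrainingRuntime"}

    result = get_runtime(api, "example-runtime", "default")

    assert result == {"kind": "ClusterTrainingRuntime"}
    api.get_cluster.assert_called_once_with("example-runtime")


def test_other_errors_do_not_trigger_fallback() -> None:
    api = Mock()
    api.get_namespaced.side_effect = ApiError(500)

    try:
        get_runtime(api, "example-runtime", "default")
    except ApiError as e:
        assert e.status == 500
    else:
        raise AssertionError("expected ApiError to propagate")
    # The cluster-scoped lookup must not be attempted for non-404 errors.
    api.get_cluster.assert_not_called()
```

Asserting both the positive path (404 triggers the fallback) and the negative path (other statuses do not) covers the two behaviors the PR description promises.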

Successfully merging this pull request may close these issues.

ClusterRuntime un-getable

2 participants