chore(spark): migrate SDK to kubeflow_spark_api Pydantic models #295
Conversation
🎉 Welcome to the Kubeflow SDK! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Here's what happens next:
- Join the community
- Feel free to ask questions in the comments if you need any help or clarification!
Pull Request Test Coverage Report for Build 22043884336 (Details)
💛 - Coveralls
Pull request overview
This PR refactors the Spark SDK to use typed Pydantic models from kubeflow_spark_api instead of raw dictionaries for CRD construction. This aligns the Spark SDK with the established architecture pattern used by the Trainer SDK.
Changes:
- Added kubeflow-spark-api>=2.3.0 dependency and migrated from dict-based CRD construction to typed Pydantic models
- Updated all option implementations to work with Pydantic models instead of dictionaries
- Refactored backend methods to convert to/from Pydantic models at the Kubernetes API boundary
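As a rough sketch of that boundary pattern (helper and constant names such as SPARK_CONNECT_PLURAL are assumptions for illustration, not copied from backend.py), the typed model is serialized only at the API call, and responses are parsed straight back into a model:

    from kubeflow_spark_api import models
    from kubernetes import client

    custom_api = client.CustomObjectsApi()

    # Outbound: build a typed model, serialize only at the API boundary.
    spark_connect = utils.build_spark_connect_crd(...)  # returns SparkV1alpha1SparkConnect
    custom_api.create_namespaced_custom_object(
        group=constants.SPARK_CONNECT_GROUP,
        version=constants.SPARK_CONNECT_VERSION,
        namespace=namespace,
        plural=constants.SPARK_CONNECT_PLURAL,  # assumed constant name
        body=spark_connect.to_dict(),
    )

    # Inbound: parse the raw response dict back into a typed model.
    response = custom_api.get_namespaced_custom_object(
        constants.SPARK_CONNECT_GROUP,
        constants.SPARK_CONNECT_VERSION,
        namespace,
        constants.SPARK_CONNECT_PLURAL,  # assumed constant name
        name,
    )
    spark_connect_cr = models.SparkV1alpha1SparkConnect.from_dict(response)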
Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 1 comment.

Summary per file:
| File | Description |
|---|---|
| pyproject.toml | Added kubeflow-spark-api>=2.3.0 dependency |
| uv.lock | Lock file updates for new kubeflow-spark-api dependency |
| kubeflow/spark/backends/kubernetes/backend.py | Convert between dict and Pydantic models at API boundary using .to_dict() and .from_dict() |
| kubeflow/spark/backends/kubernetes/utils.py | Refactored build_spark_connect_crd to return Pydantic model; renamed parse_spark_connect_status to get_spark_connect_info_from_cr with Pydantic input |
| kubeflow/spark/types/options.py | Updated all option callables to accept SparkConnect Pydantic model instead of dict |
| kubeflow/spark/backends/kubernetes/backend_test.py | Enhanced mock responses to include all required fields for Pydantic model validation |
| kubeflow/spark/backends/kubernetes/utils_test.py | Updated tests to work with Pydantic models and added validation test for invalid CR |
| kubeflow/spark/types/options_test.py | Migrated tests to use spark_connect_model fixture and verify Pydantic model attributes |
| hack/Dockerfile.spark-e2e-runner | Added --pre flag to allow installation of pre-release versions |
    role_spec.template = models.IoK8sApiCoreV1PodTemplateSpec()

    # Convert existing template to dict, merge, and convert back
    existing_dict = role_spec.template.to_dict() if role_spec.template else {}
Redundant None check in ternary expression. Since role_spec.template is guaranteed to be non-None after line 193, the ternary expression role_spec.template.to_dict() if role_spec.template else {} can be simplified to role_spec.template.to_dict().

Suggested change:

    - existing_dict = role_spec.template.to_dict() if role_spec.template else {}
    + existing_dict = role_spec.template.to_dict()
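For context, the surrounding merge step might look roughly like this (a sketch: the template_overrides name is hypothetical, and from_dict on IoK8sApiCoreV1PodTemplateSpec is assumed to exist like on the other generated models; only the two lines quoted above are from the PR):

    # Ensure a template object exists before merging overrides into it.
    if role_spec.template is None:
        role_spec.template = models.IoK8sApiCoreV1PodTemplateSpec()

    # Convert existing template to dict, shallow-merge, and convert back.
    existing_dict = role_spec.template.to_dict()
    merged = {**existing_dict, **template_overrides}  # template_overrides: hypothetical dict
    role_spec.template = models.IoK8sApiCoreV1PodTemplateSpec.from_dict(merged)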
Force-pushed from 2df7db9 to 255b2ad.

/assign @Shekharrajak
andreyvelich left a comment:
Thanks @tariq-hasan!
Overall, looks great.
cc @kubeflow/kubeflow-sdk-team @Fiona-Waters @abhijeet-dhumal @jaiakash
    WORKDIR /app

    COPY pyproject.toml README.md LICENSE ./
    COPY kubeflow/ kubeflow/
Can we remove these filters and run the Spark E2E tests for every PR?
https://github.com/tariq-hasan/sdk/blob/255b2ad2e3f953b3aa78deebd4b20a137eb0667c/.github/workflows/test-spark-examples.yaml#L4-L12
It should be fine to run them on every PR, like we do for other tests. For example, we didn't trigger the Spark tests when the PySpark dependency was updated: #300
Sounds good. I have removed the paths.
    base_response = {
        "apiVersion": f"{constants.SPARK_CONNECT_GROUP}/{constants.SPARK_CONNECT_VERSION}",
        "kind": constants.SPARK_CONNECT_KIND,
        "spec": {
            "sparkVersion": constants.DEFAULT_SPARK_VERSION,
            "image": constants.DEFAULT_SPARK_IMAGE,
            "server": {
                "cores": constants.DEFAULT_DRIVER_CPU,
                "memory": constants.DEFAULT_DRIVER_MEMORY,
            },
            "executor": {
                "instances": 2,
                "cores": constants.DEFAULT_EXECUTOR_CPU,
                "memory": constants.DEFAULT_EXECUTOR_MEMORY,
            },
        },
    }
Can you refactor it to use Spark Models, like we do in Trainer: https://github.com/tariq-hasan/sdk/blob/255b2ad2e3f953b3aa78deebd4b20a137eb0667c/kubeflow/trainer/backends/kubernetes/backend_test.py#L312 ?
I have introduced the get_spark_connect function to return a typed model in place of the dict-based approach.
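A minimal sketch of what such a helper could look like, built from the fixture values in the diff above and the model's .from_dict() constructor (the exact signature of get_spark_connect in the PR isn't shown here, so treat this as illustrative):

    from kubeflow_spark_api import models

    def get_spark_connect() -> models.SparkV1alpha1SparkConnect:
        # Typed mock API response; `constants` as imported in the test module.
        return models.SparkV1alpha1SparkConnect.from_dict(
            {
                "apiVersion": f"{constants.SPARK_CONNECT_GROUP}/{constants.SPARK_CONNECT_VERSION}",
                "kind": constants.SPARK_CONNECT_KIND,
                "spec": {
                    "sparkVersion": constants.DEFAULT_SPARK_VERSION,
                    "image": constants.DEFAULT_SPARK_IMAGE,
                    "server": {
                        "cores": constants.DEFAULT_DRIVER_CPU,
                        "memory": constants.DEFAULT_DRIVER_MEMORY,
                    },
                    "executor": {
                        "instances": 2,
                        "cores": constants.DEFAULT_EXECUTOR_CPU,
                        "memory": constants.DEFAULT_EXECUTOR_MEMORY,
                    },
                },
            }
        )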
| """ | ||
| base_spec = { | ||
| "sparkVersion": constants.DEFAULT_SPARK_VERSION, | ||
| "image": constants.DEFAULT_SPARK_IMAGE, | ||
| "server": { | ||
| "cores": constants.DEFAULT_DRIVER_CPU, | ||
| "memory": constants.DEFAULT_DRIVER_MEMORY, | ||
| }, | ||
| "executor": { | ||
| "instances": 2, | ||
| "cores": constants.DEFAULT_EXECUTOR_CPU, | ||
| "memory": constants.DEFAULT_EXECUTOR_MEMORY, | ||
| }, |
The get_spark_connect function is used here now as well.
        API ExecutorSpec model.
    """
    # Determine number of instances
    if executor and executor.num_instances is not None:
Suggested change:

    - if executor and executor.num_instances is not None:
    + if executor and executor.num_instances:
I have updated the code to remove is not None.
    # Determine number of instances
    if executor and executor.num_instances is not None:
        instances = executor.num_instances
    elif num_executors is not None:
Suggested change:

    - elif num_executors is not None:
    + elif num_executors:
I have updated the code here as well to remove is not None.
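With both suggestions applied, the instance-resolution chain would read roughly like this (a sketch; the fallback of 2 is assumed from the test fixtures above, not taken from the PR itself):

    # Resolve the executor instance count, treating 0/None as unset.
    if executor and executor.num_instances:
        instances = executor.num_instances
    elif num_executors:
        instances = num_executors
    else:
        instances = 2  # assumed default, matching the fixtures above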
    @@ -99,8 +184,8 @@ def build_spark_connect_crd(
        executor: Optional[Executor] = None,
I still think we should remove driver and executor spec from the first released version, and extend it later.
@Shekharrajak Do you have any particular use-case when users want to set it?
Suggested change (remove this parameter):

    - executor: Optional[Executor] = None,
    - name=metadata.get("name", ""),
    - namespace=metadata.get("namespace", ""),
    + name=spark_connect_cr.metadata.name,
    + namespace=spark_connect_cr.metadata.namespace or "",
namespace cannot be None.

Suggested change:

    - namespace=spark_connect_cr.metadata.namespace or "",
    + namespace=spark_connect_cr.metadata.namespace,
I have made the change.
    crd = spark_connect.to_dict()
    assert crd["spec"]["executor"]["cores"] == 2
    assert crd["spec"]["executor"]["memory"] == "4g"
Can you refactor these tests to just access the object fields? You don't need to run to_dict(), e.g.:

    assert crd.spec.executor.cores == 2

Sounds good. I have made the change to use the spark_connect typed model directly and removed the to_dict() conversion.
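After that change, the assertions read directly off the typed model (a sketch; the builder's arguments are elided):

    spark_connect = build_spark_connect_cr(...)  # returns a typed SparkV1alpha1SparkConnect

    # No to_dict() round-trip: assert on model attributes directly.
    assert spark_connect.spec.executor.cores == 2
    assert spark_connect.spec.executor.memory == "4g"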
pyproject.toml (Outdated)
| "pydantic>=2.10.0", | ||
| "kubeflow-trainer-api>=2.0.0", | ||
| "kubeflow-katib-api>=0.19.0", | ||
| "kubeflow-spark-api>=2.3.0", |
Can you add this to the spark extras, alongside pyspark.
I have added it there as well.
@tariq-hasan I think we can remove it from the main deps for now.

Suggested change (remove this line):

    - "kubeflow-spark-api>=2.3.0",
Force-pushed from 255b2ad to 740fbf4.

I have rebased the PR as well to account for the changes coming from #288.
    )


    def build_spark_connect_crd(
Suggested change:

    - def build_spark_connect_crd(
    + def build_spark_connect_cr(
I have replaced all the CRD references with CR references.
    - ) -> dict[str, Any]:
    -     """Build SparkConnect CRD manifest (KEP-107 compliant).
    + ) -> models.SparkV1alpha1SparkConnect:
    +     """Build SparkConnect CRD using typed API models (KEP-107 compliant).
Suggested change:

    - """Build SparkConnect CRD using typed API models (KEP-107 compliant).
    + """Build SparkConnect CR using typed API models (KEP-107 compliant).
    Returns:
    -     SparkConnect CRD as dictionary.
    +     SparkConnect CRD as typed Pydantic model.
Suggested change:

    - SparkConnect CRD as typed Pydantic model.
    + SparkConnect CR as typed Pydantic model.
andreyvelich left a comment:

@tariq-hasan Do you want me to release the 2.4.0 version to PyPI?
@andreyvelich Sounds good with me.

@tariq-hasan I've published the 2.4.0 version to PyPI; you can update pyproject.toml and uv.lock.
| "coverage>=7.0", | ||
| "kubeflow_trainer_api@git+https://github.com/kubeflow/trainer.git@master#subdirectory=api/python_api", | ||
| "kubeflow_katib_api@git+https://github.com/kubeflow/katib.git@master#subdirectory=api/python_api", | ||
| "kubeflow_spark_api@git+https://github.com/kubeflow/spark-operator.git@master#subdirectory=api/python_api", |
@andreyvelich I have noticed that the version defined in the kubeflow-spark-api package is 2.3.0: https://github.com/kubeflow/spark-operator/blob/master/api/python_api/kubeflow_spark_api/__init__.py#L16.
Should I raise a PR to update the version in https://github.com/kubeflow/spark-operator/blob/master/api/python_api/kubeflow_spark_api/__init__.py?
Yes, we should update this to v2.4.0, and run make generate. I did it in my local branch.
https://github.com/kubeflow/spark-operator/blob/master/VERSION#L1
I have raised the PR: kubeflow/spark-operator#2853.
After it is merged I can update pyproject.toml and uv.lock in this PR to ensure sync with the 2.4.0 package.
astefanutti left a comment:
Thanks @tariq-hasan!
/lgtm
/hold for this: kubeflow/spark-operator#2853
@tariq-hasan Since this is merged, please rebase this PR so we can move forward: kubeflow/spark-operator#2853
Force-pushed from 876488d to 21991a3, then from 7919126 to 2c6201a, and finally from cea64e5 to e562676.
@andreyvelich I have rebased the PR and updated the package version for kubeflow-spark-api.
andreyvelich left a comment:
Thanks for the updates @tariq-hasan!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Approvers can indicate their approval by writing /approve in a comment.
/hold cancel

Will merge this manually.
What this PR does / why we need it:
This PR migrates the Spark SDK from constructing CRDs using raw dictionaries to using the typed Pydantic models provided by kubeflow_spark_api. There are no user-facing API changes in this PR.

What changed:
- CR construction now builds typed models from kubeflow_spark_api
- Status parsing uses .from_dict() instead of manual extraction
- The user-facing types (Driver, Executor, SparkConnectInfo) are unchanged

Why:
This aligns the Spark SDK with the established architecture pattern already used by the Trainer SDK.
Testing:
Tested against kubeflow_spark_api==2.4.0rc0.

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):

Fixes #271
Checklist: