Refactor build wheels pipeline to build wheels and run pytest #301

Open
alekstheod wants to merge 13 commits into master from use_rbe_platform_from_xla
@alekstheod alekstheod commented Feb 10, 2026

This PR adjusts the build_wheel pipeline to build the wheels and run an integration test: the built wheels are installed into a venv and exercised with pytest.
Note: I tried to use the RBE platform from inside the XLA project. Unfortunately, that cannot work: JAX uses different tags, and XLA does not handle them. XLA uses tags to assign an RBE pool to each test action!
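
For readers following along, the integration-test step described above can be sketched roughly as follows. This is a minimal illustration, not the script used in this PR: the function names, the `dist`/`tests` layout, and the choice to install pytest from the package index are all assumptions.

```python
import subprocess
import sys
from pathlib import Path


def integration_test_commands(wheel_dir: Path, venv_dir: Path, tests_dir: Path) -> list[list[str]]:
    """Return the commands that create a venv, install the built wheels, and run pytest.

    All three paths are hypothetical; the real pipeline may lay things out differently.
    """
    bin_dir = venv_dir / ("Scripts" if sys.platform == "win32" else "bin")
    venv_python = str(bin_dir / "python")

    wheels = sorted(str(p) for p in wheel_dir.glob("*.whl"))
    if not wheels:
        raise FileNotFoundError(f"no wheels found in {wheel_dir}")

    return [
        # 1. Create an isolated virtual environment.
        [sys.executable, "-m", "venv", str(venv_dir)],
        # 2. Install pytest and the freshly built wheels into it.
        [venv_python, "-m", "pip", "install", "pytest", *wheels],
        # 3. Run the tests against the installed wheels, not the source tree.
        [venv_python, "-m", "pytest", str(tests_dir)],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(integration_test_commands(Path("dist"), Path(".venv"), Path("tests")))` would execute the three steps in order. Building the command list separately from running it keeps the sequence easy to inspect.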

@alekstheod alekstheod force-pushed the use_rbe_platform_from_xla branch from 536aa4a to 3adaf27 Compare February 10, 2026 16:17
build:rocm_rbe --host_platform="//platform/linux:tf_linux_gpu"
build:rocm_rbe --extra_execution_platforms="//platform/linux:tf_linux_gpu"
build:rocm_rbe --platforms="//platform/linux:tf_linux_gpu"
build:rocm_rbe --host_platform="@local_config_rocm//rocm:linux_x64"
Collaborator

We should remove the jax_rocm_plugin/platform/linux/BUILD file if we're going to use the platform in rocm/xla. Also, I'm pretty sure this will break PR CI as it works now, unless your change to make the XLA platform's Docker image configurable has landed upstream.

Contributor Author

I would like to merge PR #297 first, so that I can test every subsequent one. I will adjust this PR once it is rebased.

Contributor Author

Unfortunately, removing the old platform is not possible, so I will convert this PR to adjust the build_wheels pipeline instead!

@alekstheod alekstheod force-pushed the use_rbe_platform_from_xla branch 3 times, most recently from f82499e to 554f0ba Compare February 12, 2026 17:40
@alekstheod alekstheod force-pushed the use_rbe_platform_from_xla branch 23 times, most recently from 94ea954 to 0934ef9 Compare February 16, 2026 15:47
@alekstheod alekstheod force-pushed the use_rbe_platform_from_xla branch 4 times, most recently from 5698800 to d0f00db Compare February 16, 2026 16:27
@alekstheod alekstheod changed the title from "Use rbe platform from xla" to "Refactor build wheels pipeline to build wheels and run pytest" Feb 16, 2026
@alekstheod alekstheod force-pushed the use_rbe_platform_from_xla branch 3 times, most recently from 9f1a3d0 to 8425212 Compare February 16, 2026 17:26
cancel-in-progress: true

permissions:
contents: read
Collaborator

Thanks for putting these in there

Contributor Author

This only became possible because of two changes: the single workspace and the on-the-fly wheel build.

rbe_ci_cert: ${{ secrets.RBE_CI_CERT }}
rbe_ci_key: ${{ secrets.RBE_CI_KEY }}
builder-image: "search"
call-build-docker:
Collaborator

Can you move this to the end of the workflow? We don't want to lose the ability for PR CI to check if the docker image build is busted. It'll use the wheels that you build and upload to artifacts.
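
The ordering requested above (Docker image build last, consuming the wheels built earlier) can be illustrated with a small dependency sketch. Only `call-build-docker` is a job name taken from this PR; `build-wheels` and `run-pytest` are hypothetical names, and the mapping mirrors what a workflow's `needs:` lists would express rather than the actual workflow file.

```python
from graphlib import TopologicalSorter

# Job name -> jobs it waits for (the equivalent of a `needs:` list).
# "build-wheels" and "run-pytest" are hypothetical job names.
JOB_NEEDS = {
    "build-wheels": set(),
    "run-pytest": {"build-wheels"},
    "call-build-docker": {"build-wheels"},
}


def execution_order(job_needs: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which the jobs could run."""
    return list(TopologicalSorter(job_needs).static_order())


order = execution_order(JOB_NEEDS)
print(order)
```

Because `call-build-docker` depends on `build-wheels`, any valid order places the wheel build first, so the Docker image build can pick up the uploaded wheel artifacts while PR CI still checks that the image builds.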

Contributor Author

Agreed on adding it back

Contributor Author

I moved it, but implemented it with a bash script. Why? The chain is shorter, there is less code to manage, and it is simpler.

rocm-version: ["7"]
python-version: ["3.11", "3.12", "3.13", "3.14"]
container:
image: rocm/tensorflow-build@sha256:7fcfbd36b7ac8f6b0805b37c4248e929e31cf5ee3af766c8409dd70d5ab65faa
Collaborator

I've finally got the manylinux build to happen in its own stage. Could we use that here instead? Should be called ghcr.io/rocm/jax-manylinux_2_28-rocm-7.2.0:latest

Contributor Author

Let's create a separate PR for that. I think the last time I tried to use one of our images I got a permission issue!

@alekstheod alekstheod force-pushed the use_rbe_platform_from_xla branch 4 times, most recently from c79e09a to 271f006 Compare February 17, 2026 20:07
@alekstheod alekstheod force-pushed the use_rbe_platform_from_xla branch from 271f006 to 1cfeb76 Compare February 18, 2026 06:59
@alekstheod alekstheod force-pushed the use_rbe_platform_from_xla branch from 6d93aef to a836f55 Compare February 18, 2026 07:59
@alekstheod alekstheod force-pushed the use_rbe_platform_from_xla branch from f65c551 to f76fc54 Compare February 18, 2026 09:35

3 participants