Skip to content

Comments

[WIP] Remove patchelf usage#299

Open
alekstheod wants to merge 2 commits intomasterfrom
remove_patchelf_usage
Open

[WIP] Remove patchelf usage#299
alekstheod wants to merge 2 commits intomasterfrom
remove_patchelf_usage

Conversation

@alekstheod
Copy link
Contributor

@alekstheod alekstheod commented Feb 10, 2026

Remove usage patchelf and use linkopts instead to set the rpaths.
This PR is a collection of changes that are required to be done in order to avoid using
external tools to control the rpaths on a specific target while building xla/jax.

There is upstreamed PR for xla that adds the basic support: openxla/xla#37513

If we want to use this approach we can
1 - land the changes to our xla release branch
2 - depending on how we want to go further
* set rpaths directly in jax targets (upstream jax/release branches)
* or move jax targets into rocm-jax and set the linkopts there!

@alekstheod alekstheod force-pushed the remove_patchelf_usage branch from 3b59ce2 to 383650f Compare February 10, 2026 12:13
@alekstheod alekstheod force-pushed the remove_patchelf_usage branch from 383650f to e6ee19c Compare February 10, 2026 12:18
@Ruturaj4
Copy link
Contributor

why do we need the xla patch? can't we create a pr in xla branch to fix?

Copy link
Collaborator

@charleshofer charleshofer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure what the motivation is behind this PR. The description is kind of sparse. Can you explain the motivation for the PR and briefly explain how it works in the description?

This change does remove the rpath hack that we have in the Python scripts that do the wheel builds, but from what I can tell, the cost we're paying is copying a bunch of upstream Bazel and C++ code, adding another Bazel file specifically for handling rpaths, and adding yet another config option. Copying in rocm_plugin_extension.cc is especially tricky because upstream does modify that one a fair bit. Maybe there's reasons for bring BUILD and rocm_plugin_extension.cc into this repo and removing them from upstream? But right now, I'm inclined to think the cure is worth than the disease in this case.

Also, echoing Ruturaj, that XLA patch should go in the rocm/xla repo. Adding our own changes is the reason for having a branch there.

@@ -0,0 +1,125 @@
diff --git a/third_party/gpus/crosstool/hipcc_cc_toolchain_config.bzl.tpl b/third_party/gpus/crosstool/hipcc_cc_toolchain_config.bzl.tpl
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically https://github.com/openxla/xla/pull/37513/changes, right? Can you land that in our rocm/xla branch instead of creating patch files here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do but we need to know if we want to follow this path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The weird thing is we have some targets defined inside the rocm-jax but some other inside jax... We need to decide where we keep our code. If we like to follow the plugin approach we need to move the targets and code inside rocm-jax. If not then I would need to implement changes directly in jax.


def _wheel_linkopts():
return select({
_ROCM_LINK_ONLY: _WHEEL_RPATHS,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do these need to be conditional? Don't we always need the -rpath options?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. For tests we don't need rpath, tests shall be looking inside the sandbox


package(default_visibility = ["//visibility:public"])

rocm_nanobind_extension(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless we intend to completely remove this file from upstream, this file is going to be a pain to maintain, as we'll be occasionally porting upstream changes into the plugin repo. Do you intend to remove the upstream one?

It looks like all this is doing is using rocm_nanobind_extension instead of nanobind_extension so that we can pick up the rpath configuration. Is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes there is no difference except that rocm_nanobind_extension handles the rpaths. The question is if we want to implement it upstream (jax repo) or not....


if "rocm" in args.wheels:
wheel_build_command_base.append("--config=rocm_base")
wheel_build_command_base.append("--config=rocm_wheel")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This config just turns the -rpath settings on and off. Could we call this config something like no_rocm_rpath?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think release_wheel would be more appropriate here. no_rocm_rpath seems to be very specific. It is in fact replace rpaths with wheel based rpaths.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in fact replace rpaths with wheel based rpaths.

Which wheels are you referring to here? The JAX plugin wheels? Or the ROCm whels?

@@ -0,0 +1,126 @@
/* Copyright 2024 The JAX Authors.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the BUILD file, it's going to be a pain to maintain this file unless we simply remove it from upstream. Is that the intent?

Also, can you explain what's different between this one and the upstream one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no difference. Here is a big question. Do we want to move code from jax to rocm-jax or not. If not all the changes have to be done inside the jax repo. These changes will be minimal then!

@alekstheod
Copy link
Contributor Author

why do we need the xla patch? can't we create a pr in xla branch to fix?

Yes I did but for time being it is implemented as a patch to show the working version.

@alekstheod alekstheod changed the title WIP Remove patchelf usage [WIP] Remove patchelf usage Feb 12, 2026
@alekstheod
Copy link
Contributor Author

why do we need the xla patch? can't we create a pr in xla branch to fix?

Will do. I first upstreamed it.

@alekstheod
Copy link
Contributor Author

why do we need the xla patch? can't we create a pr in xla branch to fix?

ROCm/xla#626

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants