Conversation
3b59ce2 to
383650f
Compare
383650f to
e6ee19c
Compare
|
why do we need the xla patch? can't we create a pr in xla branch to fix? |
There was a problem hiding this comment.
I'm not really sure what the motivation is behind this PR. The description is kind of sparse. Can you explain the motivation for the PR and briefly explain how it works in the description?
This change does remove the rpath hack that we have in the Python scripts that do the wheel builds, but from what I can tell, the cost we're paying is copying a bunch of upstream Bazel and C++ code, adding another Bazel file specifically for handling rpaths, and adding yet another config option. Copying in rocm_plugin_extension.cc is especially tricky because upstream does modify that one a fair bit. Maybe there's reasons for bring BUILD and rocm_plugin_extension.cc into this repo and removing them from upstream? But right now, I'm inclined to think the cure is worth than the disease in this case.
Also, echoing Ruturaj, that XLA patch should go in the rocm/xla repo. Adding our own changes is the reason for having a branch there.
| @@ -0,0 +1,125 @@ | |||
| diff --git a/third_party/gpus/crosstool/hipcc_cc_toolchain_config.bzl.tpl b/third_party/gpus/crosstool/hipcc_cc_toolchain_config.bzl.tpl | |||
There was a problem hiding this comment.
This is basically https://github.com/openxla/xla/pull/37513/changes, right? Can you land that in our rocm/xla branch instead of creating patch files here?
There was a problem hiding this comment.
Will do but we need to know if we want to follow this path.
There was a problem hiding this comment.
The weird thing is we have some targets defined inside the rocm-jax but some other inside jax... We need to decide where we keep our code. If we like to follow the plugin approach we need to move the targets and code inside rocm-jax. If not then I would need to implement changes directly in jax.
|
|
||
| def _wheel_linkopts(): | ||
| return select({ | ||
| _ROCM_LINK_ONLY: _WHEEL_RPATHS, |
There was a problem hiding this comment.
Why do these need to be conditional? Don't we always need the -rpath options?
There was a problem hiding this comment.
No. For tests we don't need rpath, tests shall be looking inside the sandbox
|
|
||
| package(default_visibility = ["//visibility:public"]) | ||
|
|
||
| rocm_nanobind_extension( |
There was a problem hiding this comment.
Unless we intend to completely remove this file from upstream, this file is going to be a pain to maintain, as we'll be occasionally porting upstream changes into the plugin repo. Do you intend to remove the upstream one?
It looks like all this is doing is using rocm_nanobind_extension instead of nanobind_extension so that we can pick up the rpath configuration. Is that correct?
There was a problem hiding this comment.
Yes there is no difference except that rocm_nanobind_extension handles the rpaths. The question is if we want to implement it upstream (jax repo) or not....
|
|
||
| if "rocm" in args.wheels: | ||
| wheel_build_command_base.append("--config=rocm_base") | ||
| wheel_build_command_base.append("--config=rocm_wheel") |
There was a problem hiding this comment.
nit: This config just turns the -rpath settings on and off. Could we call this config something like no_rocm_rpath?
There was a problem hiding this comment.
I think release_wheel would be more appropriate here. no_rocm_rpath seems to be very specific. It is in fact replace rpaths with wheel based rpaths.
There was a problem hiding this comment.
It is in fact replace rpaths with wheel based rpaths.
Which wheels are you referring to here? The JAX plugin wheels? Or the ROCm whels?
| @@ -0,0 +1,126 @@ | |||
| /* Copyright 2024 The JAX Authors. | |||
There was a problem hiding this comment.
Similar to the BUILD file, it's going to be a pain to maintain this file unless we simply remove it from upstream. Is that the intent?
Also, can you explain what's different between this one and the upstream one?
There was a problem hiding this comment.
There is no difference. Here is a big question. Do we want to move code from jax to rocm-jax or not. If not all the changes have to be done inside the jax repo. These changes will be minimal then!
Yes I did but for time being it is implemented as a patch to show the working version. |
Will do. I first upstreamed it. |
|
Remove usage patchelf and use linkopts instead to set the rpaths.
This PR is a collection of changes that are required to be done in order to avoid using
external tools to control the rpaths on a specific target while building xla/jax.
There is upstreamed PR for xla that adds the basic support: openxla/xla#37513
If we want to use this approach we can
1 - land the changes to our xla release branch
2 - depending on how we want to go further
* set rpaths directly in jax targets (upstream jax/release branches)
* or move jax targets into rocm-jax and set the linkopts there!