Rocm jaxlib v0.9.0 by Ruturaj4 · Pull Request #671 · ROCm/jax

Ruturaj4 · 2026-01-28T23:56:42Z

### Problem

Pallas tests fail on AMD ROCm GPUs with the error:

`ValueError: invalid literal for int() with base 10: 'gfx950'`

This occurs because:

- The Mosaic GPU backend (NVIDIA-specific) attempts to parse the GPU architecture string.
- NVIDIA GPUs report `compute_capability` as `"major.minor"` (e.g., `"9.0"`).
- AMD ROCm GPUs report architecture identifiers such as `"gfx950"`.
- The code tries to parse `"gfx950"` as an integer, which fails.
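The failure mode can be reproduced in a few lines. This is a minimal sketch, not the actual Mosaic GPU code. It assumes hypothetical helpers `parse_arch_unguarded` and `parse_arch` that split a `"major.minor"` string the way the description suggests.

- The unguarded version raises the same `ValueError` seen in the tests.
- The guarded version shows the kind of `"gfx"` prefix check the fix adds. Its error text is illustrative only.

```python
def parse_arch_unguarded(arch: str) -> tuple[int, int]:
    """Parse an NVIDIA-style "major.minor" string, e.g. "9.0" -> (9, 0)."""
    major, _, minor = arch.partition(".")
    # For "gfx950", major == "gfx950", so int() raises ValueError here.
    return int(major), int(minor or 0)


def parse_arch(arch: str) -> tuple[int, int]:
    """Same parse, but rejects ROCm "gfx*" identifiers with a descriptive error."""
    if arch.startswith("gfx"):
        # Hypothetical message; the real error text in _infer_arch() may differ.
        raise ValueError(
            f"Mosaic GPU does not support AMD ROCm architecture {arch!r}; "
            "use the Triton backend instead."
        )
    return parse_arch_unguarded(arch)
```

With these definitions, `parse_arch("9.0")` returns `(9, 0)`. `parse_arch_unguarded("gfx950")` fails with `invalid literal for int() with base 10: 'gfx950'`.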

### Solution

This PR fixes the issue with three changes:

1. `jax/_src/pallas/pallas_call.py`: Route ROCm devices to the Triton backend.
   - Added an `is_rocm` check in `gpu_lowering()`.
   - When `backend=None` and running on ROCm, Triton is used automatically instead of Mosaic GPU.
   - Added a clear error if the user explicitly requests `backend='mosaic_gpu'` on ROCm.
2. `jax/experimental/mosaic/gpu/core.py`: Add a safety check in `_infer_arch()`.
   - Detect ROCm architecture strings (starting with `"gfx"`) and raise a descriptive error.
3. `tests/pallas/ops_test.py`: Fix the `test_delay` skip logic.
   - The `delay` primitive is implemented only in Mosaic GPU, not in Triton.
   - Updated the skip condition to include ROCm devices: `jtu.is_device_rocm() or not use_mosaic_gpu`.
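The routing rule in `pallas_call.py` can be sketched as a small decision function. This is a hypothetical illustration. `select_gpu_backend` and its `default` parameter are assumptions that stand in for however `gpu_lowering()` picks a backend on non-ROCm devices, and the error text is not the real message.

```python
from typing import Optional


def select_gpu_backend(requested: Optional[str], is_rocm: bool,
                       default: str = "mosaic_gpu") -> str:
    """Pick a Pallas GPU lowering backend (hypothetical sketch of the routing rule).

    `default` stands in for the backend used on non-ROCm devices when
    backend=None; its value here is an assumption.
    """
    if is_rocm:
        if requested == "mosaic_gpu":
            # Mosaic GPU is NVIDIA-specific, so fail early with a clear message.
            raise ValueError("backend='mosaic_gpu' is not supported on ROCm; use 'triton'.")
        if requested is None:
            # No explicit choice on ROCm: route to Triton.
            return "triton"
    return requested if requested is not None else default
```

This design fails loudly only when the user explicitly asks for Mosaic GPU on ROCm. It silently picks Triton when no backend was requested.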

### Testing

- Pallas tests that previously failed with the `gfx950` error now pass.
- `test_delay` is now correctly skipped on ROCm, since `delay` is implemented only in Mosaic GPU (MGPU).
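The `test_delay` skip condition can be exercised in isolation with plain `unittest`. In this sketch, the module-level `IS_ROCM` and `USE_MOSAIC_GPU` flags are hypothetical stand-ins for `jtu.is_device_rocm()` and the test's `use_mosaic_gpu` setting. The class is illustrative and is not the real `ops_test.py`.

```python
import unittest

# Hypothetical stand-ins for jtu.is_device_rocm() and the test's use_mosaic_gpu flag.
IS_ROCM = True
USE_MOSAIC_GPU = False


def delay_unsupported(is_rocm: bool, use_mosaic_gpu: bool) -> bool:
    # delay is only lowered by Mosaic GPU, and ROCm never runs Mosaic GPU,
    # so skip on ROCm regardless of the flag, and skip whenever Triton is in use.
    return is_rocm or not use_mosaic_gpu


class DelayTest(unittest.TestCase):

    def test_delay(self):
        if delay_unsupported(IS_ROCM, USE_MOSAIC_GPU):
            self.skipTest("delay is only implemented in Mosaic GPU")
        # The real test body would launch a kernel that uses delay here.
```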

### Submission Checklist

- Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.


…ut inference. Use this `bitwidth` field to reject more relayouts that would fail in lowering. Before this change, we would sometimes erroneously choose such combinations of layouts. A concrete example is the following:

```
x: vector<?xi4>
x_cast = layout_cast(x, WGMMA_LAYOUT_UPCAST_4X)
y = convert(x): vector<?xbf16>
y_cast = layout_cast(y, WGMMA_LAYOUT)
```

Clearly, the `layout_cast`s here force us to pick a point at which to relayout the `vector` from the `WGMMA_LAYOUT_UPCAST_4X` layout to the `WGMMA_LAYOUT`. Without filtering for bitwidth, there are two choices: we can relayout either before the `convert` or after it. However, we do not support this relayout for `16`-bit values, so choosing to relayout after the `convert` will fail in lowering. PiperOrigin-RevId: 853729640
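The bitwidth filter described above can be modeled as a toy search over candidate relayout points. The `SUPPORTED_RELAYOUTS` table below is an assumption for illustration. The commit only states that this relayout is unsupported for 16-bit values. The helpers are hypothetical and are not Mosaic GPU's layout-inference code.

```python
# Hypothetical table: (source layout, target layout) -> element bitwidths the
# lowering can relayout. Contents are illustrative, not the real support matrix.
SUPPORTED_RELAYOUTS = {
    ("WGMMA_LAYOUT_UPCAST_4X", "WGMMA_LAYOUT"): {4},
}


def relayout_supported(src: str, dst: str, bitwidth: int) -> bool:
    return bitwidth in SUPPORTED_RELAYOUTS.get((src, dst), set())


def choose_relayout_point(candidates, src, dst):
    """Return the first candidate point whose element bitwidth can be relayouted."""
    for point, bitwidth in candidates:
        if relayout_supported(src, dst, bitwidth):
            return point
    raise ValueError(f"no lowerable relayout from {src} to {dst}")


# Relayout either the i4 input (before convert) or the bf16 result (after convert).
CANDIDATES = [("after_convert", 16), ("before_convert", 4)]
```

Here `choose_relayout_point(CANDIDATES, "WGMMA_LAYOUT_UPCAST_4X", "WGMMA_LAYOUT")` returns `"before_convert"`. Without the bitwidth check, taking the first candidate would pick the unsupported 16-bit relayout.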


The thread guard prohibits execution of multi-process JAX operations on threads other than the owning one. This helps detect when McJAX operations are launched in different orders on different hosts, leading to intermittent crashes. PiperOrigin-RevId: 853785440
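A thread guard can be illustrated with a small class that records its owning thread and rejects calls from any other thread. This is a hypothetical sketch of the concept only. `ThreadGuard` and `collective` are illustrative names, not JAX APIs.

```python
import threading


class ThreadGuard:
    """Allows guarded operations only on the thread that created the guard."""

    def __init__(self) -> None:
        self._owner = threading.get_ident()

    def check(self, op_name: str) -> None:
        if threading.get_ident() != self._owner:
            raise RuntimeError(
                f"{op_name} was called from a thread that does not own the guard; "
                "issue multi-process operations from the owning thread only."
            )


guard = ThreadGuard()


def collective(x):
    """Hypothetical multi-process operation protected by the guard."""
    guard.check("collective")
    return x
```

Calling `collective` on the creating thread succeeds. Calling it from a worker thread raises `RuntimeError`, which surfaces an ordering bug immediately instead of as an intermittent crash.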


…ckend. `del wrapper.object` ensures that the colocated python code at the backend does not have any remaining references on the object, irrespective of whether the backend code has any (accidental) references left over for the wrapper itself. PiperOrigin-RevId: 853946673
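The effect of `del wrapper.object` can be shown with `weakref`. Once the wrapper drops its attribute, the wrapped object is freed even if a stray reference to the wrapper survives. `Wrapper`, `Payload`, and `release` are hypothetical stand-ins, not the colocated Python classes.

```python
import gc
import weakref


class Payload:
    """Hypothetical user object handed to the backend."""


class Wrapper:
    """Hypothetical wrapper that holds the user object in `.object`."""

    def __init__(self, obj):
        self.object = obj


def release(w):
    # Drop the wrapper's reference explicitly, so the payload can be freed even
    # if something still (accidentally) references the wrapper itself.
    del w.object


payload = Payload()
payload_ref = weakref.ref(payload)
wrapper = Wrapper(payload)
leaked = [wrapper]  # simulate an accidental lingering reference to the wrapper
del payload
release(wrapper)
gc.collect()
```

After `release`, `payload_ref()` returns `None`, even though `leaked` still holds the wrapper.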


boomanaiden154 and others added 30 commits January 7, 2026 13:10

Prefer isinstance(x, type) over type.isinstance

5006973

An upcoming MLIR change is going to remove .isinstance. This is necessary to keep the code functioning. Fixes a couple more cases that I did not find earlier and found with integration testing. PiperOrigin-RevId: 853383203

Merge pull request jax-ml#34204 from jakevdp:doc-sidebar

54cf88d

PiperOrigin-RevId: 853413470

Merge pull request jax-ml#32268 from samanklesaria:issues/32267

227cb22

PiperOrigin-RevId: 853419648

Prefer isinstance(x, type) over type.isinstance

27b3bff

An upcoming MLIR change is going to remove .isinstance. This is necessary to keep the code functioning. Fixes a couple more cases that I did not find earlier and found with integration testing. PiperOrigin-RevId: 853420916

Add jax.experimental.random to the wheel build

2486a5e

fix shard_map transpose explicit sharding zero unsharding bug

bf11cd8

Co-authored-by: Yash Katariya <yashkatariya@google.com>

add optional explain callback for weakref_lru_cache misses

f6b6edc

Co-authored-by: Parker Schuh <parkers@google.com> Co-authored-by: Yash Katariya <yashkatariya@google.com> PiperOrigin-RevId: 853442821

Merge pull request jax-ml#34209 from jakevdp:fix-wheel

75fd86b

PiperOrigin-RevId: 853444130

Merge pull request jax-ml#34211 from mattjj:andy-customvjp-none

75130d4

PiperOrigin-RevId: 853448754

Handle ad.Zero cotangents in _reshard_transpose_fancy.

caaad6e

Co-authored-by: Matthew Johnson <mattjj@google.com> PiperOrigin-RevId: 853464510

Update XLA dependency to use revision http://github.com/openxla/xla/commit/76db112da7c2c66afeb550fc1089e6bec297bd4d

fd57ef6

PiperOrigin-RevId: 853591014

[Pallas/Mosaic GPU] Enable more WarpSpecializedPipelineWGTests.

1199fd7

These tests pass just fine, now. PiperOrigin-RevId: 853694590

[pallas:sc] Allowed specifying tiling in pltpu.emit_pipeline

2a1a0a9

The tiling is global and applies to all refs passed to the pipeline function. This change is necessary to support using `pltpu.emit_pipeline` on SC where the tiling can either be (8, 128) or (8,). PiperOrigin-RevId: 853747075

[XLA:MGPU] Port Tiling to C++.

60a32d3

PiperOrigin-RevId: 853768795

deviceless aot test

8c2555a

[Mosaic] Move Float8EXMYType to tpu.td.

1c3aa2f

PiperOrigin-RevId: 853801923

Merge pull request jax-ml#33542 from keshavb96:deviceless_aot_test

4b25896

PiperOrigin-RevId: 853806284

Remove redundant test targets that are already executed as a part of `//tests/pallas:tpu_tests`.

4e89ce4

PiperOrigin-RevId: 853827799

respect self.statics in FlatTree.__eq__

9fad589

Co-authored-by: Roy Frostig <frostig@google.com>

Fix precommit breakage

b02123b

See https://github.com/jax-ml/jax/actions/runs/20830797751/job/59844186018. There were some new deps added in the recent MLIR update that were not propagated through. PiperOrigin-RevId: 853864868

sick

5279000

revive as many cache miss explanations as reasonably possible

0636587

Co-authored-by: Yash Katariya <yashkatariya@google.com>

[pallas:sc] Skip a few tests failing when the compiler uses tiled memrefs

12dce9c

PiperOrigin-RevId: 853931279

skip on jaxlib version

d6101d8

Add halt-for-connection to build_artifacts.yml workflow call.

909e638

PiperOrigin-RevId: 853966274

Merge pull request jax-ml#33839 from jax-ml:pjit-without-linear-util

a8ca82a

PiperOrigin-RevId: 853977644

Update XLA dependency to use revision http://github.com/openxla/xla/commit/9ae3d6dab2c10c8195c8d9862f475904c7cdca91

774597a

PiperOrigin-RevId: 854059415

magaonka-amd and others added 5 commits January 23, 2026 10:21

Enable test_variadic_reduce_window on ROCm (#647)

134825e

Unskip supported dtypes for testConvolutionsPreferredElementType (#649)

ee5581c

Enabled test for condition number on ROCm devices. (#613)

10c7e75

(cherry picked from commit 02cf073)

Enabled RNN unit test: test_no_workspace_overflow for ROCm devices (#624)

708d732

(cherry picked from commit abbb2ef)

Added changes from PR #626 and PR #645. This also fixes merge conflicts from upstream.

98cc66c

This uses the following commits:

- 6162538
- 9344b2b

Ruturaj4 requested a review from a team as a code owner January 28, 2026 23:56

Ruturaj4 and others added 24 commits January 28, 2026 18:10

[Pallas] Fix ROCm GPU architecture detection and route to Triton backend

b77014f

Enable array interoperability tests on ROCm platform (#660)

91be368

Skip sparse tests on ROCm due to hipSPARSE issue (#652)

95e1df0

Update sparse test skip messages in v0.8.2 (#653)

e0739e6

Port skip for "test_prim_tridiagonal_solve" tests from JAX v0.8.0 to v0.8.2 (#658)

a988aeb

Fix test_cuda_array_interface test skip condition (#657)

3467620

Enable testMultivariateNormalSingularCovariance on ROCm (#666)

ed14fc6

Skip test_batch_axis_sharding_jvp because of hipSPARSE issue (#667)

fc10735

Skip test_tridiagonal_solve on ROCm due to hipSPARSE numerical errors (#668)

93f5fdc

Update Skip Reason Outputs (#663)

3747d88

Skip testCudaArrayInterfaceOnNonCudaFails on ROCm platform (#677)

103da59

Enable lobpcg tests on ROCm platform (#681)

d3cce97

Enable lax backend scipy tests on ROCm GPUs (#687)

498f735

Add ROCm encoding for test_struct_encoding_determinism (#683)

6701f5d

Remove 'mean' from unsupported params for jnp.var (#689)

4803ba8

Enable memory space export tests on ROCm GPUs (#690)

2c7ef61

Implement approx_tanh for ROCm using OCML tanh function (#691)

7071f71

Enable test deviceless aot compile test on ROCm (#694)

76c5165

Skipping testEighTinyNorm due to hipSolver issues (#697)

6df5f17

Modified memory space export test to run on ROCm (for some tests). (#698)

ac4ba1c

Add device test unit tests for ROCm (JAX v0.9.0) (#705)

0988e6d

Skip test_tridiagonal_solve_grad test 0.9.0 (#703)

57c0080

Skip test_batch_axis_sharding_jvp13 test 0.9.0 (#709)

8cb19cf

Update skip message version from 0.8.0 to 0.9.0 for test_is_finite on… (#702)

32ba3e9

Ruturaj4 wants to merge 6583 commits into `main` from `rocm-jaxlib-v0.9.0`


20 participants
