[WIP] [ROCm][XLA:GPU] Prefer loop emitter for sibling concat fusions #518

Open
nurmukhametov wants to merge 1 commit into rocm-jaxlib-v0.8.0 from anurmukh/fallback-to-loop-emitter-for-sibling-concats-v0.8.0

@nurmukhametov

Concat fusions (kConcatenate) cannot be merged by multi_output_fusion. When sibling concat fusions share inputs, each fusion reads those inputs separately and runs as its own kernel, which leads to duplicate memory reads and serialized kernel launches. This change falls back to the loop emitter for such fusions so that they can be fused into a single kernel.
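For reviewers who want to see the pattern, here is a minimal JAX sketch of two sibling concatenates that share both inputs. It is not taken from this PR: the function name `sibling_concats` and the shapes are illustrative assumptions, and whether XLA actually forms two separate concat fusions for it depends on the XLA version and backend.

```python
import jax
import jax.numpy as jnp


def sibling_concats(a, b):
    # Two concatenates that consume the same inputs a and b.
    # If each ends up in its own fusion, a and b are read twice.
    ab = jnp.concatenate([a * 2.0, b + 1.0], axis=0)
    ba = jnp.concatenate([b * 2.0, a + 1.0], axis=0)
    return ab, ba


a = jnp.arange(4, dtype=jnp.float32)     # [0, 1, 2, 3]
b = jnp.arange(4, 8, dtype=jnp.float32)  # [4, 5, 6, 7]

jitted = jax.jit(sibling_concats)
ab, ba = jitted(a, b)

# Text of the module before XLA optimizations (both concatenates are visible).
lowered_text = jitted.lower(a, b).as_text()

# Text of the module after XLA optimizations; inspect the fusion
# instructions here to see how many kernels the backend produced.
compiled_text = jitted.lower(a, b).compile().as_text()
```

Comparing `compiled_text` with and without this change on a ROCm build should show whether the two concatenates end up in one fusion or two.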
