Suppress harmless CUDA error messages when training on ROCm #580

Open
cj401-amd wants to merge 3 commits into rocm-jaxlib-v0.8.0 from ci_cj_suppress_cuda_rocm-jaxlib-v0.8.0
cj401-amd commented Jan 26, 2026

2025-10-04 10:31:19.517493: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1759573879.528330   74721 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1759573879.531514   74721 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1759573879.540288   74721 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1759573879.540309   74721 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1759573879.540312   74721 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.

i-chaochen (Collaborator) left a comment

Please upstream this to OpenXLA.

i-chaochen added labels on Jan 26, 2026: cherry-pick-candidate (mark a PR to be cherry-picked into the next ROCm JAX; remove iff the latest upstream contains the PR), open-upstream (tag when you want a copy of this PR to be opened on upstream), Upstream, rocm-jaxlib-v0.8.0

if (placers.find(id) != placers.end()) {
  LOG(WARNING) << "Computation placer creation function is already "
                  "registered for this platform";
  return;

i-chaochen (Collaborator) commented:

Once you return here, the following line won't execute:

placers[id].creation_function = creation_function;

cj401-amd (Author) replied:

corrected it now.
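
The control-flow point raised in this thread can be illustrated with a minimal, self-contained sketch. It is not the XLA code: `PlatformId`, `PlacerState`, `Placers`, `RegisterKeepFirst`, and `RegisterOverwrite` are hypothetical names invented for illustration, and the thread does not say which behavior the corrected commit chose. The sketch only shows that an early `return` inside the duplicate check skips the assignment, so the first registration wins, whereas falling through lets the latest registration replace it.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; these are not the XLA types.
using PlatformId = int;
using CreationFunction = std::function<std::string()>;

struct PlacerState {
  CreationFunction creation_function;
};

// Process-wide table, analogous in spirit to the `placers` map in the diff.
std::map<PlatformId, PlacerState>& Placers() {
  static std::map<PlatformId, PlacerState> placers;
  return placers;
}

// Variant A: return early on a duplicate. The assignment at the bottom is
// skipped for repeat calls, so the first registered function is kept.
void RegisterKeepFirst(PlatformId id, CreationFunction fn) {
  auto& placers = Placers();
  if (placers.find(id) != placers.end()) {
    std::cerr << "placer already registered for platform " << id << "\n";
    return;
  }
  placers[id].creation_function = std::move(fn);
}

// Variant B: note the duplicate but fall through, so the latest
// registration overwrites the earlier one.
void RegisterOverwrite(PlatformId id, CreationFunction fn) {
  auto& placers = Placers();
  if (placers.find(id) != placers.end()) {
    std::cerr << "replacing placer for platform " << id << "\n";
  }
  placers[id].creation_function = std::move(fn);
}
```

Which variant is appropriate depends on whether a repeated registration is expected to carry a different creation function; if every registration passes the same function, keeping the first one is harmless.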

cj401-amd force-pushed the ci_cj_suppress_cuda_rocm-jaxlib-v0.8.0 branch from 3e14c4b to 3e8bac9 on January 27, 2026 14:14
i-chaochen (Collaborator) left a comment

LGTM. Please add a link to your upstream PR in the PR description.

void initialize_cublas() {
  // Check if already registered before attempting - prevents duplicate
  // registration error messages (can happen with multiple library loads)
  auto already_registered = PluginRegistry::Instance()->HasFactory(

i-chaochen (Collaborator) commented:

Don't we need this on the rocm_blas.cc side as well?

cj401-amd (Author) replied:

It's already implemented on the ROCm side, see https://github.com/ROCm/xla/blame/rocm-jaxlib-v0.8.0/xla/stream_executor/rocm/rocm_blas.cc#L1283 for the existing check. So on the NVIDIA side, it won't emit error messages related to ROCm.
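
The check-before-register pattern discussed in this thread can be sketched in isolation as follows. This is a toy registry written for illustration, not the XLA `PluginRegistry` API: `ToyPluginRegistry`, its method signatures, `PluginKind`, and `InitializeBlasOnce` are all assumptions. The idea is that an initializer first asks whether a factory already exists for the platform and plugin kind, and silently skips registration if so, which avoids the "Unable to register ... factory" errors when the same library is loaded more than once.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical plugin kinds and platform id type, for illustration only.
enum class PluginKind { kBlas, kDnn, kFft };
using PlatformId = int;

// A toy registry; the real registry's interface may differ.
class ToyPluginRegistry {
 public:
  // True if a factory is already registered for (platform, kind).
  bool HasFactory(PlatformId platform, PluginKind kind) const {
    return factories_.count({platform, kind}) > 0;
  }

  // Registers a factory name; returns false if one was already present.
  bool RegisterFactory(PlatformId platform, PluginKind kind,
                       std::string name) {
    return factories_
        .emplace(std::make_pair(platform, kind), std::move(name))
        .second;
  }

 private:
  std::map<std::pair<PlatformId, PluginKind>, std::string> factories_;
};

// Guarded initializer: skip quietly when a BLAS factory already exists,
// instead of attempting a second registration and logging an error.
// Returns true only when this call performed the registration.
bool InitializeBlasOnce(ToyPluginRegistry& registry, PlatformId platform) {
  if (registry.HasFactory(platform, PluginKind::kBlas)) {
    return false;
  }
  return registry.RegisterFactory(platform, PluginKind::kBlas, "toy-blas");
}
```

Calling `InitializeBlasOnce` twice on the same registry and platform registers once and returns `false` the second time, without emitting any message.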
