
@7radians

Hi all,

Not sure if others have run into this issue on Archer2 or elsewhere, but in case this fix is useful:

Context

My collaborator's ASE MD runs with MACE 0.3.13 / 0.3.14 failed when using a ROCm PyTorch build on Archer2. TorchScript enforces strict schema checks, so the scripted model rejected unknown kwargs and omitted optional outputs, causing runtime errors and deadlocks.

This PR

Improves robustness of the ASE MACE calculator to handle these scenarios:

  1. Dynamic kwarg gating
    Inspect model.forward at runtime and pass compute_edge_forces / compute_atomic_stresses only if the forward actually accepts them, eliminating unknown-kwarg errors with TorchScripted models on ROCm builds.

  2. Safe output access
    Replace out["..."] with out.get("...") plus None checks for atomic_stresses and atomic_virials, preventing KeyErrors or hangs when those keys are absent.

  3. Empty-list stacking guard
    Before aggregating per-model tensors with torch.stack(), verify that the corresponding list is non-empty, avoiding deadlocks. A minimal sketch of all three guards follows this list.
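
For illustration, here is a minimal, self-contained sketch of all three guards. This is not the PR's actual code: ToyModel, batch, and per_model_stresses are placeholders, and the real changes live in mace/calculators/mace.py.

```python
import inspect
from typing import Dict

import torch


class ToyModel(torch.nn.Module):
    """Stand-in for a MACE model whose forward lacks the newer flags."""

    def forward(
        self, data: Dict[str, torch.Tensor], compute_stress: bool = False
    ) -> Dict[str, torch.Tensor]:
        # Note: no "atomic_stresses" key in the output dict.
        return {"energy": data["positions"].sum()}


def forward_kwarg_names(model) -> set:
    """Names of the arguments model.forward accepts. TorchScripted models
    expose a compiled schema instead of a Python signature."""
    if isinstance(model, torch.jit.ScriptModule):
        return {arg.name for arg in model.forward.schema.arguments}
    return set(inspect.signature(model.forward).parameters)


model = torch.jit.script(ToyModel())
batch = {"positions": torch.ones(4, 3)}

# 1) Dynamic kwarg gating: pass the newer flags only if this model's
#    forward declares them (a model compiled with an older MACE won't).
kwargs = {}
allowed = forward_kwarg_names(model)
for flag in ("compute_edge_forces", "compute_atomic_stresses"):
    if flag in allowed:
        kwargs[flag] = True
out = model(batch, **kwargs)

# 2) Safe output access: .get() returns None for absent keys
#    instead of raising KeyError.
atomic_stresses = out.get("atomic_stresses")
if atomic_stresses is not None:
    print("per-atom stresses available")

# 3) Empty-list stacking guard: torch.stack([]) raises
#    "stack expects a non-empty TensorList", so check first.
per_model_stresses = []
if per_model_stresses:
    stacked = torch.stack(per_model_stresses)
```

The same helper falls back to inspect.signature for eager models, so compiled and uncompiled models can be called through the same path.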


All changes are in mace/calculators/mace.py.

Tested on CUDA, ROCm, and CPUs on the following machines:

  • Archer2
  • LUMI
  • Kelvin2

The overheads should be (and appear to be) negligible.

@ilyes319 (Contributor) commented Jul 2, 2025

Hey @7radians, thank you for that; this seems very weird indeed. Can you tell me what error appeared?

@7radians (Author) commented Jul 2, 2025

@ilyes319 here are the errors my collaborator hit:
1)
RuntimeError: Unknown keyword argument 'compute_edge_forces' for operator 'forward'. Schema: forward(torch.mace.modules.models.___torch_mangle_161.ScaleShiftMACE self, Dict(str, Tensor) data, bool training=False, bool compute_force=True, bool compute_virials=False, bool compute_stress=False, bool compute_displacement=False, bool compute_hessian=False) -> Dict(str, Tensor?)
2)
File ".../mace/calculators/mace.py", line 360, in calculate
    if out["atomic_stresses"] is not None:
KeyError: 'atomic_stresses'
3) Once the above two were fixed, there was a deadlock, which I traced to empty lists being passed to torch.stack().
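
For context, the first failure is straightforward to reproduce with any TorchScripted module, since a scripted forward validates keyword arguments against its compiled schema. A toy example (not MACE code):

```python
import torch


class Toy(torch.nn.Module):
    def forward(self, x: torch.Tensor, compute_force: bool = True) -> torch.Tensor:
        return x * 2.0 if compute_force else x


scripted = torch.jit.script(Toy())
scripted(torch.ones(3), compute_force=False)  # fine: kwarg is in the schema
try:
    scripted(torch.ones(3), compute_edge_forces=True)  # unknown to the schema
except RuntimeError as err:
    print(err)  # "Unknown keyword argument 'compute_edge_forces' ..."
```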

@ilyes319 (Contributor) commented Jul 2, 2025

Mmm, were you using a model that was compiled beforehand on an older version of mace?

@7radians (Author) commented Jul 2, 2025

The model was compiled with mace 0.3.13 and failed with both 0.3.13 and 0.3.14, giving the same errors.

Commit: Develop patch
@7radians (Author) commented Jul 4, 2025

Another commit tagged along from main and caused errors with the heads; here is the clean fix based on the develop branch. Hopefully that's less hassle, @ilyes319.
