
Good AIMD RMSE but unstable MD; adding distortions breaks TRAIN error without fixing MD stability #1317

@Abdelazim-Abdelgawwad

Summary

I am training MACE on a 10 ps DFT-AIMD trajectory.
Although the model shows reasonable RMSE on AIMD data, MD with the trained MACE potential is unstable (structural distortion and temperature blow-up). Adding distorted geometries to the training set does not resolve the MD instability and instead severely degrades the training metrics.

Case 1: AIMD-only training (10 ps)

The training/validation/test split uses contiguous time blocks.
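
The split described above can be sketched as follows. This is a minimal NumPy illustration, not MACE code. The function name `contiguous_block_split`, the 80/10/10 fractions and the 1 fs frame spacing are assumptions. All three blocks come from the same 10 ps trajectory, so they sample similar regions of configuration space. Low VALID/TEST errors may therefore mainly show that the model interpolates well within that trajectory.

```python
import numpy as np


def contiguous_block_split(n_frames, fractions=(0.8, 0.1, 0.1)):
    """Split frame indices 0..n_frames-1 into contiguous train/valid/test blocks.

    Frames keep their time order, so strongly correlated neighbouring AIMD
    frames land in the same subset instead of leaking across subsets.
    """
    if len(fractions) != 3 or not np.isclose(sum(fractions), 1.0):
        raise ValueError("need three fractions that sum to 1")
    n_train = int(round(fractions[0] * n_frames))
    n_valid = int(round(fractions[1] * n_frames))
    idx = np.arange(n_frames)
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    test = idx[n_train + n_valid:]  # the last block takes any rounding remainder
    return train, valid, test


# Hypothetical example: 10 ps of AIMD saved every 1 fs gives 10000 frames.
train, valid, test = contiguous_block_split(10_000)
print(len(train), len(valid), len(test))  # prints: 8000 1000 1000
```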

TRAIN: RMSE F = 25.8 meV/Å
VALID: RMSE F = 23.4 meV/Å
TEST : RMSE F = 22.7 meV/Å

Despite these low errors, running MD with this model leads to:

  • rapid structural distortion
  • an unphysical temperature increase
  • instability that persists even with smaller timesteps
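
The symptoms above (temperature blow-up and structural distortion) can be monitored frame by frame with a small helper. The sketch below uses NumPy and is not part of MACE. It assumes masses in amu, velocities in Å/fs and, for periodic systems, an orthorhombic cell. The function names and the thresholds in `looks_unstable` are hypothetical. Logging the first frame at which a flag fires could give a concrete set of configurations to recompute with DFT.

```python
import numpy as np

AMU_TO_KG = 1.66053906660e-27
ANG_PER_FS_TO_M_PER_S = 1.0e5  # 1 Å/fs = 1e-10 m / 1e-15 s
K_B = 1.380649e-23  # Boltzmann constant in J/K


def kinetic_temperature(masses_amu, velocities_ang_fs, n_constraints=0):
    """Instantaneous temperature T = sum(m v^2) / (N_dof k_B)."""
    m = np.asarray(masses_amu, float) * AMU_TO_KG
    v = np.asarray(velocities_ang_fs, float) * ANG_PER_FS_TO_M_PER_S
    n_dof = 3 * len(m) - n_constraints
    return float(np.sum(m[:, None] * v ** 2) / (n_dof * K_B))


def min_pair_distance(positions, box=None):
    """Smallest interatomic distance, optionally with an orthorhombic box (minimum image)."""
    pos = np.asarray(positions, float)
    diff = pos[:, None, :] - pos[None, :, :]
    if box is not None:
        box = np.asarray(box, float)
        diff -= box * np.round(diff / box)  # wrap each component to the nearest image
    dist = np.linalg.norm(diff, axis=-1)
    i, j = np.triu_indices(len(pos), k=1)  # unique pairs only
    return float(dist[i, j].min())


def looks_unstable(temperature, d_min, t_target, t_factor=2.0, d_floor=0.7):
    """Flag a frame whose temperature or closest contact is clearly unphysical.

    t_factor and d_floor (Å) are hypothetical thresholds; tune them per system.
    """
    return temperature > t_factor * t_target or d_min < d_floor


# Toy check: three atoms on a line in a hypothetical 10 Å cubic box.
pos = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.5], [0.0, 0.0, 9.5]]
print(min_pair_distance(pos))                          # 1.5 (no periodicity)
print(min_pair_distance(pos, box=[10.0, 10.0, 10.0]))  # 0.5 (periodic image)
```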

Case 2: AIMD + distorted geometries (distortions added only to TRAIN)

TRAIN: RMSE F = 6814.1 meV/Å
VALID: RMSE F =   27.3 meV/Å
TEST : RMSE F =   25.9 meV/Å

Observations:

  • Validation and test errors remain reasonable
  • Training force RMSE becomes extremely large
  • MD remains unstable (same failure mode as AIMD-only model)
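
One way to interpret a huge TRAIN RMSE next to normal VALID/TEST values is to compute the force error per configuration. You can then check whether a few of the added distorted geometries dominate the pooled number. The NumPy sketch below assumes forces stored as (n_atoms, 3) arrays in eV/Å and pools the RMSE over all force components. MACE may aggregate its reported RMSE differently. The toy data are invented for illustration and do not come from this run. In the toy case, one outlier raises the pooled RMSE from 20 meV/Å to about 578 meV/Å, even though nine of ten configurations are fitted well.

```python
import numpy as np


def force_rmse(f_pred, f_ref):
    """RMSE over all Cartesian force components of one configuration."""
    d = np.asarray(f_pred, float) - np.asarray(f_ref, float)
    return float(np.sqrt(np.mean(d ** 2)))


def per_config_and_pooled_rmse(preds, refs):
    """Return per-configuration RMSEs and the RMSE pooled over all components."""
    per_config = np.array([force_rmse(p, r) for p, r in zip(preds, refs)])
    all_sq = np.concatenate(
        [((np.asarray(p, float) - np.asarray(r, float)) ** 2).ravel()
         for p, r in zip(preds, refs)]
    )
    return per_config, float(np.sqrt(np.mean(all_sq)))


# Toy data (eV/Å): nine 10-atom configurations with a uniform 0.02 eV/Å error
# and one "distorted" configuration with a single 10 eV/Å error component.
refs = [np.zeros((10, 3)) for _ in range(10)]
preds = [np.full((10, 3), 0.02) for _ in range(9)]
outlier = np.zeros((10, 3))
outlier[0, 0] = 10.0
preds.append(outlier)

per_config, pooled = per_config_and_pooled_rmse(preds, refs)
worst_first = np.argsort(per_config)[::-1]  # configuration indices, worst first
print(worst_first[0], round(pooled * 1000, 1))  # worst index, pooled RMSE in meV/Å
```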

Training settings

    --foundation_model="small" \
    --energy_key="REF_energy" \
    --multiheads_finetuning=False \
    --forces_key="REF_forces" \
    --model="MACE" \
    --E0s="average" \
    --num_channels=256 \
    --max_L=2 \
    --correlation=3 \
    --max_num_epochs=500 \
    --batch_size=10 \
    --patience=50 \
    --valid_batch_size=10 \
    --lr=0.001 \
    --energy_weight=1.0 \
    --forces_weight=100.0 \
    --weight_decay=1e-8 \
    --error_table='PerAtomMAE' \
    --ema \
    --ema_decay=0.99 \
    --amsgrad \
    --restart_latest \
    --default_dtype="float64" \
    --device=cuda \
    --seed=1 \
    --scaling='rms_forces_scaling' \
    --save_cpu
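
The NumPy sketch below shows what `--energy_weight=1.0` and `--forces_weight=100.0` trade off. It also shows how per-configuration weights could down-weight distorted geometries (question 2). It assumes a loss of the form energy_weight × (mean per-atom energy MSE) + forces_weight × (mean force-component MSE), with each configuration's contribution scaled by its weight. MACE's actual loss may differ in detail, so check the source for your version. The config-type labels and the values in `TYPE_WEIGHTS` are hypothetical. MACE also offers a `--config_type_weights` option for this purpose; check its exact syntax with `mace_run_train --help`.

```python
import numpy as np

# Hypothetical per-config-type weights: keep AIMD frames at full weight and
# down-weight the added distorted geometries instead of dropping them.
TYPE_WEIGHTS = {"aimd": 1.0, "distorted": 0.1}


def weighted_energy_forces_loss(e_pred, e_ref, n_atoms, f_pred, f_ref,
                                config_types, energy_weight=1.0,
                                forces_weight=100.0):
    """Weighted MSE on per-atom energies plus weighted MSE on force components.

    e_pred, e_ref, n_atoms: one entry per configuration.
    f_pred, f_ref: lists of (n_atoms_i, 3) force arrays.
    config_types: one label per configuration, looked up in TYPE_WEIGHTS.
    """
    w = np.array([TYPE_WEIGHTS[t] for t in config_types], float)
    e_err = (np.asarray(e_ref, float) - np.asarray(e_pred, float)) / np.asarray(n_atoms, float)
    energy_term = np.mean(w * e_err ** 2)
    # Every force component inherits the weight of its configuration.
    f_sq = np.concatenate([
        wi * ((np.asarray(fp, float) - np.asarray(fr, float)) ** 2).ravel()
        for wi, fp, fr in zip(w, f_pred, f_ref)
    ])
    forces_term = np.mean(f_sq)
    return float(energy_weight * energy_term + forces_weight * forces_term)


# Two toy configurations (1 and 2 atoms); energies in eV, forces in eV/Å.
# The second one has a large force error on one component.
e_pred, e_ref, n_atoms = [0.0, 0.0], [1.0, 2.0], [1, 2]
f_pred = [np.zeros((1, 3)), np.array([[3.0, 0.0, 0.0], [0.0, 0.0, 0.0]])]
f_ref = [np.zeros((1, 3)), np.zeros((2, 3))]
loss_equal = weighted_energy_forces_loss(e_pred, e_ref, n_atoms, f_pred, f_ref, ["aimd", "aimd"])
loss_down = weighted_energy_forces_loss(e_pred, e_ref, n_atoms, f_pred, f_ref, ["aimd", "distorted"])
print(loss_equal, loss_down)
```

With forces weighted 100× relative to energies, the second configuration's force error dominates the equally weighted loss. Down-weighting that configuration keeps it in the fit without letting it steer the gradient. Whether 0.1 is a sensible weight depends on the system.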

Questions

  1. Is it expected that a reasonable RMSE on AIMD data does not guarantee stable MD, even for short trajectories?
  2. Is adding distorted geometries to TRAIN the correct strategy for improving MD stability, or should they be treated/weighted differently?
  3. How should one interpret a very large TRAIN force RMSE when the VALID/TEST errors remain low?
  4. Are there recommended practices (e.g. weighting, config types, active learning) for stabilizing MD in this situation?
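
Two common heuristics relate to questions 2 and 4. Neither is specific to MACE, and the NumPy sketch below uses hypothetical function names and thresholds.

- **Force screening.** Screen candidate training configurations by their largest reference force norm. Strongly distorted geometries can carry DFT forces far outside the range seen in AIMD.
- **Query-by-committee active learning.** Run MD with one model. Predict forces on the visited frames with several independently trained models (for example, different `--seed` values). Send frames whose maximum per-atom force disagreement falls inside a chosen window back to DFT. Frames below the window add little new information, while frames above it may already be unphysical.

```python
import numpy as np


def max_force_norm(forces):
    """Largest per-atom force norm in one configuration (same units as input)."""
    return float(np.linalg.norm(np.asarray(forces, float), axis=1).max())


def keep_by_max_force(ref_forces_list, f_max):
    """Indices of configurations whose largest reference force is <= f_max."""
    return [i for i, f in enumerate(ref_forces_list) if max_force_norm(f) <= f_max]


def committee_force_deviation(committee_forces):
    """Max over atoms of the committee spread of the force vector.

    committee_forces: array of shape (n_models, n_atoms, 3) for one frame.
    Per atom, sigma_i = sqrt(mean_m |F_m,i - <F_i>|^2); the score is max_i sigma_i.
    """
    f = np.asarray(committee_forces, float)
    dev = f - f.mean(axis=0, keepdims=True)
    sigma = np.sqrt(np.mean(np.sum(dev ** 2, axis=-1), axis=0))
    return float(sigma.max())


def select_for_dft(deviations, lower, upper):
    """Indices of frames whose committee deviation lies in [lower, upper)."""
    return [i for i, d in enumerate(deviations) if lower <= d < upper]


# Toy usage with invented numbers (eV/Å); the thresholds are hypothetical.
refs = [np.array([[0.0, 0.0, 3.0], [0.0, 4.0, 0.0]]),    # max |F| = 4
        np.array([[30.0, 40.0, 0.0], [0.0, 0.0, 1.0]])]  # max |F| = 50
print(keep_by_max_force(refs, f_max=20.0))  # [0]

frame = np.array([[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],   # model 1
                  [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]])  # model 2
print(committee_force_deviation(frame))  # 1.0
print(select_for_dft([0.02, 0.3, 1.5], lower=0.1, upper=1.0))  # [1]
```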

Goal

My goal is not to replace AIMD, but to:

  • obtain a MACE potential that can safely reproduce AIMD dynamics

I am new to MACE, so any guidance on best practices for this workflow would be greatly appreciated.
