
Bug/Refactor: Implement Gradient Clipping and EMA for Training Stability #157

@mnk-nasir

Description


Observation: Training loss occasionally diverges when using high learning rates or large batch sizes during the flow-matching optimization.

Proposed Fix:

Add torch.nn.utils.clip_grad_norm_ to the training loop (suggested max norm: 1.0).

Implement Exponential Moving Average (EMA) for model weights to improve the robustness of the generated speech samples.
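A minimal sketch of both proposed changes, assuming a standard PyTorch training loop (the `EMA` helper, the toy model, and the optimizer settings below are illustrative, not the repo's actual code; only `clip_grad_norm_` with max norm 1.0 comes from this issue):

```python
import copy
import torch

class EMA:
    """Keeps an exponential moving average of a model's parameters."""
    def __init__(self, model, decay=0.999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # shadow <- decay * shadow + (1 - decay) * current
        for s, p in zip(self.shadow.parameters(), model.parameters()):
            s.mul_(self.decay).add_(p, alpha=1.0 - self.decay)

# Toy stand-ins for the real model and loss
model = torch.nn.Linear(4, 4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
ema = EMA(model, decay=0.999)

for step in range(10):
    x = torch.randn(8, 4)
    loss = model(x).pow(2).mean()  # placeholder for the vector-field loss
    opt.zero_grad()
    loss.backward()
    # Clip the total gradient norm to 1.0, as suggested in this issue
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    opt.step()
    ema.update(model)  # update the averaged weights after each step
```

At sampling/evaluation time, `ema.shadow` would be used in place of `model` to generate speech from the smoothed weights.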

Verification: Monitor the vector-field loss to confirm smoother convergence over 50k+ steps.
