[pull] main from m-bain:main#67

Merged
pull[bot] merged 3 commits into Asofwar:main from m-bain:main
Mar 10, 2026


@pull pull bot commented Mar 10, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


claude and others added 3 commits March 10, 2026 13:35
PR #986 ("support timestamps for numbers") introduced three changes that
together broke CTC forced alignment:

1. Unknown chars (numbers, punctuation) were replaced with '*' wildcards
   mapped to token -1. get_wildcard_emission() scored these using
   torch.max() over all non-blank emissions, so wildcards greedily matched
   any speech-like signal in the segment window.

2. get_trellis() was rewritten with a different shape (num_frame, num_tokens)
   and incompatible initialization, discarding the original SoS-offset design
   from the PyTorch forced alignment tutorial.

3. backtrack() was replaced with backtrack_beam(), which always starts
   backtracking from the last frame of the segment window. The original
   backtrack() used torch.argmax() on the last token column to determine
   the starting frame. With padded segment boundaries (silence before/after
   speech), the new implementation spread all tokens across the full window,
   placing the first word at the start of the silence instead of the speech.
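
As a rough illustration of point 1, the sketch below scores a wildcard as the best non-blank log-probability in a frame. It is plain Python rather than the project's torch code; `wildcard_score`, the four-label vocabulary, and the probabilities are all hypothetical and only mirror the behavior described above.

```python
import math

def wildcard_score(frame_log_probs, blank_id=0):
    """Best log-probability among all non-blank labels in one frame."""
    return max(lp for i, lp in enumerate(frame_log_probs) if i != blank_id)

# Hypothetical vocabulary: 0 = blank, 1 = 'a', 2 = 'b', 3 = 'c'.
HIGH, LOW = math.log(0.91), math.log(0.03)
frame_a = [LOW, HIGH, LOW, LOW]   # model is confident it hears 'a'
frame_c = [LOW, LOW, LOW, HIGH]   # model is confident it hears 'c'
silence = [HIGH, LOW, LOW, LOW]   # model is confident it hears nothing

# The wildcard scores equally well on any confident speech frame,
# no matter which character is actually being spoken.
scores = [wildcard_score(f) for f in (frame_a, frame_c, silence)]
```

Under these assumptions the wildcard gets the same high score on both speech frames, so nothing ties it to the position where the number or punctuation mark was actually spoken.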

This commit restores the original PyTorch tutorial implementation:
- Unknown chars are skipped; words with only unknown chars become
  unalignable and get no timestamps (handled by interpolate_nans)
- get_trellis: restored (num_frame+1, num_tokens+1) shape with SoS offset
- backtrack: restored torch.argmax-based starting frame
- Removed backtrack_beam, get_wildcard_emission, BeamState, Path
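
The restored design can be sketched in plain Python as follows. This is a simplified illustration modeled on the description above, not the code in this commit: it uses nested lists instead of tensors, omits the tutorial's extra boundary initialization, and records only (token index, frame index) pairs. The toy emission matrix and vocabulary are assumptions.

```python
import math

NEG_INF = float("-inf")

def get_trellis(emission, tokens, blank_id=0):
    """Build a (num_frame + 1) x (num_tokens + 1) trellis; column 0 is SoS."""
    num_frame, num_tokens = len(emission), len(tokens)
    trellis = [[NEG_INF] * (num_tokens + 1) for _ in range(num_frame + 1)]
    trellis[0][0] = 0.0
    for t in range(num_frame):
        # Staying in the SoS column means emitting blank only.
        trellis[t + 1][0] = trellis[t][0] + emission[t][blank_id]
        for j in range(1, num_tokens + 1):
            stay = trellis[t][j] + emission[t][blank_id]
            change = trellis[t][j - 1] + emission[t][tokens[j - 1]]
            trellis[t + 1][j] = max(stay, change)
    return trellis

def backtrack(trellis, emission, tokens, blank_id=0):
    """Walk back from the best-scoring frame of the last token column."""
    j = len(tokens)
    # argmax over the last column picks the starting frame, so trailing
    # silence after the last token is left out of the path.
    t_start = max(range(len(trellis)), key=lambda t: trellis[t][j])
    path = []
    for t in range(t_start, 0, -1):
        stayed = trellis[t - 1][j] + emission[t - 1][blank_id]
        changed = trellis[t - 1][j - 1] + emission[t - 1][tokens[j - 1]]
        path.append((j - 1, t - 1))  # (token index, frame index)
        if changed > stayed:
            j -= 1
            if j == 0:
                break
    else:
        raise ValueError("Failed to align")
    return path[::-1]

# Toy input: 0 = blank, 1 = 'a', 2 = 'b'; two silent frames on each side.
H, L = math.log(0.9), math.log(0.05)
emission = [[H, L, L], [H, L, L], [L, H, L], [L, L, H], [H, L, L], [H, L, L]]
tokens = [1, 2]
trellis = get_trellis(emission, tokens)
path = backtrack(trellis, emission, tokens)
```

In this toy case the path covers only frames 2 and 3, the two frames that contain speech, and the padded silence on either side is assigned to no token.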

Verified: v3.3.0 (pre-#986) produced correct timestamps with padded
segment boundaries; this fix reproduces that behavior.

Fixes #1220

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ktrack

The original code accepted blank_id as a parameter but used hardcoded 0
in two places, breaking alignment for HuggingFace models where the blank
token is [pad] (not index 0).
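
A minimal sketch of why the hardcoded index matters, assuming a hypothetical three-label vocabulary whose blank sits at index 2. `blank_path_score` is an illustrative helper, not a function from the codebase; it sums the blank log-probability over a run of silent frames, much as a trellis accumulates score while staying on the same token.

```python
import math

def blank_path_score(emission, blank_id):
    """Total log-probability of emitting blank on every frame."""
    return sum(frame[blank_id] for frame in emission)

# Hypothetical vocabulary: 0 = 'a', 1 = 'b', 2 = '[pad]' (the CTC blank).
H, L = math.log(0.9), math.log(0.05)
silent_frames = [[L, L, H] for _ in range(4)]

correct = blank_path_score(silent_frames, blank_id=2)    # uses the real blank
hardcoded = blank_path_score(silent_frames, blank_id=0)  # reads 'a' instead
```

With the blank at index 2, reading index 0 scores silence with the log-probability of 'a', so the stay-in-place path looks far less likely than it is and the alignment drifts.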

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@pull pull bot locked and limited conversation to collaborators Mar 10, 2026
@pull pull bot added the ⤵️ pull label Mar 10, 2026
@pull pull bot merged commit 6d3edb1 into Asofwar:main Mar 10, 2026
4 checks passed
