feat: optimizations for strix halo #125

Open
0xrushi wants to merge 2 commits into devnen:main from 0xrushi:fix/strixhalo

0xrushi commented Feb 24, 2026

Renamed the Docker and Docker Compose files so it is clear they target AMD Strix Halo hardware, and updated all references to them accordingly.

Added bfloat16 inference for the T3 model via autocast, improving token-generation speed by ~40%, while keeping S3Gen in float32 for numerical stability.
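
The split-precision idea (T3 in bfloat16, S3Gen in float32) can be sketched roughly as below. This is a minimal, hypothetical example rather than the code in this PR: t3 and s3gen are tiny stand-in nn.Module objects, and the sketch runs on CPU. On a ROCm build of PyTorch the Strix Halo GPU is addressed through the "cuda" device type, so device_type="cuda" would be passed there instead.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: the real T3 and S3Gen modules are far larger
# and have different call signatures.
t3 = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 16))
s3gen = nn.Linear(16, 8)


def synthesize(x: torch.Tensor, device_type: str = "cpu", use_bf16: bool = True) -> torch.Tensor:
    """Run the T3 stand-in under bfloat16 autocast, then the S3Gen stand-in in float32."""
    with torch.inference_mode():
        # Stage 1: autocast-eligible ops (e.g. Linear) run in bfloat16 when enabled.
        with torch.autocast(device_type=device_type, dtype=torch.bfloat16, enabled=use_bf16):
            tokens = t3(x)
        # Stage 2: outside autocast, cast activations back to float32 so the
        # float32 S3Gen weights compute at full precision.
        return s3gen(tokens.float())


if __name__ == "__main__":
    audio = synthesize(torch.randn(4, 16))
    print(audio.dtype, tuple(audio.shape))
```

Scoping autocast to the first stage only, instead of wrapping the whole pipeline, is what keeps the second stage in float32 without touching its weights.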

Introduced a voice-conditioning cache that skips re-encoding a reference voice that has already been processed, saving 2–5 seconds on repeat requests.
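
A cache of this kind can be sketched as a small LRU map keyed by a hash of the reference audio bytes. This is an illustrative sketch, not the PR's implementation: the class name, the encode callable (a hypothetical stand-in for the model's voice-encoding step), and the max_entries limit are all assumptions.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

C = TypeVar("C")  # whatever the (hypothetical) encoder returns


class VoiceConditioningCache(Generic[C]):
    """LRU cache mapping reference-audio content to precomputed conditionals."""

    def __init__(self, encode: Callable[[bytes], C], max_entries: int = 16) -> None:
        self._encode = encode
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, C]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, audio: bytes) -> C:
        # Key on the audio content itself, so the same voice sent twice under
        # different file names still hits the cache.
        key = hashlib.sha256(audio).hexdigest()
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self._encode(audio)  # the expensive step being skipped on hits
        self._entries[key] = value
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used voice
        return value
```

Bounding the cache keeps memory predictable when many different voices are used, while repeat requests for a recent voice skip the encoder entirely.
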

Added a configurable TTS_BF16 environment variable with automatic hardware detection and backward-compatible fallbacks for other GPUs and for CPU-only systems.
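
One plausible way to resolve such a flag is sketched below. The accepted values (1/true/yes/on, 0/false/no/off, auto) and the detection rule are assumptions for illustration, not necessarily what the PR accepts. torch.cuda.is_bf16_supported() is a real PyTorch call, and ROCm builds of PyTorch report AMD GPUs through torch.cuda; the detection hook is injectable so the logic can be exercised without a GPU.

```python
import os
from typing import Callable, Mapping, Optional

# Assumed spellings; the PR may accept a different set.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def _detect_bf16() -> bool:
    """Best-effort hardware check; falls back to float32 (False) if torch is absent."""
    try:
        import torch
    except ImportError:
        return False
    # ROCm builds of PyTorch also expose AMD GPUs through torch.cuda.
    return torch.cuda.is_available() and torch.cuda.is_bf16_supported()


def resolve_bf16(env: Optional[Mapping[str, str]] = None,
                 detect: Callable[[], bool] = _detect_bf16) -> bool:
    """Decide whether to enable bfloat16 from a TTS_BF16-style setting."""
    env = os.environ if env is None else env
    raw = env.get("TTS_BF16", "auto").strip().lower()
    if raw in _TRUE:
        return True
    if raw in _FALSE:
        return False
    # "auto", empty, or unrecognised values defer to hardware detection.
    return detect()
```

Treating unknown values as "auto" rather than raising keeps older configurations working, which matches the spirit of a backward-compatible default.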

Patched dtype issues that appear with PyTorch 2.9+ and tuned ROCm settings for Strix Halo, delivering 50% faster inference with no breaking changes.

Before: [screenshot not preserved]

After: [screenshot not preserved]
