Add Mid-Generation Streaming Support#124

Open
cvaz1306 wants to merge 1 commit into devnen:main from cvaz1306:main

@cvaz1306

Summary

This PR adds a streaming mode to the TTS server. Instead of waiting for the whole utterance to finish, users can hear (and see) the audio chunk by chunk as it is generated, which significantly reduces "Time to First Sound" (TTFS) for long texts. The PR also includes stability fixes for audio degradation and digital noise.

Key Features

  • Added a new /tts/stream API endpoint that uses an asynchronous generator to yield each audio chunk as soon as it has been synthesized.
  • A new "Stream Audio" checkbox in the UI lets users switch between standard batch generation and live streaming mode.
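A minimal sketch of the async-generator pattern behind a streaming endpoint, assuming sentence-level chunking. `split_into_sentences`, `synthesize_chunk`, and `stream_tts` are hypothetical names, and the "synthesis" just returns silent 16-bit PCM; the real server's model call, chunking rule, and audio format are not shown in this PR.

```python
import asyncio
from typing import AsyncIterator, List


def split_into_sentences(text: str) -> List[str]:
    # Hypothetical chunking rule: split on periods and drop empty pieces.
    parts = [p.strip() for p in text.split(".")]
    return [p + "." for p in parts if p]


async def synthesize_chunk(sentence: str) -> bytes:
    # Placeholder for the real TTS model call (hypothetical).
    # Returns 16-bit PCM silence whose length scales with the sentence.
    await asyncio.sleep(0)  # hand control back to the event loop, like a real await would
    return b"\x00\x00" * (len(sentence) * 100)


async def stream_tts(text: str) -> AsyncIterator[bytes]:
    # Yield each chunk as soon as it is ready instead of concatenating
    # the full utterance first; this is what lowers time to first sound.
    for sentence in split_into_sentences(text):
        yield await synthesize_chunk(sentence)
```

If the server happens to be built on FastAPI/Starlette (an assumption, not stated in the PR), an async generator like this can be passed directly to `StreamingResponse`, which sends each yielded chunk to the client as it is produced.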

Testing Instructions

  1. Restart the server.
  2. Enter a long paragraph of text (3-4 sentences).
  3. Ensure "Stream Audio" is checked.
  4. Click "Generate Speech".
  5. Verify: Audio should start playing within seconds.
  6. Verify: Pressing "Pause" during generation stops playback, and playback can then be resumed.
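Step 5 can also be checked from a script by timing the first chunk against the full stream. The sketch below is a minimal, hypothetical version: `fake_stream` stands in for the /tts/stream response body, and `measure_ttfs` works on any async iterator of bytes. Against a live server you would substitute a streaming HTTP client, for example iterating over an httpx `AsyncClient.stream(...)` response with `aiter_bytes()`. The request payload the endpoint expects is not documented here and is left out.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, Tuple


async def fake_stream(n_chunks: int = 4, delay_s: float = 0.05) -> AsyncIterator[bytes]:
    # Hypothetical stand-in for the streamed response: each chunk arrives
    # after `delay_s` seconds, as if synthesized piece by piece.
    for _ in range(n_chunks):
        await asyncio.sleep(delay_s)
        yield b"\x00" * 1024


async def measure_ttfs(chunks: AsyncIterator[bytes]) -> Tuple[float, float, int]:
    # Returns (seconds to first chunk, seconds to last chunk, total bytes).
    start = time.perf_counter()
    first: Optional[float] = None
    total_bytes = 0
    async for chunk in chunks:
        if first is None:
            first = time.perf_counter() - start
        total_bytes += len(chunk)
    total = time.perf_counter() - start
    # An empty stream has no first chunk; report the total time instead.
    return (first if first is not None else total), total, total_bytes


if __name__ == "__main__":
    ttfs, total, nbytes = asyncio.run(measure_ttfs(fake_stream()))
    print(f"TTFS {ttfs:.2f}s, full audio {total:.2f}s, {nbytes} bytes")
```

With streaming working, TTFS should stay roughly constant as the input grows, while the total time grows with the length of the text.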

cvaz1306 changed the title from "Add Mid-Generation Streaming Support and Real-Time Visualization" to "Add Mid-Generation Streaming Support" on Feb 19, 2026.