feat: chunked TTS generation with quality selector by glaucusj-sai · Pull Request #99 · jamiepine/voicebox

glaucusj-sai · 2026-02-19T01:10:18Z

Summary

Long text that exceeds the Qwen3-TTS model's max_new_tokens=2048 limit (~170s audio) is now handled automatically:

Text splitting: Splits at sentence boundaries (with clause/word fallbacks) into configurable chunks (default 800 chars)
Crossfade concatenation: Joins audio chunks with a 50ms crossfade to eliminate clicks at boundaries
Quality selector: Runtime-switchable between standard (24kHz native) and high (44.1kHz via soxr VHQ resampling)
Settings API: New GET/POST /tts/settings endpoints for runtime quality control without restart
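
The text-splitting step above can be sketched roughly as follows. This is a minimal illustration, not the code from `backend/utils/chunked_tts.py`: the names `split_text` and `_split_oversized`, the regular expressions, and the hard-cut last resort are all assumptions made for the example.

```python
import re

# Hypothetical patterns, tried in order: sentence ends, clause ends, any whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
CLAUSE_END = re.compile(r"(?<=[,;:])\s+")
WORD_GAP = re.compile(r"\s+")


def _split_oversized(piece: str, max_chars: int, patterns: list) -> list[str]:
    """Break a piece down with progressively finer patterns until every part fits."""
    if len(piece) <= max_chars:
        return [piece]
    if not patterns:
        # Last resort (assumption): hard cut when no boundary is left to try.
        return [piece[i:i + max_chars] for i in range(0, len(piece), max_chars)]
    parts = [p for p in patterns[0].split(piece) if p]
    out: list[str] = []
    for part in parts:
        out.extend(_split_oversized(part, max_chars, patterns[1:]))
    return out


def split_text(text: str, max_chars: int = 800) -> list[str]:
    """Greedily pack sentence-level pieces into chunks of at most max_chars."""
    pieces = _split_oversized(text.strip(), max_chars,
                              [SENTENCE_END, CLAUSE_END, WORD_GAP])
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Pieces are rejoined with a single space, so original line breaks are not preserved in this sketch; for TTS input that is usually acceptable.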

Short text (<800 chars) uses the original single-shot fast path with zero overhead.

Changes

| File | Change |
| --- | --- |
| `backend/utils/chunked_tts.py` | New: text splitting, audio concat, resampling utilities |
| `backend/backends/pytorch_backend.py` | Integrate chunking into `generate()`, extract `_generate_single()` |
| `backend/main.py` | Add `GET/POST /tts/settings` endpoints |
| `backend/models.py` | Add `TTSSettingsUpdate` model, bump text max_length to 50000 |
| `backend/requirements.txt` | Add `soxr>=0.3.0` for high-quality resampling |
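
As an illustration of the audio concatenation utility listed for `backend/utils/chunked_tts.py`, here is a minimal sketch of a 50ms crossfade join. The function name `crossfade_concat`, the linear fade curve, and the choice to overlap the chunks (which shortens the output by one fade length per boundary) are assumptions; the PR does not state which curve or overlap strategy it uses.

```python
import numpy as np


def crossfade_concat(chunks, sample_rate=24000, fade_ms=50.0):
    """Join mono float arrays, blending each boundary with a linear crossfade.

    Hypothetical helper: the real utility in the PR may differ.
    """
    if not chunks:
        return np.zeros(0, dtype=np.float32)
    out = np.asarray(chunks[0], dtype=np.float32)
    fade_len = int(sample_rate * fade_ms / 1000)
    for chunk in chunks[1:]:
        nxt = np.asarray(chunk, dtype=np.float32)
        # Never fade over more samples than either side actually has.
        n = min(fade_len, len(out), len(nxt))
        if n == 0:
            out = np.concatenate([out, nxt])
            continue
        ramp = np.linspace(0.0, 1.0, n, dtype=np.float32)
        # Fade the tail of the previous chunk out while the next one fades in.
        blended = out[-n:] * (1.0 - ramp) + nxt[:n] * ramp
        out = np.concatenate([out[:-n], blended, nxt[n:]])
    return out
```

At 24 kHz a 50ms fade covers 1200 samples, so joining two one-second chunks yields 46800 samples rather than 48000.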

Environment variables

| Variable | Default | Description |
| --- | --- | --- |
| `TTS_QUALITY` | `standard` | Output quality (`standard`=24kHz, `high`=44.1kHz) |
| `TTS_MAX_CHUNK_CHARS` | `800` | Max characters per chunk |
| `TTS_UPSAMPLE_RATE` | `44100` | Target sample rate for high quality |
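
One way these variables might be read at startup is sketched below. Only the variable names and defaults come from the table; the `TTSConfig` dataclass, the `load_config` helper, and the validation behavior are hypothetical and not taken from the PR.

```python
import os
from dataclasses import dataclass

NATIVE_SAMPLE_RATE = 24000  # native rate of the "standard" quality, per the table


@dataclass(frozen=True)
class TTSConfig:
    """Hypothetical container for the three documented settings."""
    quality: str = "standard"
    max_chunk_chars: int = 800
    upsample_rate: int = 44100

    @property
    def output_sample_rate(self) -> int:
        # "high" resamples to the target rate; "standard" keeps the native rate.
        return self.upsample_rate if self.quality == "high" else NATIVE_SAMPLE_RATE


def load_config(env=None) -> TTSConfig:
    """Build a config from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    quality = env.get("TTS_QUALITY", "standard").strip().lower()
    if quality not in ("standard", "high"):
        # Rejecting unknown values is an assumption of this sketch.
        raise ValueError(f"unsupported TTS_QUALITY: {quality!r}")
    return TTSConfig(
        quality=quality,
        max_chunk_chars=int(env.get("TTS_MAX_CHUNK_CHARS", "800")),
        upsample_rate=int(env.get("TTS_UPSAMPLE_RATE", "44100")),
    )
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise without touching the process environment.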

Test plan

Short text (<800 chars): uses single-shot path, no chunking overhead
Long text (9K+ chars): splits into ~12 chunks, generates and concatenates seamlessly
Quality switch to high: output sample rate changes to 44100
Switch back to standard: output returns to 24000
GET /tts/settings returns current config
POST /tts/settings with {"quality":"high"} updates at runtime
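
The last two checks can be exercised from a small client. The sketch below only builds the POST request using the standard library; the helper name `build_settings_request` and the base URL in the usage note are hypothetical, and the shape of the server's response is not described in this PR.

```python
import json
import urllib.request


def build_settings_request(base_url: str, quality: str) -> urllib.request.Request:
    """Build a POST /tts/settings request carrying {"quality": ...} as JSON."""
    body = json.dumps({"quality": quality}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/tts/settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a running backend, `urllib.request.urlopen(build_settings_request("http://localhost:8000", "high"))` would send the update; the host and port here are placeholders for wherever the backend is actually served.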

Tested on NVIDIA DGX Spark with Qwen3-TTS 1.7B — 9K character input produced ~12 minutes of seamless audio.
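
A quick back-of-the-envelope check of these figures, taking "9K" as exactly 9000 characters and "~12 minutes" as 720 seconds (both roundings are assumptions, and `estimate` is a hypothetical helper):

```python
import math

MAX_CHUNK_CHARS = 800      # default chunk size from the Summary
SINGLE_SHOT_LIMIT_S = 170  # approximate audio ceiling at max_new_tokens=2048


def estimate(total_chars: int, total_audio_s: float, max_chars: int = MAX_CHUNK_CHARS):
    """Return (minimum chunk count, average seconds of audio per chunk)."""
    n_chunks = math.ceil(total_chars / max_chars)
    return n_chunks, total_audio_s / n_chunks


n_chunks, per_chunk_s = estimate(9000, 12 * 60)
print(n_chunks, per_chunk_s)  # 12 chunks, 60.0 seconds each on average
```

The chunk count is a lower bound, since splitting at sentence boundaries rarely fills every chunk to the limit. An average of about 60 seconds per chunk sits well below the roughly 170-second single-shot ceiling.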

Long text that exceeds the model's max_new_tokens limit now gets automatically split at sentence boundaries, generated per-chunk, and concatenated with a short crossfade. A runtime-configurable quality setting lets users choose between standard (24 kHz native) and high (44.1 kHz via soxr VHQ resampling).

Changes:
- Add backend/utils/chunked_tts.py with text splitting, audio concatenation, and resampling utilities
- Integrate chunking directly into PyTorchTTSBackend.generate() so both the UI /generate and any API consumer benefit
- Add GET/POST /tts/settings endpoints for runtime quality control
- Bump GenerationRequest.text max_length from 5000 to 50000
- Add soxr to requirements.txt

Tested with 9K+ character input producing ~12 minutes of seamless audio on an NVIDIA DGX Spark (Qwen3-TTS 1.7B).

TacoDark · 2026-02-19T13:20:18Z

AI Generated pull request. Please review code to make sure it works.

glaucusj-sai · 2026-02-19T21:26:08Z

> AI Generated pull request. Please review code to make sure it works.

Yes, it has been tested. I used AI to write the post because I don't have much experience with forking. This was done for my project, which needs long podcast scripts converted to voice where quality matters. I have generated more than 10 hours of voice with this so far. Thanks for your comments.

TacoDark · 2026-02-21T11:48:23Z

> > AI Generated pull request. Please review code to make sure it works.
>
> Yes, it has been tested. I used AI to write the post because I don't have much experience with forking. This was done for my project, which needs long podcast scripts converted to voice where quality matters. I have generated more than 10 hours of voice with this so far. Thanks for your comments.

Thank you for being honest. I approve of the commit.

glaucusj-sai wants to merge 1 commit into jamiepine:main from glaucusj-sai:feat/chunked-tts-quality