Speech to Text Captions for OBS, VRChat, Twitch chat and Discord
- Speech to Text
- Text to Speech
- OBS Captions customization: Colors, fonts, shadows, background textures, text typing animation, sound effects, particle effects and CSS
- Native OBS stream captions
- Google Fonts: more than 1000 free fonts for OBS captions
- VRChat: KillFrenzy Avatar Text, VRChat's chatbox
- Twitch:
- Use 7TV/FFZ/BTTV emotes in OBS captions
- Post your STT to chat
- Use your chat messages as a source for captions and TTS
- native captions
- Discord: Send your STT to specified channel
- Scenes:
- Save multiple designs and freely switch between them
- Automatically switch design when OBS changes scene
For help, feature requests, bug reports, release notifications, and design templates, join the Discord.
On Windows, Edge WebView2 is required to render the app. This is done to make the app smaller on disk.
If you're not running an old Windows version and didn't accidentally remove it trying to debloat your computer, it should already be installed. Otherwise, you can download it from here.
If you want to use the Whisper STT module, you will need a Vulkan-ready graphics driver installed.
- NixOS: if you are using a recent NixOS version and have a graphical user environment enabled, it will likely ✨just work✨ if your hardware supports Vulkan
- Other Linux: check your distribution's documentation or see the Arch Linux Wiki for more information
- Windows: having up-to-date graphics drivers should suffice if the hardware supports it
Here is a list of Vulkan-ready devices. Most modern graphics drivers should support Vulkan.
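If you're unsure whether your driver exposes Vulkan, one quick check is the `vulkaninfo` tool (shipped in a package usually called `vulkan-tools`; availability and package name vary by distribution):

```shell
# Prints a short summary of detected Vulkan devices;
# errors out if no Vulkan-capable driver is found.
vulkaninfo --summary
```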
Every service has its pros and cons. I'd advise reading about them all before making your choice.
Web Speech API is a general specification for web browsers to support both speech synthesis and recognition. Its implementation and voices available change depending on your operating system.
Windows
We get the Web Speech API through Edge WebView2. Edge WebView2 (probably) uses cloud services to provide Speech-To-Text to the Web Speech API (we can't be sure because it's closed-source).
Linux
We get the Web Speech API through WebKitGTK. WebKitGTK does not support the speech recognition part of the Web Speech API yet, but everything should work as soon as the feature gets released. There have been experiments by the WebKitGTK team to use Whisper.cpp, but "that is much farther down the roadmap" (2025/03/08).
whisper.cpp is a port of OpenAI's Whisper.
It works locally, without going through OpenAI's servers, and also supports GPU acceleration with a pretty small performance cost. You can also automatically translate to English at the same time.
You're going to need to download a model (.bin) (or learn how to obtain more models), and select it in the Whisper Model field.
Smaller models have a smaller performance impact, but larger models are more accurate. There are also English-only models (files with .en in their name); all others are multilingual. Models with -q5_0 take less memory and disk space and are often more efficient. -tdrz models can detect speaker changes but are more resource-intensive.
Tip
The base.en-q5_1 model (ggml-base.en-q5_1.bin) gives pretty decent results when speaking clear English, is near-instant on GPU, and even works with acceptable performance on integrated graphics.
Browser allows you to open a browser (Chrome or Edge for now) and use the page it opens as an input. It also uses the Web Speech API, but the provider is the web browser.
Note
Chrome uses Google's cloud computing services, and Edge probably does something similar.
Azure is Microsoft's cloud computing service. It uses per-second billing.
You will need to create an API key and paste it in the Key field.
Deepgram is a cloud service. It uses per-minute billing for free accounts.
You will need to create an API key and paste it in the Key field.
Warning
Speechly was acquired by Roblox and it seems its Speech To Text API was shut down. This service may be removed in the future.
Every service has its pros and cons. I'd advise reading about them all before making your choice.
Web Speech API is a general specification for web browsers to support both speech synthesis and recognition. Its implementation and voices available change depending on your operating system.
Windows
We get the Web Speech API through Edge WebView2. Edge WebView2 only supports local voices (due to cost constraints). As far as I know, it only uses the Windows voice packs for now, so here's how to add new voice packs to Windows (you might need to reboot after following these instructions).
You can't change the output device of this service inside Curses, but you can change the system-wide output device of Edge WebView2 in your Windows settings. The instructions differ a bit between Windows 10 and 11, but you should be able to find them online.
Linux
We get the Web Speech API through WebKitGTK. WebKitGTK does not officially support the speech synthesis part of the Web Speech API yet, but everything should work as soon as the feature gets released.
Piper is a Free and Open Source Text to Speech synthesizer. It generates the sound locally, and the voices are usually Public Domain (do check the license when downloading voices though).
You will need to follow these few steps to get it up and running, but don't be scared!
Note
On Linux, Piper might be in your package manager of choice. Make sure you install the TTS executable, and not the mouse configuration app! (e.g. piper-tts-{bin,git} from the AUR on Arch and not piper from extra)
- Download the latest release of Piper, un-zip it and select it in Curses in the Executable field.
- Create a directory (folder) where you will put your voices and select it in Curses in the Voice directory field.
- Find a voice you like on https://rhasspy.github.io/piper-samples/, and download both the `.onnx` and `.onnx.json` files into the directory you created. Make sure both files have the same name (e.g. `en_US-kristin-medium.onnx` and `en_US-kristin-medium.onnx.json`).
- Select said voice in Curses and you're good to go :)
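If you've downloaded several voices, a short script like this can sanity-check that every model has its matching config with the same base name (a sketch; the directory and voice names below are examples only):

```python
# Sketch: check that every Piper voice model (.onnx) in a directory has a
# matching .onnx.json config with the same base name.
from pathlib import Path
import tempfile

# Stand-in for your real voice directory; populated here for illustration.
voice_dir = Path(tempfile.mkdtemp())
(voice_dir / "en_US-kristin-medium.onnx").touch()
(voice_dir / "en_US-kristin-medium.onnx.json").touch()
(voice_dir / "en_GB-alan-low.onnx").touch()  # config deliberately missing

missing = [
    model.name
    for model in sorted(voice_dir.glob("*.onnx"))
    if not model.with_name(model.name + ".json").exists()
]
print(missing)  # voices that still need their .onnx.json downloaded
```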
Windows provides the Microsoft Speech API (SAPI) which can be used for Text to Speech using the voices installed in your Windows instance.
Azure is Microsoft's cloud computing service. It uses AI-powered voices, and usually uses per-character billing (learn more).
You will need to create an API key and paste it in the Key field.
Fast and high quality voices obtained through an unofficial TikTok TTS API.
Warning
Not recommended for anything important (anything non-joke, tbh), since TikTok might shut down the API at any point (learn more).
AI voices paid with a subscription. API access is needed to use Uberduck through Curses.
You will need to create an API key and paste it in the Api key field.
Custom TTS isn't a service, but it allows you to plug in pretty much any TTS service.
You will probably need to create a wrapper script to make it work though.
Warning
The messages passed to this script might come from untrusted sources (e.g. Twitch chat), so make sure to properly sanitize the input so as not to give random chatters access to your computer.
It executes the given file as a command and passes two arguments:
- the path to a file containing the text to synthesize in UTF-8 format.
- the path to an output file that should contain the audio to play back once the executable finishes.
Windows
There are more advanced options for Windows users depending on the extension of the file.

| Extension | Command executed |
|---|---|
| .exe or .com | %script% |
| .py | python %script% |
| .ps1 | powershell -ExecutionPolicy Bypass -File %script% |
| .* | cmd /c %script% |
(where %script% is the absolute path to the script)
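As an illustration, a minimal `.py` wrapper could look like the sketch below. `some-tts-cli` is a hypothetical engine name; swap in whatever your TTS actually is. The important part is passing the text as a plain argument list element rather than interpolating it into a shell string, so chat messages can't inject commands:

```python
# Hypothetical Custom TTS wrapper. Curses invokes it as:
#   wrapper.py <input_text_file> <output_audio_file>
import subprocess
import sys

def build_command(text: str, audio_path: str) -> list[str]:
    # "some-tts-cli" is a placeholder engine name. Each piece of the
    # command is a separate list element, so no shell ever parses the
    # chat text and input like "; rm -rf ~" stays inert.
    return ["some-tts-cli", "--output", audio_path, "--", text]

def main(text_path: str, audio_path: str) -> None:
    with open(text_path, encoding="utf-8") as f:
        text = f.read()
    subprocess.run(build_command(text, audio_path), check=True)

if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```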
If you are using a custom port (i.e. running curses --port {your port}), Twitch authentication might not work. This is because Twitch only allows a few static URLs as redirects, ports included.
If you are unable to use the default port (3030) for this one-time operation, you can try with any of 45561-45569.
If this is not an option for you, you can create your own app. Set the OAuth Redirect URL to http://localhost:{your port}/oauth_twitch.html, and the client type to Public. Then, pass the CURSES_TWITCH_CLIENT_ID env variable with the newly-generated client ID when running Curses.
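Putting those pieces together, the launch would look something like this (the client ID and port below are placeholders; use the ID generated for your own app):

```shell
# Placeholder values: substitute your own client ID and port.
CURSES_TWITCH_CLIENT_ID=abc123 curses --port 8080
```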
Application framework dependencies:
Note
You can skip this step when using NixOS with the included Nix Flake.
This repository provides a Nix flake which provides:
- Development Environment via `nix develop`
- Nix Package as the default flake package output
  - can be built with `nix build` (binary will be available as `./result/bin/curses`)
The Development Environment provides all needed libraries to build the project.
Note: Runtime Dependencies
Additionally, the following are required for building:
- cmake
- shaderc
- clang
- alsa-lib
- vulkan-headers
- vulkan-loader (`vulkan-icd-loader` on Arch Linux)
List of additional packages for Arch Linux: `cmake shaderc alsa-lib vulkan-headers vulkan-icd-loader`
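On Arch Linux, the packages listed above can be installed in one step (`--needed` skips anything already present):

```shell
sudo pacman -S --needed cmake shaderc alsa-lib vulkan-headers vulkan-icd-loader
```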
- rust (or `winget install rustup`)
- nodejs (or `winget install nodejs`)
- pnpm (or `winget install pnpm.pnpm`)
- vulkansdk (or `winget install vulkansdk`)
- msvc
  - get the community Visual Studio installer (if you have it installed already, open the installer again, press 'Modify' on your installed instance, and make sure the components shown below are checked)
  - 'Desktop development with C++'
- clang lib (or `winget install llvm`)
- cmake (or `winget install cmake`)
- Setup pnpm local dependencies: `pnpm i --frozen-lockfile`
- Choose the action you want to perform from the following:
  - `pnpm tauri dev`: build and run a local development version that restarts on code changes
  - `pnpm tauri dev --release`: build and run the dev version with release settings
  - `pnpm tauri build --no-bundle --debug`: create a development build
    - binary will be produced at `./src-tauri/target/debug/<curses-bin>`
  - `pnpm tauri build --no-bundle`: create a final build
    - binary will be produced at `./src-tauri/target/build/<curses-bin>`

