Speech to Text Captions for OBS, VRChat, Twitch chat and Discord
- Speech to Text
- Text to Speech
- OBS Captions customization: Colors, fonts, shadows, background textures, text typing animation, sound effects, particle effects and CSS
- Native OBS stream captions
- Google Fonts: more than 1000 free fonts for OBS captions
- VRChat: KillFrenzy Avatar Text, VRChat's chatbox
- Twitch:
- Use 7TV/FFZ/BTTV emotes in OBS captions
- Post your STT to chat
- Use your chat messages as a source for captions and TTS
- native captions
- Discord: Send your STT to specified channel
- Scenes:
- Save multiple designs and freely switch between them
- Automatically switch design when OBS changes scene
For help, feature requests, bug reports, release notifications, and design templates, join the Discord.
On Windows, Edge WebView2 is required to render the app. This is done to make the app smaller on disk.
If you're not running an old Windows version and didn't accidentally remove it trying to debloat your computer, it should already be installed. Otherwise, you can download it from here.
If you want to use the Whisper STT module, you will need a Vulkan-ready graphics driver installed.
- NixOS: if you are using a recent NixOS version and have a graphical user environment enabled, it will likely ✨just work✨ if your hardware supports Vulkan
- Other Linux: check your distribution's documentation or see the Arch Linux Wiki for more information
- Windows: having up-to-date graphics drivers should suffice if the hardware supports it
Here is a list of Vulkan-ready devices. Most modern graphics drivers should support Vulkan.
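If you're unsure whether your driver exposes Vulkan, one quick check is the `vulkaninfo` tool (shipped in a package usually called `vulkan-tools`; availability and package name vary by distribution):

```shell
# Prints a short summary of detected Vulkan devices;
# errors out if no Vulkan-capable driver is found.
vulkaninfo --summary
```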
Every service has its pros and cons. I'd advise reading about them all before making your choice.
Web Speech API is a general specification for web browsers to support both speech synthesis and recognition. Its implementation and voices available change depending on your operating system.
Windows
We get the Web Speech API through Edge WebView2. Edge WebView2 (probably) uses cloud services to provide Speech-To-Text to the Web Speech API (we can't be sure because it's closed-source).
Linux
We get the Web Speech API through WebKitGTK. WebKitGTK does not support the speech recognition part of the Web Speech API yet, but everything should work as soon as the feature gets released. There have been experiments by the WebKitGTK team to use Whisper.cpp, but "that is much farther down the roadmap" (2025/03/08).
whisper.cpp is a port of OpenAI's Whisper.
It works locally, without going through OpenAI's servers, and also supports GPU acceleration with a pretty small performance cost. You can also automatically translate to English at the same time.
You're going to need to download a model (.bin) (or learn how to obtain more models), and select it in the Whisper Model field.
Smaller models have a smaller performance impact, but larger models are more accurate. There are also English-only models (files with .en in their name); all others are multilingual. Models with -q5_0 take less memory and disk space and are often more efficient. -tdrz models can detect speaker changes but are more resource-intensive.
Tip
The base.en-q5_1 model (ggml-base.en-q5_1.bin) gives pretty decent results when speaking clear English, is near-instant on GPU, and even works with acceptable performance on integrated graphics.
Browser allows you to open a browser (Chrome or Edge for now) and use the page it opens as an input. It also uses the Web Speech API, but the provider is the web browser.
Note
Chrome uses Google's cloud computing services, and Edge probably does something similar.
Azure is Microsoft's cloud computing service. It uses per-second billing.
You will need to create an API key and paste it in the Key field.
Deepgram is a cloud service. It uses per-minute billing for free accounts.
You will need to create an API key and paste it in the Key field.
Warning
Speechly was acquired by Roblox and it seems its Speech To Text API was shut down. This service may be removed in the future.
Every service has its pros and cons. I'd advise reading about them all before making your choice.
Web Speech API is a general specification for web browsers to support both speech synthesis and recognition. Its implementation and voices available change depending on your operating system.
Windows
We get the Web Speech API through Edge WebView2. Edge WebView2 only supports local voices (due to cost constraints). As far as I know, it only uses the Windows voice packs for now, so here's how to add new voice packs to Windows (you might need to reboot after following these instructions).
You can't change the output device of this service inside Curses, but you can change the system-wide output device of Edge WebView2 in your Windows settings. The instructions differ a bit between Windows 10 and 11, but you should be able to find them online.
Linux
We get the Web Speech API through WebKitGTK. WebKitGTK does not officially support the speech synthesis part of the Web Speech API yet, but everything should work as soon as the feature gets released.
Piper is a Free and Open Source Text to Speech synthesizer. It generates the sound locally, and the voices are usually Public Domain (do check the license when downloading voices though).
You will need to follow these few steps to get it up and running, but don't be scared!
Note
On Linux, Piper might be in your package manager of choice. Make sure you install the TTS executable, and not the mouse configuration app! (e.g. piper-tts-{bin,git} from the AUR on Arch and not piper from extra)
- Download the latest release of Piper, un-zip it and select it in Curses in the Executable field.
- Create a directory (folder) where you will put your voices and select it in Curses in the Voice directory field.
- Find a voice you like on https://rhasspy.github.io/piper-samples/, and download both the `.onnx` and `.onnx.json` files into the directory you created. Make sure both files have the same name (e.g. `en_US-kristin-medium.onnx` and `en_US-kristin-medium.onnx.json`).
- Select said voice in Curses and you're good to go :)
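If you've downloaded several voices, a short script like this can sanity-check that every model has its matching config with the same base name (a sketch; the directory and voice names below are examples only):

```python
# Sketch: check that every Piper voice model (.onnx) in a directory has a
# matching .onnx.json config with the same base name.
from pathlib import Path
import tempfile

# Stand-in for your real voice directory; populated here for illustration.
voice_dir = Path(tempfile.mkdtemp())
(voice_dir / "en_US-kristin-medium.onnx").touch()
(voice_dir / "en_US-kristin-medium.onnx.json").touch()
(voice_dir / "en_GB-alan-low.onnx").touch()  # config deliberately missing

missing = [
    model.name
    for model in sorted(voice_dir.glob("*.onnx"))
    if not model.with_name(model.name + ".json").exists()
]
print(missing)  # voices that still need their .onnx.json downloaded
```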
Windows provides the Microsoft Speech API (SAPI) which can be used for Text to Speech using the voices installed in your Windows instance.
Azure is Microsoft's cloud computing service. It uses AI-powered voices, and usually uses per-character billing (learn more).
You will need to create an API key and paste it in the Key field.
Fast and high quality voices obtained through an unofficial TikTok TTS API.
Warning
Not recommended for anything important (anything non-joke, tbh), since TikTok might shut down the API at any point (learn more).
AI voices paid with a subscription. API access is needed to use Uberduck through Curses.
You will need to create an API key and paste it in the Api key field.
Custom TTS isn't a service, but it allows you to plug in pretty much any TTS service.
You will probably need to create a wrapper script to make it work though.
Warning
The messages passed to this script might come from untrusted sources (e.g. Twitch chat), so make sure to properly sanitize the input so as not to give random chatters access to your computer.
It executes the given file as a command and passes two arguments:
- the path to a file containing the text to synthesize in UTF-8 format.
- the path to an output file that should contain the audio to play back once the executable finishes.
Windows
There are more advanced options for Windows users depending on the extension of the file.

| Extension | Command executed |
|---|---|
| .exe or .com | %script% |
| .py | python %script% |
| .ps1 | powershell -ExecutionPolicy Bypass -File %script% |
| .* | cmd /c %script% |
(where %script% is the absolute path to the script)
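As an illustration, a minimal `.py` wrapper could look like the sketch below. `some-tts-cli` is a hypothetical engine name; swap in whatever your TTS actually is. The important part is passing the text as a plain argument list element rather than interpolating it into a shell string, so chat messages can't inject commands:

```python
# Hypothetical Custom TTS wrapper. Curses invokes it as:
#   wrapper.py <input_text_file> <output_audio_file>
import subprocess
import sys

def build_command(text: str, audio_path: str) -> list[str]:
    # "some-tts-cli" is a placeholder engine name. Each piece of the
    # command is a separate list element, so no shell ever parses the
    # chat text and input like "; rm -rf ~" stays inert.
    return ["some-tts-cli", "--output", audio_path, "--", text]

def main(text_path: str, audio_path: str) -> None:
    with open(text_path, encoding="utf-8") as f:
        text = f.read()
    subprocess.run(build_command(text, audio_path), check=True)

if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```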
If you are using a custom port (i.e. running curses --port {your port}), Twitch authentication might not work. This is because Twitch only allows a few static URLs as redirects, ports included.
If you are unable to use the default port (3030) for this one-time operation, you can try with any of 45561-45569.
If this is not an option for you, you can create your own app. Set the OAuth Redirect URL to http://localhost:{your port}/oauth_twitch.html, and the client type to Public. Then, pass the CURSES_TWITCH_CLIENT_ID env variable with the newly-generated client ID when running Curses.
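Putting those pieces together, the launch would look something like this (the client ID and port below are placeholders; use the ID generated for your own app):

```shell
# Placeholder values: substitute your own client ID and port.
CURSES_TWITCH_CLIENT_ID=abc123 curses --port 8080
```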
Application framework dependencies:
Note
You can skip this step when using NixOS with the included Nix Flake.
This repository provides a Nix flake which provides:
- Development Environment via `nix develop`
- Nix Package as the default flake package output
  - can be built with `nix build` (binary will be available as `./result/bin/curses`)
The Development Environment provides all needed libraries to build the project.
Note: Runtime Dependencies
Additionally, the following are required for building:
- cmake
- shaderc
- clang
- alsa-lib
- vulkan-headers
- vulkan-loader (`vulkan-icd-loader` on Arch Linux)
List of additional packages for Arch Linux: `cmake shaderc alsa-lib vulkan-headers vulkan-icd-loader`
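On Arch Linux, the packages listed above can be installed in one step (`--needed` skips anything already present):

```shell
sudo pacman -S --needed cmake shaderc alsa-lib vulkan-headers vulkan-icd-loader
```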
- rust (or `winget install rustup`)
- nodejs (or `winget install nodejs`)
- pnpm (or `winget install pnpm.pnpm`)
- vulkansdk (or `winget install vulkansdk`)
- msvc
  - get the community Visual Studio installer (if you have it installed already, open the installer again, press 'Modify' on your installed instance, and make sure the components shown below are checked)
  - 'Desktop development with C++'
- clang lib (or `winget install llvm`)
- cmake (or `winget install cmake`)
- Setup pnpm local dependencies: `pnpm i --frozen-lockfile`
- Choose the action you want to perform from the following:
  - `pnpm tauri dev`: build and run a local development version that restarts on code changes
  - `pnpm tauri dev --release`: build and run the dev version with release settings
  - `pnpm tauri build --no-bundle --debug`: create a development build
    - binary will be produced at `./src-tauri/target/debug/<curses-bin>`
  - `pnpm tauri build --no-bundle`: create a final build
    - binary will be produced at `./src-tauri/target/build/<curses-bin>`

