
taliyahwebb/curses

 
 


Speech to Text Captions for OBS, VRChat, Twitch chat and Discord


Features

Instructions and details

  • Speech to Text
  • Text to Speech
  • OBS Captions customization: Colors, fonts, shadows, background textures, text typing animation, sound effects, particle effects and CSS
  • Native OBS stream captions
  • Google Fonts: more than 1000 free fonts for OBS captions
  • VRChat: KillFrenzy Avatar text, VRChat's chatbox
  • Twitch:
    • Use 7TV/FFZ/BTTV emotes in OBS captions
    • Post your STT to chat
    • Use your chat messages as a source for captions and TTS
    • Native captions
  • Discord: Send your STT to specified channel
  • Scenes:
    • Save multiple designs and freely switch between them
    • Automatically switch design when OBS changes scene

Roadmap

Community

For help, feature requests, bug reports, release notifications and design templates, join the Discord.

Usage

Runtime Dependencies

Web renderer

On Windows, Edge WebView2 is required to render the app. This is done to make the app smaller on disk.

If you're not running an old Windows version and didn't accidentally remove it trying to debloat your computer, it should already be installed. Otherwise, you can download it from here.

Whisper STT

If you want to use the Whisper STT module, you will need a Vulkan-ready graphics driver installed.

  • NixOS: if you are using a recent NixOS version and have a graphical user environment enabled, it will likely ✨just work✨ if your hardware supports Vulkan
  • Other Linux: check your distribution's documentation or see the Arch Linux Wiki for more information
  • Windows: having up-to-date graphics drivers should suffice if the hardware supports it

Here is a list of Vulkan-ready devices. Most modern graphics drivers should support Vulkan.

STT services

Every service has its pros and cons. I'd advise reading about them all before making your choice.

Web Speech API (STT)

Web Speech API is a general specification for web browsers to support both speech synthesis and recognition. Its implementation and voices available change depending on your operating system.

Windows: We get the Web Speech API through Edge WebView2.

Edge WebView2 (probably) uses cloud services to provide Speech-To-Text to the Web Speech API (can't be sure because it's closed-source).

Linux: We get the Web Speech API through WebKitGTK.

WebKitGTK does not support the speech recognition part of the Web Speech API yet, but everything should work as soon as the feature gets released. The WebKitGTK team has experimented with using Whisper.cpp, but "that is much farther down the roadmap" (2025/03/08).

Whisper

whisper.cpp is a port of OpenAI's Whisper.

It works locally, without going through OpenAI's servers, and also supports GPU acceleration, with a pretty small performance cost. It can also automatically translate to English at the same time.

You're going to need to download a model (.bin) (or learn how to obtain more models), and select it in the Whisper Model field.

Smaller models have a smaller performance impact, but larger models are more accurate. There are also English-only models (files with .en in their name); all others are multilingual. Models with -q5_0 take less memory and disk space and are often more efficient. -tdrz models can detect speaker changes but are more resource-intensive.

Tip

The base.en-q5_1 model (ggml-base.en-q5_1.bin) gives pretty decent results when speaking clear English. It is near instant on GPU and even works with acceptable performance on integrated graphics.
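The naming conventions described above can be read off a file name mechanically. The following is a minimal sketch, not part of Curses: `describe_model` is a hypothetical helper, and the `ggml-` prefix and the order of the suffixes are assumptions inferred from the example file name ggml-base.en-q5_1.bin.

```python
import re


def describe_model(filename):
    """Summarise a whisper.cpp model file name such as 'ggml-base.en-q5_1.bin'."""
    name = filename
    # Assumed layout: ggml-<size>[.en][-tdrz][-q<bits>_<variant>].bin
    if name.startswith("ggml-"):
        name = name[len("ggml-"):]
    if name.endswith(".bin"):
        name = name[:-len(".bin")]

    # Quantized models end in something like -q5_0 or -q5_1.
    quant = re.search(r"-(q\d+_\d+)$", name)
    if quant:
        name = name[:quant.start()]

    # -tdrz models can detect speaker changes.
    speaker_turns = name.endswith("-tdrz")
    if speaker_turns:
        name = name[:-len("-tdrz")]

    # .en marks an English-only model; everything else is multilingual.
    english_only = name.endswith(".en")
    if english_only:
        name = name[:-len(".en")]

    return {
        "size": name,
        "english_only": english_only,
        "quantization": quant.group(1) if quant else None,
        "speaker_turns": speaker_turns,
    }
```

For example, `describe_model("ggml-base.en-q5_1.bin")` reports size `base`, English-only, quantization `q5_1` and no speaker-change detection.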

Browser

Browser allows you to open a browser (Chrome or Edge for now) and use the page it opens as an input. It also uses the Web Speech API, but the provider is the web browser.

Note

Chrome uses Google's cloud computing services, and Edge probably does something similar.

Azure (STT)

Azure is Microsoft's cloud computing service. It uses per second billing.

You will need to find out how to create an API key and paste it in the Key field.

Deepgram

Deepgram is a cloud service. It uses per minute billing for free accounts.

You will need to find out how to create an API key and paste it in the Key field.

Speechly

Warning

Speechly was acquired by Roblox and it seems its Speech To Text API was shut down. This service may be removed in the future.

TTS services

Every service has its pros and cons. I'd advise reading about them all before making your choice.

Web Speech API (TTS)

Web Speech API is a general specification for web browsers to support both speech synthesis and recognition. Its implementation and voices available change depending on your operating system.

Windows: We get the Web Speech API through Edge WebView2.

Edge WebView2 only supports local voices (due to cost constraints). Afaik, it only uses the Windows voice packs for now, so here's how to add new voice packs to Windows (you might need to reboot after following these instructions).

Changing output device

You can't change the output device of this service inside Curses, but you can change the system-wide output device of Edge WebView2 in your Windows settings. The steps differ a bit between Windows 10 and 11, but you should be able to find instructions online.

Linux: We get the Web Speech API through WebKitGTK.

WebKitGTK does not officially support the speech synthesis part of the Web Speech API yet, but everything should work as soon as the feature gets released.

Piper

Piper is a Free and Open Source Text to Speech synthesizer. It generates the sound locally, and the voices are usually Public Domain (do check the license when downloading voices though).

You will need to follow these few steps to get it up and running, but don't be scared!

Note

On Linux, Piper might be in your package manager of choice. Make sure you install the TTS executable, and not the mouse configuration app! (e.g. piper-tts-{bin,git} from the AUR on Arch and not piper from extra)

  • Download the latest release of Piper, unzip it and select it in Curses in the Executable field.
  • Create a directory (folder) where you will put your voices and select it in Curses in the Voice directory field.
  • Find a voice you like on https://rhasspy.github.io/piper-samples/, and download both the .onnx and .onnx.json files into the directory you created. Make sure both files have the same name (e.g. en_US-kristin-medium.onnx and en_US-kristin-medium.onnx.json).
  • Select said voice in Curses and you're good to go :)
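If a voice does not show up, a quick check of the voice directory can help. The sketch below is an illustration only and not part of Curses or Piper: `find_voices` is a hypothetical helper that lists which .onnx files have a matching .onnx.json next to them, as the steps above require.

```python
from pathlib import Path


def find_voices(voice_dir):
    """Return (complete, missing_config) lists of voice names found in voice_dir."""
    complete, missing_config = [], []
    for model in sorted(Path(voice_dir).glob("*.onnx")):
        # The config must have the same name with ".json" appended,
        # e.g. en_US-kristin-medium.onnx -> en_US-kristin-medium.onnx.json
        config = model.with_name(model.name + ".json")
        if config.is_file():
            complete.append(model.stem)
        else:
            missing_config.append(model.stem)
    return complete, missing_config
```

Any name returned in the second list has an .onnx file without its .onnx.json companion, so that file should be downloaded again or renamed to match.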

Windows (TTS)

Windows provides the Microsoft Speech API (SAPI) which can be used for Text to Speech using the voices installed in your Windows instance.

Azure (TTS)

Azure is Microsoft's cloud computing service. It uses AI-powered voices, and usually uses per character billing (learn more).

You will need to find out how to create an API key and paste it in the Key field.

TikTok

Fast and high quality voices obtained through an unofficial TikTok TTS API.

Warning

Not recommended for anything important (anything non-joke tbh), since TikTok might shut down the API at any point (learn more).

Uberduck

AI voices paid with a subscription. API access is needed to use Uberduck through Curses.

You will need to find out how to create an API key and paste it in the Api key field.

Custom TTS

Custom TTS isn't a service, but it allows you to plug in pretty much any TTS service.

You will probably need to create a wrapper script to make it work though.

Warning

The messages passed to this script might come from untrusted sources (e.g. Twitch Chat), so make sure to properly sanitize the input so as not to give random chatters access to your computer.

It executes the given file as a command and passes 2 arguments:

  • the path to a file containing the text to synthesize in UTF-8 format.
  • the path to an output file that should contain the audio to play back once the executable finishes.

Windows: There are more advanced options for Windows users depending on the extension of the file.

| Extension    | Command executed                                  |
|--------------|---------------------------------------------------|
| .exe or .com | %script%                                          |
| .py          | python %script%                                   |
| .ps1         | powershell -ExecutionPolicy Bypass -File %script% |
| .*           | cmd /c %script%                                   |

(where %script% is the absolute path to the script)
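As an illustration of the two-argument contract described above, here is a minimal sketch of a wrapper script in Python. Everything in it is an assumption made for the example: the `sanitize` rules, the 300-character limit, and the choice of WAV as the output format (this README does not say which audio formats Curses accepts). `synthesize` is a placeholder that only writes silence, and is where a real TTS engine would be called.

```python
#!/usr/bin/env python3
import sys
import unicodedata
import wave

MAX_CHARS = 300  # hypothetical limit, tune to taste


def sanitize(text, limit=MAX_CHARS):
    """Drop control characters, collapse whitespace and truncate the message."""
    kept = []
    for ch in text:
        if ch.isspace():
            kept.append(" ")
        elif unicodedata.category(ch)[0] != "C":  # skip control/format characters
            kept.append(ch)
    return " ".join("".join(kept).split())[:limit].rstrip()


def synthesize(text, out_path, rate=16000):
    """Placeholder engine: writes silence whose length grows with the text.

    Replace this with a call to your TTS service. If you shell out to another
    program, pass the text as a list argument to subprocess.run (never through
    a shell string), because the text may come from chat.
    """
    seconds = max(0.2, 0.05 * len(text))
    with wave.open(out_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))


def main(text_path, out_path):
    # First argument: UTF-8 text file. Second argument: audio file to produce.
    with open(text_path, encoding="utf-8", errors="replace") as f:
        text = sanitize(f.read())
    synthesize(text, out_path)
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Saved with a .py extension, such a script would be run on Windows as python %script% according to the table above.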

Twitch

Using custom ports

If you are using a custom port (i.e. running curses --port {your port}), Twitch authentication might not work. This is because Twitch only allows a few static URLs as redirects, ports included.

If you are unable to use the default port (3030) for this one-time operation, you can try with any of 45561-45569.

If this is not an option for you, you can create your own app. Set the OAuth Redirect URL to http://localhost:{your port}/oauth_twitch.html, and the client type to Public. Then, pass the CURSES_TWITCH_CLIENT_ID env variable with the newly-generated client ID when running Curses.
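The port rules above can be summarised in a few lines. This is a sketch under stated assumptions, not part of Curses: `port_is_registered` and `launch_plan` are hypothetical helpers, and the executable is assumed to be reachable as `curses` on your PATH. The port numbers, the redirect URL and the CURSES_TWITCH_CLIENT_ID variable are taken from the text above.

```python
import os

DEFAULT_PORT = 3030
EXTRA_PORTS = range(45561, 45570)  # 45561-45569 inclusive


def port_is_registered(port):
    """True if Twitch authentication should work on this port without a custom app."""
    return port == DEFAULT_PORT or port in EXTRA_PORTS


def launch_plan(port, client_id=None, base_env=None):
    """Describe how to start Curses on a port: command, environment, redirect URL."""
    env = dict(os.environ if base_env is None else base_env)
    if not port_is_registered(port):
        if not client_id:
            raise ValueError("a custom port needs the client ID of your own Twitch app")
        env["CURSES_TWITCH_CLIENT_ID"] = client_id
    return {
        "command": ["curses", "--port", str(port)],
        "env": env,
        # Set this as the OAuth Redirect URL of your own Twitch app.
        "redirect_url": f"http://localhost:{port}/oauth_twitch.html",
    }
```

The result could then be passed to `subprocess.run(plan["command"], env=plan["env"])` to start the app.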

Building

Prerequisites

Application framework dependencies:

Note

You can skip this step when using NixOS with the included Nix Flake.

NixOS

This repository includes a Nix flake which provides:

  • Development Environment via nix develop
  • Nix Package as the default flake package output
    • can be built with nix build (binary will be available as ./result/bin/curses)

The Development Environment provides all needed libraries to build the project.

Note: Runtime Dependencies

Other Linux

Additionally the following are required for building:

List of additional packages for Arch Linux: cmake shaderc alsa-lib vulkan-headers vulkan-icd-loader

Note: Runtime Dependencies

Windows

  • rust (or winget install rustup)
  • nodejs (or winget install nodejs)
  • pnpm (or winget install pnpm.pnpm)
  • vulkansdk (or winget install vulkansdk)
  • msvc
    • get the Visual Studio Community installer (if you have it installed already, open the installer again, press 'Modify' on your installed instance and make sure the components shown below are checked)
    • 'Desktop development with C++'
  • clang lib (or winget install llvm)
  • cmake (or winget install cmake)

Note: Runtime Dependencies

Build

  1. Set up pnpm local dependencies
  • pnpm i --frozen-lockfile
  2. Choose the action you want to perform from the following
  • pnpm tauri dev: build and run a local development version that restarts on code changes
  • pnpm tauri dev --release: build and run the dev version with release settings
  • pnpm tauri build --no-bundle --debug: create a development build
    • binary will be produced at ./src-tauri/target/debug/<curses-bin>
  • pnpm tauri build --no-bundle: create a final build
    • binary will be produced at ./src-tauri/target/release/<curses-bin>

About

Speech to Text and KB input captions for OBS, VRChat, Twitch chat and Discord
