Build end-to-end voice AI apps in a few lines of code. Speech-to-Text -> your agent processing code -> Text-to-Speech, like it's no big deal. Talking to your computer no longer has to be a pain.
Voice models have gotten really good lately, but the infra to stitch everything together is lacking. Model providers ship their own SDKs, but every provider has a different interface. Frameworks like Pipecat are great, but I wanted something like AI SDK to get from idea to prod as quickly as possible. In fact, the API is hugely inspired by AI SDK. So usevoiceai is an attempt to build something sophisticated with the same API simplicity and developer ergonomics.
Enough talk! Let's jump right into it.
The minimum setup to get up and running has two parts: the client hooks and the server-side session.
Add the useVoice and useSpeech hooks to your React component like this. Call startRecording to start streaming your voice. As you speak, partial transcripts show up in the transcript variable; it's reactive, so you can use it to render incremental transcripts.
Once you call stopRecording, status moves into the processing stage. After the server is done processing, status moves to the complete stage and speechStream becomes available. You can pass it to the useSpeech hook to automatically start playing the response audio, or use it however you want: it's just an async iterable of raw PCM audio chunks.
```tsx
import { useSpeech, useVoice } from "@usevoiceai/react";

export function App() {
  const { startRecording, stopRecording, status, transcript, speechStream } = useVoice();
  // Plays the response audio automatically once speechStream becomes available.
  const { stop } = useSpeech({ speechStream });

  // Render your own UI here: buttons wired to startRecording/stopRecording,
  // the live transcript, the current status, a button that calls stop(), etc.
  return null;
}
```

Here we are using Cloudflare's Durable Objects WebSocket adapter. Support for more transports is coming soon.
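Before moving on to the server: if you don't want useSpeech to handle playback, you can consume speechStream yourself. Here's a minimal sketch, assuming the chunks arrive as 16-bit little-endian mono PCM at 44.1 kHz (check your TTS provider's output format; the playPcm helper and the Uint8Array chunk type are illustrative, not part of the SDK):

```ts
// Not part of the SDK — a sketch of consuming speechStream manually with the Web Audio API.
// Assumes 16-bit little-endian mono PCM at 44.1 kHz; match your TTS provider's output format.
async function playPcm(speechStream: AsyncIterable<Uint8Array>) {
  const ctx = new AudioContext({ sampleRate: 44100 });
  let playhead = ctx.currentTime;

  for await (const chunk of speechStream) {
    // Convert raw bytes to float samples the Web Audio API understands.
    const ints = new Int16Array(chunk.buffer, chunk.byteOffset, chunk.byteLength / 2);
    const floats = Float32Array.from(ints, (s) => s / 32768);

    const buffer = ctx.createBuffer(1, floats.length, ctx.sampleRate);
    buffer.copyToChannel(floats, 0);

    // Queue each chunk right after the previous one for gapless playback.
    const source = ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(ctx.destination);
    playhead = Math.max(playhead, ctx.currentTime);
    source.start(playhead);
    playhead += buffer.duration;
  }
}
```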
Implement the AgentProcessor interface. You are given the transcript from the STT provider and a send callback to forward your response text to the TTS provider.
Create the Durable Object class with the createVoiceDurableObject factory function and forward the incoming web request to it from your Worker. It upgrades the request to a WebSocket internally, and subsequent messages for the whole client session go directly over that socket connection.
```ts
import { cartesia } from "@usevoiceai/cartesia";
import { deepgram } from "@usevoiceai/deepgram";
import { AgentProcessor, createVoiceDurableObject } from "@usevoiceai/server";

class MockAgentProcessor implements AgentProcessor {
  constructor(private env: Env) {}

  async process({
    transcript,
    send,
  }: Parameters<AgentProcessor["process"]>[0]) {
    // Do something with the transcript (call an LLM, hit your own API, etc.)
    // and forward the response text to the TTS provider.
    const responseText = `You said: ${transcript}`;

    await send({
      type: "complete",
      data: {
        responseText,
      },
    });
  }
}

const VoiceSessionDO = createVoiceDurableObject<Env>({
  transcription: (env) => deepgram("nova-3", { apiKey: env.DEEPGRAM_API_KEY }),
  agent: (env) => new MockAgentProcessor(env),
  speech: (env) => cartesia("sonic-3", { apiKey: env.CARTESIA_API_KEY }),
});

export default {
  async fetch(request: Request, env: Env) {
    const url = new URL(request.url);

    if (url.pathname === "/voice-command/ws") {
      // One Durable Object per client session, keyed by userId.
      const userId = url.searchParams.get("userId") ?? "anonymous";
      const id = env.VOICE_SESSION.idFromName(userId);
      const stub = env.VOICE_SESSION.get(id);
      return stub.fetch(request);
    }

    return new Response("Not found", { status: 404 });
  },
};

export { VoiceSessionDO };
```

That's it, really. I'm not kidding. You can now speak to your computer and have it speak back. :)
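The mock processor above just echoes the transcript. For a real agent you'd call a model inside process. Here's a rough sketch using the AI SDK's generateText with Google's Gemini; the GeminiAgentProcessor name, the model id, and the assumption that your Env exposes GOOGLE_GENERATIVE_AI_API_KEY (as the Cloudflare Worker example's secrets suggest) are illustrative, not prescriptive:

```ts
import { createGoogleGenerativeAI } from "@ai-sdk/google";
import { generateText } from "ai";
import { AgentProcessor } from "@usevoiceai/server";

// Sketch of an LLM-backed processor. Env is assumed to carry
// GOOGLE_GENERATIVE_AI_API_KEY as a Worker secret.
class GeminiAgentProcessor implements AgentProcessor {
  constructor(private env: Env) {}

  async process({
    transcript,
    send,
  }: Parameters<AgentProcessor["process"]>[0]) {
    const google = createGoogleGenerativeAI({
      apiKey: this.env.GOOGLE_GENERATIVE_AI_API_KEY,
    });

    // Ask the model to respond to what the user just said.
    const { text } = await generateText({
      model: google("gemini-2.0-flash"),
      prompt: transcript,
    });

    await send({
      type: "complete",
      data: { responseText: text },
    });
  }
}
```

Swap it in by returning it from the agent factory instead of MockAgentProcessor.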
See the Examples section below for how to run this code.
| Package | Description |
|---|---|
| `@usevoiceai/core` | Framework-agnostic WebSocket client, voice recorder, the controller that wires everything up, and the state store. |
| `@usevoiceai/react` | `useVoice` and `useSpeech` hooks, the main interfaces for capturing speech and playing the response speech on web clients. |
| `@usevoiceai/server` | Runtime-agnostic voice session, session adapters for transports such as Durable Object WebSockets, Node WebSockets, etc., and STT/TTS/agent provider scaffolding. |
| `@usevoiceai/cartesia` / `@usevoiceai/deepgram` | Voice service providers for transcription, speech generation, etc. Deepgram for transcription and Cartesia for speech generation are implemented out of the box. More to come soon. |
| `@usevoiceai/hume` | Hume Text-to-Speech provider wired into the voice session pipeline so you can stream Hume voices over WebSockets. |
For example, to use Hume for speech generation instead of Cartesia:

```ts
import { hume } from "@usevoiceai/hume";
import { deepgram } from "@usevoiceai/deepgram";
import { createVoiceDurableObject } from "@usevoiceai/server";

// MockAgentProcessor is the same processor shown in the earlier example.
const session = createVoiceDurableObject({
  transcription: () => deepgram("nova-3"),
  agent: () => new MockAgentProcessor(),
  speech: () =>
    hume({
      apiKey: process.env.HUME_API_KEY,
      voice: { name: "Ava Song", provider: "HUME_AI" },
    }),
});
```

Examples live under examples/* so we can test React and Vue integrations against the same API surface.
This workspace uses Bun and its workspaces feature for development.
- React: `cd examples/react && bun install && bun run dev`
- Cloudflare Worker (backend): `cd examples/cloudflare-worker && bun install && bun run dev`
To test an end-to-end voice session with a React app and a Cloudflare worker:
- Install dependencies and configure the required secrets inside `examples/cloudflare-worker`:
  - `DEEPGRAM_API_KEY`
  - `CARTESIA_API_KEY`
  - `GOOGLE_GENERATIVE_AI_API_KEY` (optional; or replace the code with your own `AgentProcessor` implementation and return any response text you want)

  All these services provide generous free credits and getting API keys is super simple.
- Run `bun run dev` inside that folder to start the worker locally (defaults to `http://127.0.0.1:8787`).
- Copy the websocket URL (`ws://127.0.0.1:8787/voice-command/ws?userId=demo`) into `examples/react-demo/.env` as `VITE_USEVOICEAI_WS_URL`.
- Start the React demo (`bun --filter @usevoiceai/example-react run dev`).
- Profit!
- `bun run build` – runs each package build.
- `bun run dev` – runs package-level dev/watch scripts.
- `bun run test` – executes unit tests in every workspace.
- `bun run lint` – placeholder for linters.
Error events use a typed shape: `type: "error", data: { code, message, retryable?, details? }`. Client state mirrors this on `status.error` / `status.errorCode`. See `docs/errors.md` for the code list and handling examples.
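For illustration, here's what handling that shape could look like on the client; the `VoiceErrorEvent` type name and the specific error code below are assumptions, not the SDK's exports (see `docs/errors.md` for the real code list):

```ts
// Illustrative only: a type matching the documented error event shape.
// The SDK may export its own types; check docs/errors.md.
type VoiceErrorEvent = {
  type: "error";
  data: {
    code: string;
    message: string;
    retryable?: boolean;
    details?: unknown;
  };
};

// Reacting to the mirrored client state, assuming `status` carries
// error/errorCode fields as described above.
function describeError(status: { error?: string; errorCode?: string }) {
  if (!status.error) return "all good";
  // "PROVIDER_UNAVAILABLE" is a hypothetical code used for the example.
  const hint = status.errorCode === "PROVIDER_UNAVAILABLE" ? " (try again shortly)" : "";
  return `voice session failed: ${status.error}${hint}`;
}
```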
- Add support for local models
- Add support for more transports
- Add support for more voice service providers
- Last but not least, conventional commits ;D
Docs website and more guides coming soon.