From 55530f11d3846c1aab29c108e00305a3ef6a485b Mon Sep 17 00:00:00 2001
From: Lewis Dwyer
Date: Fri, 6 Feb 2026 13:28:40 +0200
Subject: [PATCH 1/8] adding draft updates for daily and replit integration guides

---
 docs/daily.mdx             | 499 ++++++++++++++++++++++++++++++++++++-
 docs/replit.mdx            |  81 +++---
 images/pipecat-demo.gif    | Bin 0 -> 4232549 bytes
 images/replit-demo-app.mp4 | Bin 0 -> 5163095 bytes
 4 files changed, 535 insertions(+), 45 deletions(-)
 create mode 100644 images/pipecat-demo.gif
 create mode 100644 images/replit-demo-app.mp4

diff --git a/docs/daily.mdx b/docs/daily.mdx
index a60d307..6241308 100644
--- a/docs/daily.mdx
+++ b/docs/daily.mdx
@@ -4,13 +4,500 @@ title: Daily
 
 import { IntegrationHeader } from '/snippets/integration-header.mdx'
 
-
-[Daily](https://daily.co/) is the team behind Pipecat, empowering developers to build voice agents at scale using ultra low latency, open source SDKs and enterprise reliability. Building the future of voice, video, and real-time AI, Daily helps you imagine and create innovative communication experiences with infrastructure built on WebRTC.
+This guide demonstrates how to build a real-time voice agent using [Pipecat](https://github.com/pipecat-ai/pipecat), Daily's open-source framework for building voice agents. Rime provides natural-sounding speech synthesis.
-Rime's text-to-speech (TTS) synthesis model is available through the Daily API. With Daily's Rime integration and the Pipecat framework, you can develop responsive AI voice applications that deliver natural, lifelike interactions.
+You can mix and match different services for each component of your Pipecat pipeline. This guide uses:
+- `silero` for voice activity detection (VAD)
+- `gpt-4o-transcribe` for speech-to-text (STT)
+- `gpt-4o-mini` to generate responses
+- `rime` for text-to-speech (TTS)
-View our [Rime Pipecat demo agents](https://github.com/rimelabs/rime-pipecat-agents) for ready-to-use examples, from basic voice agents to multilingual agents that switch languages dynamically. For more details on the Pipecat framework, visit [Pipecat's documentation](https://docs.pipecat.ai/getting-started/introduction).
\ No newline at end of file
+By the end, you'll have a working voice agent that runs locally and opens in your browser.
+
+Demo of a voice agent conversation using Pipecat and Rime
+
+The following Pipecat terminology will help you follow the rest of the guide:
+- A **Pipeline** is a sequence of frame processors. Audio frames flow in, get transcribed, processed by the LLM, synthesized to speech, and flow back out.
+- A **Transport** handles real-time audio I/O. Pipecat supports multiple transports, including WebRTC (browser), WebSocket, and local audio devices.
+- **Frame processors** are the building blocks. Each service (STT, LLM, TTS) is a processor that transforms frames as they flow through the pipeline.
+
If you'd like to experiment with Rime's TTS API directly before building a full voice agent, check out [TTS in five minutes](/docs/quickstart-five-minute).

## Step 1: Prerequisites

Gather the following API keys and tools before starting.

### 1.1 Rime API key

Sign up for a [Rime account](https://app.rime.ai/signup/) and copy your API key from the [API Tokens](https://app.rime.ai/tokens/) page. This enables access to the Rime API for text-to-speech (TTS).

### 1.2 OpenAI API key

Create an [OpenAI account](https://platform.openai.com/signup) and generate an API key from the [API keys page](https://platform.openai.com/api-keys).
This key enables speech-to-text and LLM responses. + +### 1.3 Python + +Install [Python 3.10 or later](https://www.python.org/downloads/). Verify your installation by running `python --version` in your terminal. + +## Step 2: Project setup + +Set up your project folder, environment variables, and dependencies. + +### 2.1 Create the project folder + +Create a new folder for your project and navigate into it: + +```bash +mkdir rime-pipecat-agent +cd rime-pipecat-agent +``` + +### 2.2 Set up environment variables + +In the new directory, create a file called `.env` and add the keys that you gathered in [Step 1](#step-1-prerequisites): + +``` +RIME_API_KEY=your_rime_api_key +OPENAI_API_KEY=your_openai_api_key +``` + +Replace the placeholder values with your actual API keys. + +### 2.3 Configure dependencies + +Install the `uv` package manager: + + +```bash macOS/Linux +curl -LsSf https://astral.sh/uv/install.sh | sh +``` + +```bash pip +pip install uv +``` + +```bash Homebrew (macOS) +brew install uv +``` + + +Create a file called `pyproject.toml` and add the following dependencies: + +```toml +[project] +name = "rime-pipecat-agent" +version = "0.1.0" +requires-python = ">=3.10" +dependencies = [ + "python-dotenv>=1.1.1", + "pipecat-ai[openai,rime,silero,webrtc,runner]>=0.0.100", + "pipecat-ai-small-webrtc-prebuilt>=2.0.4", +] +``` + +Pipecat uses a plugin system where each service integration is a separate package. The extras in brackets (`[openai,rime,silero,webrtc,runner]`) install these plugins: +- **openai**: STT and LLM services for transcription and generating responses +- **rime**: TTS service for synthesizing speech +- **silero**: VAD for detecting when the user starts and stops speaking +- **webrtc**: Transport for browser-based audio via WebRTC +- **runner**: Development runner that handles server setup and WebRTC connections + +The `pipecat-ai-small-webrtc-prebuilt` package provides a ready-to-use browser client that connects to your agent. + +Then, install the dependencies by running: + +```bash +uv sync +``` + +## Step 3: Create the agent + +Create a file called `agent.py` for all the code that gets your agent talking. If you're in a rush and just want to run it, skip to [Step 3.5: Full agent code](#3-5-full-agent-code). Otherwise, continue reading to code the agent step-by-step. 
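Optionally, before writing any agent code, you can confirm that the dependencies and environment variables from [Step 2](#step-2-project-setup) resolve correctly. The following is a minimal, throwaway sketch; the file name `check_setup.py` and its contents are illustrative and not part of the final agent:

```python
# check_setup.py (optional sanity check, not part of the final agent)
import os

from dotenv import load_dotenv

load_dotenv()

# Report whether each key from .env is visible (presence only, never the value)
for key in ("RIME_API_KEY", "OPENAI_API_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")

# Confirm the Pipecat package installed by `uv sync` imports cleanly
import pipecat  # noqa: E402

print("pipecat imported OK")
```

Run it with `uv run check_setup.py`. If both keys show `set` and the import succeeds, you're ready to build the agent.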
+ +### 3.1 Load environment variables and configure imports + +Add the following imports and initialization code to `agent.py`: + +```python +import os +from dotenv import load_dotenv + +from pipecat.pipeline.pipeline import Pipeline +from pipecat.pipeline.runner import PipelineRunner +from pipecat.pipeline.task import PipelineParams, PipelineTask +from pipecat.frames.frames import LLMMessagesAppendFrame +from pipecat.services.openai.stt import OpenAISTTService +from pipecat.services.openai.llm import OpenAILLMService +from pipecat.services.rime.tts import RimeNonJsonTTSService +from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext +from pipecat.audio.vad.silero import SileroVADAnalyzer +from pipecat.transports.base_transport import TransportParams +from pipecat.transports.smallwebrtc.transport import SmallWebRTCTransport +from pipecat.runner.run import main +from pipecat.runner.types import SmallWebRTCRunnerArguments + +load_dotenv() +``` + +Each import corresponds to a frame processor or utility: +- `Pipeline` chains processors together in sequence +- `PipelineRunner` manages the event loop and runs the pipeline +- `LLMMessagesAppendFrame` triggers the LLM to respond when queued +- Services like `OpenAISTTService`, `OpenAILLMService`, and `RimeNonJsonTTSService` are the frame processors that do the actual work +- `OpenAILLMContext` maintains conversation history across turns +- `SileroVADAnalyzer` detects speech boundaries so the agent knows when you've finished talking +- `SmallWebRTCTransport` handles peer-to-peer WebRTC connections for browser-based audio +- `SmallWebRTCRunnerArguments` provides connection details when a user connects + +### 3.2 Define the system prompt + +Add the following configuration below the imports: + +```python +SYSTEM_PROMPT = """You are a helpful voice assistant. +Keep your responses short and conversational - no more than 2-3 sentences. +Be friendly and natural.""" +``` + +The system prompt defines your agent's personality. It can be as simple or complex as you like. Later in the guide you'll see an example of a detailed system prompt that fully customizes the agent's behavior. + +### 3.3 Code the conversation pipeline + +Add the following `bot` function to `agent.py`: + +```python +async def bot(runner_args: SmallWebRTCRunnerArguments): +``` + +The Pipecat runner automatically discovers any function named `bot` in your module. When a user connects via WebRTC, the runner calls this function and passes connection details through `runner_args`. + +Inside the `bot` function, add the WebRTC transport configuration: + +```python + transport = SmallWebRTCTransport( + runner_args.webrtc_connection, + TransportParams( + audio_in_enabled=True, + audio_out_enabled=True, + vad_analyzer=SileroVADAnalyzer(), + ), + ) +``` + +This creates the WebRTC transport with audio input/output enabled and Silero VAD for detecting when the user starts and stops speaking. + +Next, add the AI services for transcription, response generation, and speech synthesis: + +```python + stt = OpenAISTTService( + api_key=os.getenv("OPENAI_API_KEY"), + model="gpt-4o-transcribe", + ) + + llm = OpenAILLMService( + api_key=os.getenv("OPENAI_API_KEY"), + model="gpt-4o-mini", + ) + + tts = RimeNonJsonTTSService( + api_key=os.getenv("RIME_API_KEY"), + voice_id="atrium", + model="arcana", + ) +``` + +These configure OpenAI for STT and LLM responses, and Rime's `arcana` model for TTS. 
+ +Add the conversation context: + +```python + context = OpenAILLMContext( + messages=[{"role": "system", "content": SYSTEM_PROMPT}] + ) + context_aggregator = llm.create_context_aggregator(context) +``` + +This maintains the conversation history so the LLM can reference previous messages. + +Add the pipeline that connects all the components: + +```python + pipeline = Pipeline([ + transport.input(), + stt, + context_aggregator.user(), + llm, + tts, + transport.output(), + context_aggregator.assistant(), + ]) +``` + +Frames flow through processors in order: audio in → transcription → user context → LLM response → speech synthesis → audio out → assistant context. The context aggregator appears twice to capture both sides of the conversation. + +Finally, add the task runner and an event handler to greet the user: + +```python + task = PipelineTask( + pipeline, + params=PipelineParams(enable_metrics=True), + ) + + @transport.event_handler("on_client_connected") + async def on_client_connected(transport, client): + await task.queue_frames([LLMMessagesAppendFrame( + messages=[{"role": "system", "content": "Say hello and introduce yourself."}], + run_llm=True + )]) + + runner = PipelineRunner(handle_sigint=runner_args.handle_sigint) + await runner.run(task) +``` + +The `on_client_connected` event fires when a user connects. It appends a system message prompting the LLM to greet the user and triggers an immediate response with `run_llm=True`. + +### 3.4 Create the main entrypoint + +Add the following code at the bottom of `agent.py`: + +```python +if __name__ == "__main__": + main() +``` + +Pipecat's `main` helper from `pipecat.runner.run` automatically: +- Discovers the `bot` function in your module +- Starts a FastAPI server with WebRTC endpoints +- Serves a prebuilt browser client at `/client` +- Handles WebRTC connection setup and passes the connection to your `bot` function + +When you run the agent, Pipecat starts a local HTTP server. Open the browser client to connect via WebRTC. The server runs locally while API calls are made to OpenAI and Rime. + +### 3.5 Full agent code + +At this point, your `agent.py` should look like the complete example below: + + +```python +import os +from dotenv import load_dotenv + +from pipecat.pipeline.pipeline import Pipeline +from pipecat.pipeline.runner import PipelineRunner +from pipecat.pipeline.task import PipelineParams, PipelineTask +from pipecat.frames.frames import LLMMessagesAppendFrame +from pipecat.services.openai.stt import OpenAISTTService +from pipecat.services.openai.llm import OpenAILLMService +from pipecat.services.rime.tts import RimeNonJsonTTSService +from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext +from pipecat.audio.vad.silero import SileroVADAnalyzer +from pipecat.transports.base_transport import TransportParams +from pipecat.transports.smallwebrtc.transport import SmallWebRTCTransport +from pipecat.runner.run import main +from pipecat.runner.types import SmallWebRTCRunnerArguments + +load_dotenv() + +SYSTEM_PROMPT = """You are a helpful voice assistant. +Keep your responses short and conversational - no more than 2-3 sentences. 
Be friendly and natural."""


async def bot(runner_args: SmallWebRTCRunnerArguments):
    transport = SmallWebRTCTransport(
        runner_args.webrtc_connection,
        TransportParams(
            audio_in_enabled=True,
            audio_out_enabled=True,
            vad_analyzer=SileroVADAnalyzer(),
        ),
    )

    stt = OpenAISTTService(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4o-transcribe",
    )

    llm = OpenAILLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4o-mini",
    )

    tts = RimeNonJsonTTSService(
        api_key=os.getenv("RIME_API_KEY"),
        voice_id="atrium",
        model="arcana",
    )

    context = OpenAILLMContext(
        messages=[{"role": "system", "content": SYSTEM_PROMPT}]
    )
    context_aggregator = llm.create_context_aggregator(context)

    pipeline = Pipeline([
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ])

    task = PipelineTask(
        pipeline,
        params=PipelineParams(enable_metrics=True),
    )

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        await task.queue_frames([LLMMessagesAppendFrame(
            messages=[{"role": "system", "content": "Say hello and introduce yourself."}],
            run_llm=True
        )])

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
    await runner.run(task)


if __name__ == "__main__":
    main()
```


## Step 4: Test your agent

The full pipeline is now ready to test. You can run the agent from the terminal using `uv` and interact with it in your browser.

### 4.1 Start the agent

Start your agent by running:

```bash
uv run agent.py
```

You'll see output indicating the server is starting.

### 4.2 Connect to your agent

Open a browser and navigate to `http://localhost:7860/client`. Allow microphone access when prompted.

You can now talk to your agent using your microphone.

## Step 5: Customize your agent

Now that your agent is running, you can experiment with different voices and personalities.

### 5.1 Change the voice

Update the `tts` initialization in your `bot` function to try a different voice:

```python
tts = RimeNonJsonTTSService(
    api_key=os.getenv("RIME_API_KEY"),
    voice_id="celest",
    model="arcana",
)
```

Rime offers many voices with different personalities. See the full list on the [Voices](/docs/voices) page.

### 5.2 Fine-tune agent personalities

Create a new file called `personality.py` with the following content:

```python
SYSTEM_PROMPT = """
CHARACTER:
You are Detective Marlowe, a world-weary noir detective from the 1940s who
somehow ended up as an AI assistant. You treat every question like it's a
case to be cracked and speak in dramatic, hard-boiled metaphors.

PERSONALITY:
- Cynical but secretly caring underneath the tough exterior
- Treats mundane tasks like high-stakes mysteries
- References your "years on the force" and "cases that still haunt you"
- Suspicious of technology but grudgingly impressed by it
- Has strong opinions about coffee and rain

SPEECH STYLE:
- Keep responses to 2-3 sentences maximum
- Use noir metaphors like "this code is messier than a speakeasy on a Saturday night"
- Dramatic pauses with "..." for effect
- Call the user "kid" or "pal" occasionally
- End with ominous or philosophical observations

RESTRICTIONS:
- Never break character
- Don't use emojis or special characters
- Stay family-friendly despite the noir tone
"""

INTRO_MESSAGE = "The name's Marlowe... I've seen things that would make your code freeze, pal. So what case are you bringing to my desk tonight?"
```

Update your `agent.py` to import and use this prompt:

```python
from personality import SYSTEM_PROMPT, INTRO_MESSAGE
```

Then update the `on_client_connected` handler to use your custom intro message:

```python
@transport.event_handler("on_client_connected")
async def on_client_connected(transport, client):
    await task.queue_frames([LLMMessagesAppendFrame(
        messages=[{"role": "system", "content": f"Say: {INTRO_MESSAGE}"}],
        run_llm=True
    )])
```

Storing your system prompt in a separate file keeps your personality configuration separate from your agent logic, making it easy to experiment with different characters.

## Next steps

Pipecat's modular design makes it easy to swap components. You can:
- Replace OpenAI with another STT provider (Deepgram, AssemblyAI, etc.)
- Use a different LLM (Anthropic, Gemini, or local models)
- Switch transports: use WebSocket for server-to-server setups, or Daily's hosted rooms for production deployments

For more details on the Pipecat framework, including transport options, deployment patterns, and advanced features, visit [Pipecat's documentation](https://docs.pipecat.ai/getting-started/introduction).

View our [Rime Pipecat demo agents](https://github.com/rimelabs/rime-pipecat-agents) for a ready-to-use multilingual agent example that switches languages dynamically.

## Troubleshooting

If something isn't behaving as expected, try the quick fixes below.

### No audio output / TTS errors

- **Check your TTS service class:** The `arcana` model requires `RimeNonJsonTTSService`. If you see WebSocket HTTP 400 errors in the logs, you may be using `RimeTTSService`, which is only compatible with models like `mistv2`.
- **Verify your Rime API key:** Ensure the key is valid and has TTS permissions.

### Agent doesn't respond to speech

- **Check microphone permissions:** Ensure your browser has microphone access enabled.
- **Verify VAD is working:** Look for logs indicating speech detection. If they're missing, check your Silero installation.
- **Test audio input:** Try a different microphone or headset.

### "API key not set" errors

- **Check environment variables:** Ensure all keys in `.env` are set correctly, with no extra spaces.
- **Verify the `.env` file location:** The file should be in the same directory as `agent.py`.

### Audio quality issues

- **Check your microphone:** Test with a different input device or headset.
- **Reduce background noise:** The VAD may struggle to detect speech in noisy environments.

diff --git a/docs/replit.mdx b/docs/replit.mdx
index 952c5ac..7c11ae0 100644
--- a/docs/replit.mdx
+++ b/docs/replit.mdx
@@ -4,82 +4,85 @@ title: Replit
 
 import { IntegrationHeader } from '/snippets/integration-header.mdx'
 
-
-[Replit](https://replit.com/) is an AI-powered platform that lets you create and publish apps from a single browser tab. No local setup, no environment headaches. Just describe what you want to build, and Replit's AI agent handles the heavy lifting.
+This guide shows how to add text-to-speech to apps built with [Replit](https://replit.com/), an AI-powered platform where you describe what you want and an AI agent writes the code. Rime is a text-to-speech API that generates natural-sounding voice audio. When your Replit app needs to speak, it sends text to Rime's API and receives audio back.
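+
+To make that flow concrete, here's a minimal sketch of such a request (the endpoint and field names are assumptions based on Rime's API reference; see [TTS in five minutes](/docs/quickstart-five-minute) for current details):
+
+```python
+# Hypothetical sketch: verify the endpoint, headers, and body fields
+# against Rime's current API reference before relying on them.
+import os
+
+import requests
+
+response = requests.post(
+    "https://users.rime.ai/v1/rime-tts",
+    headers={
+        "Authorization": f"Bearer {os.environ['RIME_API_KEY']}",
+        "Accept": "audio/mp3",
+    },
+    json={"text": "Hello from your Replit app!", "speaker": "cove", "modelId": "mistv2"},
+)
+response.raise_for_status()
+
+# Save the returned audio so the app can play it back
+with open("hello.mp3", "wb") as f:
+    f.write(response.content)
+```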
-
-With Rime's text-to-speech integration, you can add natural, lifelike voices to your Replit apps in minutes. Whether you're building a voiceover tool, an accessibility feature, or an AI assistant, Rime + Replit makes it surprisingly simple.
+