diff --git a/PLAN.md b/PLAN.md new file mode 100644 index 0000000..2dbb6ca --- /dev/null +++ b/PLAN.md @@ -0,0 +1,170 @@ +# Grok Voice CLI Implementation Plan + +## Goal +Create a CLI that connects to Grok's realtime voice API via WebSocket, captures microphone audio, sends it to Grok, and plays the AI's audio responses through speakers. + +## Research Summary + +### XAI Realtime Voice API (from xai-cookbook) +- **WebSocket URL**: `wss://api.x.ai/v1/realtime` +- **Authentication**: `Authorization: Bearer ${XAI_API_KEY}` header on WebSocket connect +- **Audio Formats**: + - PCM 16-bit: `audio/pcm` with configurable sample rate (8kHz-48kHz, default 24kHz) + - μ-law: `audio/pcmu` (for telephony) +- **Audio Encoding**: Base64 encoded in JSON messages + +### WebSocket Protocol +1. On connect, receive `conversation.created` event +2. Send `session.update` to configure voice, audio format, VAD, instructions +3. Wait for `session.updated` confirmation +4. Send audio via `input_audio_buffer.append` with base64 audio +5. Receive `response.output_audio.delta` with base64 audio chunks +6. Server VAD detects speech end, triggers `response.create` + +### Message Types +```typescript +// Outbound +{ type: "session.update", session: { voice, audio: { input/output: { format: { type, rate } } }, turn_detection, instructions } } +{ type: "input_audio_buffer.append", audio: "" } +{ type: "input_audio_buffer.commit" } +{ type: "conversation.item.create", item: { type: "message", role, content } } +{ type: "response.create" } + +// Inbound +{ type: "conversation.created" } +{ type: "session.updated" } +{ type: "response.created" } +{ type: "response.output_audio.delta", delta: "" } +{ type: "response.output_audio_transcript.delta", delta: "" } +{ type: "conversation.item.input_audio_transcription.completed", transcript: "" } +{ type: "input_audio_buffer.speech_started" } +{ type: "response.done" } +{ type: "error", error: { message } } +``` + +## Architecture + +### Services +1. 
**GrokVoiceClient** - WebSocket connection to XAI API + - Connect with auth header + - Send/receive JSON messages + - Handle session lifecycle + - Emit audio events as Effect Stream + +2. **AudioCapture** - Microphone input + - Use `sox` CLI for cross-platform compatibility (requires brew install sox) + - Capture PCM 16-bit mono at 24kHz + - Stream audio chunks as Buffers + +3. **AudioPlayback** - Speaker output + - Use `sox` CLI (play command) for playback + - Accept PCM 16-bit mono at 24kHz stream + - Buffer and play audio chunks + +4. **VoiceSession** - Orchestrates the voice chat + - Coordinates capture → Grok → playback + - Handles VAD events (speech start/end) + - Logs transcripts + +### File Structure +``` +src/voice/ + cli.ts # CLI command definition + client.ts # GrokVoiceClient service + audio-capture.ts # Microphone capture via sox + audio-playback.ts # Speaker playback via sox + domain.ts # Voice-specific types + index.ts # Export barrel +``` + +## Implementation Steps + +### Step 1: Dependencies +Add to package.json: +- `ws` - WebSocket client (standard, Bun compatible) + +No native audio packages needed - use sox CLI which is more reliable. + +### Step 2: Domain Types (domain.ts) +```typescript +export const VoiceConfig = Schema.Struct({ + voice: Schema.optional(Schema.String), + sampleRate: Schema.optional(Schema.Number), + instructions: Schema.optional(Schema.String) +}) + +export type XaiMessage = + | { type: "session.update"; session: SessionConfig } + | { type: "input_audio_buffer.append"; audio: string } + | { type: "response.output_audio.delta"; delta: string } + // ... 
etc
+```
+
+### Step 3: GrokVoiceClient (client.ts)
+Effect.Service that:
+- Creates WebSocket connection
+- Handles authentication
+- Provides `send(message)` and `receive` Stream
+- Manages session configuration
+
+### Step 4: AudioCapture (audio-capture.ts)
+Effect.Service that:
+- Spawns `sox -d -t raw -r 24000 -e signed -b 16 -c 1 -`
+- Streams stdout as audio chunks
+- Chunks into ~50ms frames for WebSocket sending
+
+### Step 5: AudioPlayback (audio-playback.ts)
+Effect.Service that:
+- Spawns `sox -t raw -r 24000 -e signed -b 16 -c 1 - -d`
+- Writes audio chunks to stdin
+- Handles buffering for smooth playback
+
+### Step 6: Voice CLI (cli.ts)
+Command that:
+- Accepts --voice, --instructions, --text options
+- Reads XAI_API_KEY from env
+- Starts capture/playback
+- Connects to Grok
+- Runs until Ctrl+C
+
+### Step 7: Integration
+- Add `voiceCommand` to commands.ts subcommands
+- Test with `bun run mini-agent voice`
+
+## Audio Format Details
+- Sample rate: 24000 Hz
+- Bit depth: 16-bit signed
+- Channels: 1 (mono)
+- Encoding: PCM (linear)
+- Chunk size: ~2048 bytes (1024 samples, ~43ms)
+- WebSocket transport: Base64 encoded JSON
+
+## Progress Tracking
+- [x] Research XAI voice API (from xai-cookbook)
+- [x] Research audio handling approaches
+- [x] Design architecture
+- [x] Implement domain types
+- [x] Implement GrokVoiceClient
+- [x] Implement AudioCapture
+- [x] Implement AudioPlayback
+- [x] Implement CLI command
+- [x] Wire up to main CLI
+- [ ] Test end-to-end
+
+## Usage
+
+```bash
+# Voice mode (requires sox: brew install sox)
+bun run mini-agent voice --voice ara
+
+# Text mode (type messages instead of speaking)
+bun run mini-agent voice --text
+
+# With custom instructions
+bun run mini-agent voice --instructions "You are a pirate. Respond in pirate speak."
+
+# Help
+bun run mini-agent voice --help
+```
+
+Requires `XAI_API_KEY` environment variable to be set.
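The framing math in "Audio Format Details" can be sketched in a few lines: split raw PCM from the capture process into fixed-size frames and wrap each frame in an `input_audio_buffer.append` message. This is an illustrative sketch only; the helper names are invented here, and only the message shape and the 24 kHz / 16-bit / mono format come from the plan above.

```typescript
// Sketch of the framing described in "Audio Format Details": split raw PCM
// into fixed-size frames and wrap each one in an input_audio_buffer.append
// message. Helper names are invented for illustration.
const FRAME_BYTES = 2048 // 1024 16-bit mono samples ≈ 43 ms at 24 kHz

// Split a captured PCM buffer into frame-sized chunks (the last may be short).
const frames = (buf: Buffer): Array<Buffer> => {
  const out: Array<Buffer> = []
  for (let i = 0; i < buf.length; i += FRAME_BYTES) {
    out.push(buf.subarray(i, i + FRAME_BYTES))
  }
  return out
}

// Encode one frame as the JSON message described in the protocol notes.
const toAppendMessage = (frame: Buffer): string =>
  JSON.stringify({
    type: "input_audio_buffer.append",
    audio: frame.toString("base64")
  })
```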
diff --git a/bun.lock b/bun.lock index df96ba0..ff78087 100644 --- a/bun.lock +++ b/bun.lock @@ -22,6 +22,7 @@ "effect": "^3.19.8", "react": "19", "react-dom": "19", + "ws": "^8.19.0", "yaml": "^2.7.0", }, "devDependencies": { @@ -33,6 +34,7 @@ "@eslint/js": "^9.10.0", "@types/bun": "latest", "@types/react": "19", + "@types/ws": "^8.18.1", "@typescript-eslint/eslint-plugin": "^8.4.0", "@typescript-eslint/parser": "^8.4.0", "eslint": "^9.10.0", @@ -416,6 +418,8 @@ "@types/react": ["@types/react@19.2.7", "", { "dependencies": { "csstype": "^3.2.2" } }, "sha512-MWtvHrGZLFttgeEj28VXHxpmwYbor/ATPYbBfSFZEIRK0ecCFLl2Qo55z52Hss+UV9CRN7trSeq1zbgx7YDWWg=="], + "@types/ws": ["@types/ws@8.18.1", "", { "dependencies": { "@types/node": "*" } }, "sha512-ThVF6DCVhA8kUGy+aazFQ4kXQ7E1Ty7A3ypFOe0IcJV8O/M511G99AW24irKrW56Wt44yG9+ij8FaqoBGkuBXg=="], + "@typescript-eslint/eslint-plugin": ["@typescript-eslint/eslint-plugin@8.48.1", "", { "dependencies": { "@eslint-community/regexpp": "^4.10.0", "@typescript-eslint/scope-manager": "8.48.1", "@typescript-eslint/type-utils": "8.48.1", "@typescript-eslint/utils": "8.48.1", "@typescript-eslint/visitor-keys": "8.48.1", "graphemer": "^1.4.0", "ignore": "^7.0.0", "natural-compare": "^1.4.0", "ts-api-utils": "^2.1.0" }, "peerDependencies": { "@typescript-eslint/parser": "^8.48.1", "eslint": "^8.57.0 || ^9.0.0", "typescript": ">=4.8.4 <6.0.0" } }, "sha512-X63hI1bxl5ohelzr0LY5coufyl0LJNthld+abwxpCoo6Gq+hSqhKwci7MUWkXo67mzgUK6YFByhmaHmUcuBJmA=="], "@typescript-eslint/parser": ["@typescript-eslint/parser@8.48.1", "", { "dependencies": { "@typescript-eslint/scope-manager": "8.48.1", "@typescript-eslint/types": "8.48.1", "@typescript-eslint/typescript-estree": "8.48.1", "@typescript-eslint/visitor-keys": "8.48.1", "debug": "^4.3.4" }, "peerDependencies": { "eslint": "^8.57.0 || ^9.0.0", "typescript": ">=4.8.4 <6.0.0" } }, "sha512-PC0PDZfJg8sP7cmKe6L3QIL8GZwU5aRvUFedqSIpw3B+QjRSUZeeITC2M5XKeMXEzL6wccN196iy3JLwKNvDVA=="], @@ -1116,7 +1120,7 @@ 
"word-wrap": ["word-wrap@1.2.5", "", {}, "sha512-BN22B5eaMMI9UMtjrGd5g5eCYPpCPDUy0FJXbYsaT5zYxjFOckS53SQDE3pWkVoWpHXVb3BrYcEN4Twa55B5cA=="], - "ws": ["ws@8.18.3", "", { "peerDependencies": { "bufferutil": "^4.0.1", "utf-8-validate": ">=5.0.2" }, "optionalPeers": ["bufferutil", "utf-8-validate"] }, "sha512-PEIGCY5tSlUt50cqyMXfCzX+oOPqN0vuGqWzbcJ2xvnkzkq46oOpz7dQaTDBdfICb4N14+GARUDw2XV2N4tvzg=="], + "ws": ["ws@8.19.0", "", { "peerDependencies": { "bufferutil": "^4.0.1", "utf-8-validate": ">=5.0.2" }, "optionalPeers": ["bufferutil", "utf-8-validate"] }, "sha512-blAT2mjOEIi0ZzruJfIhb3nps74PRWTCz1IjglWEEpQl5XS/UNama6u2/rjFkDDouqr4L67ry+1aGIALViWjDg=="], "xml-parse-from-string": ["xml-parse-from-string@1.0.1", "", {}, "sha512-ErcKwJTF54uRzzNMXq2X5sMIy88zJvfN2DmdoQvy7PAFJ+tPRU6ydWuOKNMyfmOjdyBQTFREi60s0Y0SyI0G0g=="], @@ -1134,6 +1138,8 @@ "@anthropic-ai/tokenizer/@types/node": ["@types/node@18.19.130", "", { "dependencies": { "undici-types": "~5.26.4" } }, "sha512-GRaXQx6jGfL8sKfaIDD6OupbIHBr9jv7Jnaml9tB7l4v068PAOXqfcujMMo5PhbIs6ggR1XODELqahT2R8v0fg=="], + "@effect/platform-node-shared/ws": ["ws@8.18.3", "", { "peerDependencies": { "bufferutil": "^4.0.1", "utf-8-validate": ">=5.0.2" }, "optionalPeers": ["bufferutil", "utf-8-validate"] }, "sha512-PEIGCY5tSlUt50cqyMXfCzX+oOPqN0vuGqWzbcJ2xvnkzkq46oOpz7dQaTDBdfICb4N14+GARUDw2XV2N4tvzg=="], + "@effect/rpc-http/@effect/rpc": ["@effect/rpc@0.54.4", "", { "peerDependencies": { "@effect/platform": "^0.79.4", "effect": "^3.13.12" } }, "sha512-iu3TGWCt4OMH8iKL1ATeROhAxrMF+HdF3NbR5lWls9yWJwBgVU+cps3ZzRbNQhFPWXDGqVuYgmYNY1GKbZgMaw=="], "@eslint-community/eslint-utils/eslint-visitor-keys": ["eslint-visitor-keys@3.4.3", "", {}, "sha512-wpc+LXeiyiisxPlEkUzU6svyS1frIO3Mgxj1fdy7Pm8Ygzguax2N3Fa/D/ag1WqbOprdI+uY6wMUl8/a2G+iag=="], diff --git a/package.json b/package.json index 1664ef1..1b07e4a 100644 --- a/package.json +++ b/package.json @@ -25,6 +25,7 @@ "@eslint/js": "^9.10.0", "@types/bun": "latest", "@types/react": "19", + "@types/ws": 
"^8.18.1", "@typescript-eslint/eslint-plugin": "^8.4.0", "@typescript-eslint/parser": "^8.4.0", "eslint": "^9.10.0", @@ -57,6 +58,7 @@ "effect": "^3.19.8", "react": "19", "react-dom": "19", + "ws": "^8.19.0", "yaml": "^2.7.0" } } diff --git a/scratch.md b/scratch.md new file mode 100644 index 0000000..a6cc306 --- /dev/null +++ b/scratch.md @@ -0,0 +1,394 @@ +# Unified LLM Abstraction: Consolidating Voice + Text + +## The Core Insight + +Both HTTP-based chat completions and WebSocket-based voice APIs fundamentally do the same thing: +**Turn conversation context into assistant responses as a stream of events.** + +The differences are: +- **Transport**: HTTP request/response vs WebSocket bidirectional +- **State location**: Client-held (HTTP) vs Server-held (WebSocket) +- **Modality**: Text only vs Text + Audio +- **Granularity**: Per-turn vs Per-chunk + +## Current Architecture Analysis + +### HTTP Path (openai-chat-completions-client.ts) +``` +LanguageModel.streamText({ prompt }) + → Stream + → text-delta, tool-call, finish, etc. +``` + +### Voice Path (voice/client.ts) +``` +GrokVoiceClient.connect(config) + → GrokVoiceConnection { + send: audio → Effect + sendText: text → Effect + audioOutput: Stream + transcripts: Stream + events: Stream + } +``` + +### Agent Loop (llm-turn.ts) +``` +MiniAgentTurn.execute(ctx: ReducedContext) + → Stream + → TextDeltaEvent, AssistantMessageEvent +``` + +## The Gap + +The voice client exposes raw streams (audio buffers, transcripts) but doesn't: +1. Implement `LanguageModel` interface +2. Emit domain events (`ContextEvent`) +3. Track conversation turns +4. Support tool calling flow + +## Four Proposed Approaches + +### Proposal 1: Event-Driven Session + +Everything is an event. Both transports emit unified `SessionEvent` types. 
+
+```typescript
+type SessionEvent =
+  | TextInputEvent | AudioInputEvent
+  | TextOutputEvent | AudioOutputEvent
+  | ToolCallEvent | ToolResponseEvent
+```
+
+**Pros**: Clean abstraction, type-safe events, natural for streaming
+**Cons**: Forces event model on HTTP, memory growth, semantic mismatch
+
+### Proposal 2: Dual-Mode Interface
+
+Separate interfaces for each pattern. Don't force unification.
+
+```typescript
+interface RequestResponseLlm {
+  complete: (req) => Effect<Stream<SessionEvent>>
+}
+
+interface StreamingSessionLlm {
+  connect: Effect<{ send: Sink, receive: Stream }>
+}
+```
+
+**Pros**: Semantic clarity, efficient per-transport
+**Cons**: Code duplication, can't swap transport
+
+### Proposal 3: Layered Abstraction (RECOMMENDED)
+
+High-level session wraps low-level transport. Session handles state for stateless transports.
+
+```typescript
+// Agent interacts with this
+class ConversationSession {
+  sendText: (text: string) => Effect<void>
+  sendAudio: (chunk: Buffer) => Effect<void>
+  sendToolResult: (id: string, result: unknown) => Effect<void>
+  events: Stream<ConversationEvent>
+}
+
+// Transport implementations
+interface LlmTransport {
+  mode: "stateless" | "stateful"
+  send?: (context) => Effect<Stream<ConversationEvent>>
+  connect?: Effect<{ input: Sink, output: Stream }>
+}
+```
+
+**Pros**: Unified agent code, transport-optimized implementations, swappable
+**Cons**: Extra abstraction layer, potential feature leakage
+
+### Proposal 4: Multi-Modal Stream Processor
+
+Pure stream processing - everything in, everything out. 
+
+```typescript
+process(input: {
+  text?: Stream<string>,
+  audio?: Stream<Buffer>,
+  tools?: Stream<ToolResult>
+}) => Effect<{
+  text: Stream<string>,
+  audio: Stream<Buffer>,
+  toolCalls: Stream<ToolCall>
+}>
+```
+
+**Pros**: Stream-native, composable, backpressure-aware
+**Cons**: HTTP buffering awkward, complex debugging
+
+## Recommended Design: Proposal 3
+
+### Core Types
+
+```typescript
+// Unified conversation events
+type ConversationEvent =
+  | { _tag: "TextDelta"; delta: string }
+  | { _tag: "TextComplete"; content: string }
+  | { _tag: "AudioDelta"; chunk: Buffer }
+  | { _tag: "AudioComplete" }
+  | { _tag: "ToolCall"; id: string; name: string; params: unknown }
+  | { _tag: "TurnComplete"; usage?: Usage }
+  | { _tag: "Error"; error: Error }
+
+// Conversation state
+interface ConversationContext {
+  messages: Prompt.Message[]
+  config: LlmConfig
+  turnNumber: number
+}
+
+// Transport interface
+interface LlmTransport {
+  readonly _tag: "stateless" | "stateful"
+}
+
+interface StatelessTransport extends LlmTransport {
+  _tag: "stateless"
+  complete: (ctx: ConversationContext, input: TurnInput)
+    => Effect<Stream<ConversationEvent>>
+}
+
+interface StatefulTransport extends LlmTransport {
+  _tag: "stateful"
+  connect: Effect<StatefulConnection>
+}
+
+interface StatefulConnection {
+  sendText: (text: string) => Effect<void>
+  sendAudio: (chunk: Buffer) => Effect<void>
+  sendToolResult: (id: string, result: unknown) => Effect<void>
+  events: Stream<ConversationEvent>
+  close: Effect<void>
+}
+```
+
+### Session Layer
+
+```typescript
+class UnifiedSession extends Effect.Service<UnifiedSession>()(
+  "@app/UnifiedSession",
+  {
+    effect: Effect.gen(function*() {
+      const transport = yield* LlmTransport
+      const state = yield* Ref.make(initialContext)
+      const eventQueue = yield* Queue.unbounded<ConversationEvent>()
+
+      // For stateful: establish connection once
+      const connectionRef = transport._tag === "stateful"
+        ? 
yield* transport.connect.pipe(Effect.map(Option.some), Ref.make) + : yield* Ref.make>(Option.none()) + + return { + sendText: (text: string) => Effect.gen(function*() { + if (transport._tag === "stateless") { + // Add to context, send full context + yield* state.update((ctx) => ({ + ...ctx, + messages: [...ctx.messages, Prompt.userMessage(text)] + })) + const ctx = yield* state.get + const stream = yield* transport.complete(ctx, { type: "text", text }) + yield* stream.pipe( + Stream.runForEach((event) => Queue.offer(eventQueue, event)) + ) + } else { + // Send text directly + const conn = yield* connectionRef.get.pipe(Effect.flatMap(Option.getOrThrow)) + yield* conn.sendText(text) + } + }), + + sendAudio: (chunk: Buffer) => Effect.gen(function*() { + if (transport._tag === "stateless") { + // Buffer audio, send when turn complete (or batch) + yield* Effect.fail(new Error("Audio batching not implemented")) + } else { + const conn = yield* connectionRef.get.pipe(Effect.flatMap(Option.getOrThrow)) + yield* conn.sendAudio(chunk) + } + }), + + events: Stream.fromQueue(eventQueue) + } + }) + } +) {} +``` + +### Transport Implementations + +**HTTP Transport (adapts existing OpenAiChatClient):** + +```typescript +class HttpLlmTransport extends Effect.Service()( + "@app/HttpLlmTransport", + { + effect: Effect.gen(function*() { + const client = yield* OpenAiChatClient + + return LlmTransport.of({ + _tag: "stateless", + complete: (ctx, input) => Effect.gen(function*() { + const request = buildChatRequest(ctx, input) + return client.createChatCompletionStream(request).pipe( + Stream.mapEffect(translateToConversationEvent) + ) + }) + }) + }) + } +) {} +``` + +**WebSocket Transport (adapts GrokVoiceClient):** + +```typescript +class WebSocketLlmTransport extends Effect.Service()( + "@app/WebSocketLlmTransport", + { + effect: Effect.gen(function*() { + const voice = yield* GrokVoiceClient + + return LlmTransport.of({ + _tag: "stateful", + connect: Effect.gen(function*() { + const 
config = yield* VoiceConfig + const conn = yield* voice.connect(config) + yield* conn.waitForReady + + // Merge all voice streams into ConversationEvent + const events = Stream.mergeAll([ + conn.transcripts.pipe( + Stream.map((delta) => ({ _tag: "TextDelta" as const, delta })) + ), + conn.audioOutput.pipe( + Stream.map((chunk) => ({ _tag: "AudioDelta" as const, chunk })) + ), + conn.events.pipe( + Stream.filter(isToolCallEvent), + Stream.map(translateToolCallEvent) + ) + ]) + + return { + sendText: conn.sendText, + sendAudio: conn.send, + sendToolResult: (id, result) => conn.sendText(JSON.stringify({ id, result })), + events, + close: conn.close + } + }) + }) + }) + } +) {} +``` + +## Minimal Proof of Concept Plan + +### Phase 1: Core Types (src/unified/domain.ts) +- `ConversationEvent` union type +- `LlmTransport` interface (stateless | stateful) +- `ConversationContext` state type + +### Phase 2: HTTP Adapter (src/unified/http-transport.ts) +- Wrap `OpenAiChatClient` or direct to LanguageModel +- Translate `Response.StreamPartEncoded` → `ConversationEvent` +- Implement `StatelessTransport` + +### Phase 3: WebSocket Adapter (src/unified/ws-transport.ts) +- Wrap `GrokVoiceClient` +- Merge voice streams into `ConversationEvent` +- Implement `StatefulTransport` + +### Phase 4: Unified Session (src/unified/session.ts) +- `UnifiedSession` service +- State management for stateless transport +- Event routing for stateful transport + +### Phase 5: Demo CLI (src/unified/demo.ts) +- Simple REPL that: + - Accepts `--mode=http` or `--mode=voice` + - Sends text via `session.sendText()` + - Handles events via `session.events` + - For voice: sends audio from mic, plays audio output + +## Key Implementation Details + +### Event Translation + +```typescript +// HTTP Response.StreamPartEncoded → ConversationEvent +const translateHttpPart = (part: Response.StreamPartEncoded): Option => + match(part) + .with({ type: "text-delta" }, (p) => Option.some({ _tag: "TextDelta", delta: 
p.delta })) + .with({ type: "tool-call" }, (p) => Option.some({ _tag: "ToolCall", id: p.id, name: p.name, params: p.params })) + .with({ type: "finish" }, (p) => Option.some({ _tag: "TurnComplete", usage: p.usage })) + .otherwise(() => Option.none()) + +// Voice events → ConversationEvent +const translateVoiceEvent = (event: unknown): Option => + match(event) + .with({ type: "response.output_audio_transcript.delta" }, (e) => + Option.some({ _tag: "TextDelta", delta: e.delta })) + .with({ type: "response.done" }, () => + Option.some({ _tag: "TurnComplete" })) + .otherwise(() => Option.none()) +``` + +### Tool Calling Flow + +For voice mode, tool calling would need: +1. Parse tool call from transcript or dedicated event +2. Execute tool +3. Send result via `sendToolResult()` or `sendText()` +4. Server continues response + +This is where voice differs - currently Grok voice doesn't have native tool calling, so we'd need to: +- Detect tool call patterns in text +- Execute tools client-side +- Inject results as user messages + +### Audio I/O Integration + +The demo needs to handle: +- **Mic capture**: `AudioCapture.stream` → `session.sendAudio(chunk)` +- **Speaker playback**: `session.events.filter(isAudioDelta)` → `AudioPlayback.play(chunk)` +- **PTT vs VAD**: Push-to-talk (manual) or voice activity detection (server-side for Grok) + +## File Structure + +``` +src/unified/ + domain.ts # Core types + http-transport.ts # HTTP/stateless adapter + ws-transport.ts # WebSocket/stateful adapter + session.ts # UnifiedSession service + demo.ts # Demo CLI + index.ts # Exports +``` + +## Open Questions + +1. **Tool calling in voice mode**: How to handle? Text patterns? Dedicated event type? +2. **Audio batching for HTTP**: Some APIs support audio input - batch or not supported? +3. **Interruption**: WebSocket can interrupt mid-response - how to surface? +4. **Config switching**: Can you change model/voice mid-session? +5. **Error recovery**: HTTP retries vs WebSocket reconnect? 
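For open question 1 (tool calling in voice mode), the "detect tool call patterns in text" idea from the Tool Calling Flow section can be made concrete with a transcript scanner. This is a hypothetical sketch: the `TOOL_CALL:` marker is a convention invented here (the model would have to be instructed to emit it), not anything the Grok voice API provides.

```typescript
// Hypothetical convention: the model is instructed to emit a line like
//   TOOL_CALL: get_secret {}
// in its transcript when it wants a tool. Scan completed transcript text
// for that marker and parse out the tool name and JSON params.
interface DetectedToolCall {
  name: string
  params: unknown
}

const TOOL_CALL_RE = /TOOL_CALL:\s*(\w+)\s*(\{.*\})?/

const detectToolCall = (transcript: string): DetectedToolCall | null => {
  const m = TOOL_CALL_RE.exec(transcript)
  if (m === null) return null
  return { name: m[1], params: m[2] ? JSON.parse(m[2]) : {} }
}
```

On a hit, the session layer would execute the tool client-side and inject the result back as a user message, as described above.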
+ +## Next Steps + +1. Create `src/unified/` directory structure +2. Define core types in `domain.ts` +3. Implement HTTP transport first (simpler, can test with existing chat) +4. Implement WebSocket transport (build on existing voice client) +5. Build demo CLI that works in both modes +6. Test with real APIs diff --git a/src/cli/commands.ts b/src/cli/commands.ts index 842228f..07be3e6 100644 --- a/src/cli/commands.ts +++ b/src/cli/commands.ts @@ -22,6 +22,7 @@ import { EventStore } from "../event-store.ts" import { makeRouter } from "../http-routes.ts" import { layercodeCommand } from "../layercode/index.ts" import { printTraceLinks } from "../tracing.ts" +import { voiceCommand } from "../voice/index.ts" const encodeEvent = Schema.encodeSync(ContextEvent) @@ -620,6 +621,7 @@ const rootCommand = Command.make( chatCommand, serveCommand, layercodeCommand, + voiceCommand, logTestCommand, traceTestCommand, clearCommand diff --git a/src/unified/demo.ts b/src/unified/demo.ts new file mode 100644 index 0000000..0d6a950 --- /dev/null +++ b/src/unified/demo.ts @@ -0,0 +1,375 @@ +#!/usr/bin/env bun +/** + * Unified Session Demo CLI + * + * Demonstrates the unified LLM abstraction working with both: + * - HTTP transport (OpenAI-compatible chat completions) + * - WebSocket transport (Grok voice API) + * + * Logs all events to YAML file on exit (Ctrl+C). 
+ * + * Usage: + * doppler run -- bun run src/unified/demo.ts --mode=http --provider=xai + * doppler run -- bun run src/unified/demo.ts --mode=voice + */ +import * as fs from "fs" +import * as path from "path" + +import { FetchHttpClient } from "@effect/platform" +import { BunCommandExecutor, BunFileSystem } from "@effect/platform-bun" +import { Console, Effect, Layer, Redacted, Stream } from "effect" +import * as readline from "readline" +import * as yaml from "yaml" + +import { OpenAiChatClient } from "../openai-chat-completions-client.ts" +import { AudioCapture } from "../voice/audio-capture.ts" +import { AudioPlayback } from "../voice/audio-playback.ts" +import { GrokVoiceClient } from "../voice/client.ts" +import { type ConversationEvent, makeUnifiedSession, type ToolDefinition, type ToolHandler } from "./domain.ts" +import { HttpTransportLive } from "./http-transport.ts" +import { WsTransportLive } from "./ws-transport.ts" + +// Define the get_secret tool +const tools: ReadonlyArray = [ + { + name: "get_secret", + description: "Returns a secret message about the future of technology", + parameters: { + type: "object", + properties: {}, + required: [] + } + } +] + +// Tool handlers - same implementation for both HTTP and WS +const toolHandlers: Record = { + get_secret: () => Effect.succeed("The Singularity is near") +} + +// Event log for YAML dump on exit - raw WebSocket/HTTP messages +const eventLog: Array<{ timestamp: string; direction: "recv" | "send"; message: unknown }> = [] + +/** + * Truncate delta fields in an object to avoid huge logs + */ +const truncateDeltas = (obj: unknown): unknown => { + if (obj === null || typeof obj !== "object") return obj + if (Array.isArray(obj)) return obj.map(truncateDeltas) + + const result: Record = {} + for (const [key, value] of Object.entries(obj as Record)) { + if (key === "delta" && typeof value === "string" && value.length > 100) { + result[key] = `` + } else if (typeof value === "object" && value !== null) { + 
result[key] = truncateDeltas(value) + } else { + result[key] = value + } + } + return result +} + +/** + * Log a raw message (incoming or outgoing) + */ +const logRawMessage = (direction: "recv" | "send", message: unknown): void => { + eventLog.push({ + timestamp: new Date().toISOString(), + direction, + message: truncateDeltas(message) + }) +} + +/** + * Write event log to YAML file + */ +const writeEventLog = (): void => { + if (eventLog.length === 0) return + + const timestamp = new Date().toISOString().replace(/[:.]/g, "-") + const filename = `unified-demo-${mode}-${timestamp}.yaml` + const filepath = path.join(process.cwd(), filename) + + const logData = { + meta: { + mode, + provider: providerName, + model, + startTime: eventLog[0]?.timestamp, + endTime: eventLog[eventLog.length - 1]?.timestamp, + eventCount: eventLog.length + }, + events: eventLog + } + + fs.writeFileSync(filepath, yaml.stringify(logData)) + // eslint-disable-next-line no-console + console.log(`\nEvent log written to: ${filepath}`) +} + +// Provider configurations for OpenAI-compatible APIs +// Note: Anthropic uses a different API format and needs a separate transport +const PROVIDERS: Record = { + xai: { apiUrl: "https://api.x.ai/v1", apiKeyEnv: "XAI_API_KEY", defaultModel: "grok-2-latest" }, + openai: { apiUrl: "https://api.openai.com/v1", apiKeyEnv: "OPENAI_API_KEY", defaultModel: "gpt-4o-mini" }, + groq: { + apiUrl: "https://api.groq.com/openai/v1", + apiKeyEnv: "GROQ_API_KEY", + defaultModel: "llama-3.3-70b-versatile" + }, + cerebras: { apiUrl: "https://api.cerebras.ai/v1", apiKeyEnv: "CEREBRAS_API_KEY", defaultModel: "llama-3.3-70b" }, + openrouter: { + apiUrl: "https://openrouter.ai/api/v1", + apiKeyEnv: "OPENROUTER_API_KEY", + defaultModel: "anthropic/claude-sonnet-4" + } +} + +// Parse CLI args +const args = process.argv.slice(2) +const modeArg = args.find((a) => a.startsWith("--mode=")) +const mode = modeArg?.split("=")[1] ?? 
"http" +const providerArg = args.find((a) => a.startsWith("--provider=")) +const providerName = providerArg?.split("=")[1] ?? "openrouter" +const modelArg = args.find((a) => a.startsWith("--model=")) + +const provider = PROVIDERS[providerName] +if (!provider) { + // eslint-disable-next-line no-console + console.error(`Unknown provider: ${providerName}. Available: ${Object.keys(PROVIDERS).join(", ")}`) + process.exit(1) +} +const model = modelArg?.split("=")[1] ?? provider.defaultModel +const apiKey = process.env[provider.apiKeyEnv] ?? "" + +if (mode !== "http" && mode !== "voice") { + // eslint-disable-next-line no-console + console.error( + "Usage: bun run src/unified/demo.ts --mode=http|voice [--provider=xai|openai|groq|cerebras] [--model=MODEL]" + ) + process.exit(1) +} + +// Register exit handler to dump event log +// exit fires on all exit paths (including SIGINT, uncaughtException) +process.on("exit", () => { + writeEventLog() +}) + +process.on("SIGINT", () => { + process.exit(0) +}) + +/* eslint-disable no-console */ +/** + * Handle conversation events - renders text, plays audio, handles tool calls + */ +const handleEvent = (event: ConversationEvent): Effect.Effect => + Effect.sync(() => { + switch (event._tag) { + case "TextDelta": + process.stdout.write(event.delta) + break + case "TextComplete": + console.log(`\n[Complete] ${event.content}`) + break + case "AudioDelta": + // Audio is handled separately via stream in voice mode + break + case "AudioComplete": + console.log("[Audio complete]") + break + case "ToolCall": + console.log(`\n[Tool Call] ${event.name}(${JSON.stringify(event.params)})`) + break + case "TurnComplete": + console.log("\n[Turn complete]") + if (event.inputTokens || event.outputTokens) { + console.log( + ` Tokens: ${event.inputTokens ?? "?"} in / ${event.outputTokens ?? 
"?"} out` + ) + } + break + case "UserTranscript": + console.log(`\n[You said] ${event.transcript}`) + break + case "ConversationError": + console.error(`[Error] ${event.message}`) + break + case "ToolResult": + console.log(`[Tool Result] ${event.id}`) + break + case "SessionReady": + console.log("[Session ready]") + break + case "SpeechStarted": + console.log("[Speech started]") + break + case "SpeechStopped": + console.log("[Speech stopped]") + break + case "ResponseStarted": + console.log("[Response started]") + break + case "RawEvent": + // Log raw WebSocket message to YAML (truncates delta fields) + logRawMessage("recv", event.data) + break + } + }) +/* eslint-enable no-console */ + +/** + * Simple readline prompt + */ +const prompt = (rl: readline.Interface): Effect.Effect => + Effect.async((resume) => { + rl.question("\nYou: ", (answer) => { + resume(Effect.succeed(answer)) + }) + }) + +/** + * HTTP mode: simple text REPL + */ +const httpDemo = Effect.gen(function*() { + yield* Console.log("=== Unified Session Demo (HTTP Mode) ===") + yield* Console.log(`Provider: ${providerName} | Model: ${model}`) + yield* Console.log("Tools: get_secret (returns a message about the future)") + yield* Console.log("Type a message and press Enter. Type 'quit' to exit.\n") + + const session = yield* makeUnifiedSession({ + systemPrompt: + "You are a helpful assistant. Keep responses concise. 
You have access to a tool called get_secret that returns a secret message.", + tools, + toolHandlers + }) + + // Fork event handler + yield* session.events.pipe( + Stream.runForEach(handleEvent), + Effect.forkDaemon + ) + + // Simple readline loop + const rl = readline.createInterface({ + input: process.stdin, + output: process.stdout + }) + + let running = true + while (running) { + const input = yield* prompt(rl) + if (input.toLowerCase() === "quit") { + rl.close() + running = false + continue + } + + yield* session.sendText(input) + + // Wait a bit for response to stream + yield* Effect.sleep("100 millis") + } + + writeEventLog() + yield* Console.log("\nGoodbye!") +}) + +/** + * Voice mode: audio streaming with voice input/output + */ +const voiceDemo = Effect.gen(function*() { + yield* Console.log("=== Unified Session Demo (Voice Mode) ===") + yield* Console.log("Tools: get_secret (returns a message about the future)") + yield* Console.log("Speak into your microphone. Press Ctrl+C to exit.\n") + + const session = yield* makeUnifiedSession({ + systemPrompt: + "You are a helpful voice assistant. Keep responses brief and conversational. 
You have access to a tool called get_secret that returns a secret message.", + tools, + toolHandlers + }) + + // Set up audio playback + const audioPlayback = yield* AudioPlayback + const player = yield* audioPlayback.createPlayer() + + // Fork event handler - route audio to playback, handle barge-in + yield* session.events.pipe( + Stream.tap((event) => { + if (event._tag === "AudioDelta" && Buffer.isBuffer(event.chunk)) { + return player.write(event.chunk) + } + // Clear playback queue when user starts speaking (barge-in) + if (event._tag === "SpeechStarted") { + return player.clear + } + return handleEvent(event) + }), + Stream.runDrain, + Effect.forkDaemon + ) + + // Start audio capture and forward to session + const audioCapture = yield* AudioCapture + const audioStream = audioCapture.capture() + + yield* Console.log("Listening... (speak now)") + + yield* audioStream.pipe( + Stream.tap((chunk) => session.sendAudio(chunk)), + Stream.runDrain + ) +}) + +// Build layers based on mode +const httpLayers = Layer.mergeAll( + HttpTransportLive({ model }).pipe( + Layer.provide( + OpenAiChatClient.layer({ + apiKey: Redacted.make(apiKey), + apiUrl: provider.apiUrl + }) + ), + Layer.provide(FetchHttpClient.layer) + ) +) + +// Convert tools to voice format for WS transport +const voiceTools = tools.map((t) => ({ + type: "function" as const, + name: t.name, + description: t.description, + parameters: t.parameters +})) + +const voiceLayers = Layer.mergeAll( + WsTransportLive({ + apiKey: process.env.XAI_API_KEY ?? "", + voice: "ara", + instructions: + "You are a helpful voice assistant. Keep responses brief and conversational. 
You have access to a tool called get_secret that returns a secret message.", + tools: voiceTools + }).pipe(Layer.provide(GrokVoiceClient.Default)), + AudioCapture.Default, + AudioPlayback.Default, + BunCommandExecutor.layer.pipe(Layer.provide(BunFileSystem.layer)) +) + +// Run the appropriate demo +const runHttp = httpDemo.pipe( + Effect.provide(httpLayers), + Effect.catchAll((error) => Console.error(`Fatal error: ${error}`)) +) + +const runVoice = voiceDemo.pipe( + Effect.provide(voiceLayers), + Effect.catchAll((error) => Console.error(`Fatal error: ${error}`)) +) + +const runnable = mode === "http" ? runHttp : runVoice + +// eslint-disable-next-line no-console +Effect.runPromise(runnable).catch(console.error) diff --git a/src/unified/domain.ts b/src/unified/domain.ts new file mode 100644 index 0000000..6694f23 --- /dev/null +++ b/src/unified/domain.ts @@ -0,0 +1,430 @@ +/** + * Unified LLM Abstraction - Domain Types + * + * Core types for a transport-agnostic conversation interface that works + * with both HTTP-based chat completions and WebSocket-based voice APIs. 
+ */
+import { Context, Effect, Option, Queue, Ref, Schema, Stream } from "effect"
+
+// Tool definitions
+
+export interface ToolDefinition {
+  readonly name: string
+  readonly description: string
+  readonly parameters: {
+    readonly type: "object"
+    readonly properties: Record<string, unknown>
+    readonly required?: ReadonlyArray<string>
+  }
+}
+
+export type ToolHandler = (params: unknown) => Effect.Effect<unknown, Error>
+
+// Event types that flow from the LLM to the client
+
+export class TextDelta extends Schema.TaggedClass<TextDelta>()("TextDelta", {
+  delta: Schema.String
+}) {}
+
+export class TextComplete extends Schema.TaggedClass<TextComplete>()("TextComplete", {
+  content: Schema.String
+}) {}
+
+export class AudioDelta extends Schema.TaggedClass<AudioDelta>()("AudioDelta", {
+  /** Base64-encoded audio chunk or raw Buffer depending on transport */
+  chunk: Schema.Unknown
+}) {}
+
+export class AudioComplete extends Schema.TaggedClass<AudioComplete>()("AudioComplete", {}) {}
+
+export class ToolCall extends Schema.TaggedClass<ToolCall>()("ToolCall", {
+  id: Schema.String,
+  name: Schema.String,
+  params: Schema.Unknown
+}) {}
+
+export class ToolResult extends Schema.TaggedClass<ToolResult>()("ToolResult", {
+  id: Schema.String,
+  result: Schema.Unknown
+}) {}
+
+export class TurnComplete extends Schema.TaggedClass<TurnComplete>()("TurnComplete", {
+  inputTokens: Schema.optional(Schema.Number),
+  outputTokens: Schema.optional(Schema.Number)
+}) {}
+
+export class ConversationError extends Schema.TaggedClass<ConversationError>()("ConversationError", {
+  message: Schema.String,
+  code: Schema.optionalWith(Schema.String, { as: "Option" })
+}) {}
+
+export class UserTranscript extends Schema.TaggedClass<UserTranscript>()("UserTranscript", {
+  transcript: Schema.String
+}) {}
+
+// Voice-specific lifecycle events
+export class SessionReady extends Schema.TaggedClass<SessionReady>()("SessionReady", {}) {}
+
+export class SpeechStarted extends Schema.TaggedClass<SpeechStarted>()("SpeechStarted", {}) {}
+
+export class SpeechStopped extends Schema.TaggedClass<SpeechStopped>()("SpeechStopped", {}) {}
+
+export class ResponseStarted extends Schema.TaggedClass<ResponseStarted>()("ResponseStarted", {}) {}
+
+export class RawEvent extends Schema.TaggedClass<RawEvent>()("RawEvent", {
+  type: Schema.String,
+  data: Schema.Unknown
+}) {}
+
+export const ConversationEvent = Schema.Union(
+  TextDelta,
+  TextComplete,
+  AudioDelta,
+  AudioComplete,
+  ToolCall,
+  ToolResult,
+  TurnComplete,
+  ConversationError,
+  UserTranscript,
+  SessionReady,
+  SpeechStarted,
+  SpeechStopped,
+  ResponseStarted,
+  RawEvent
+)
+export type ConversationEvent = typeof ConversationEvent.Type
+
+// Input types that flow from client to LLM
+
+export class TextInput extends Schema.TaggedClass<TextInput>()("TextInput", {
+  text: Schema.String
+}) {}
+
+export class AudioInput extends Schema.TaggedClass<AudioInput>()("AudioInput", {
+  chunk: Schema.Unknown
+}) {}
+
+export class ToolResponseInput extends Schema.TaggedClass<ToolResponseInput>()("ToolResponseInput", {
+  id: Schema.String,
+  result: Schema.Unknown
+}) {}
+
+export const ConversationInput = Schema.Union(TextInput, AudioInput, ToolResponseInput)
+export type ConversationInput = typeof ConversationInput.Type
+
+// Transport abstraction
+
+export interface StatelessConnection {
+  readonly _tag: "stateless"
+  /**
+   * Send a complete turn (all accumulated context + new input) and get response stream.
+   * The transport handles building the full request from context.
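A dependency-free sketch of what "building the full request from context" means for a stateless transport (not the actual implementation; names are illustrative):

```typescript
// A stateless transport rebuilds the complete message array from the
// accumulated context on every turn: system prompt, then history, then input.
type DemoMsg = { role: "system" | "user" | "assistant"; content: string }

const buildTurnRequest = (
  systemPrompt: string | null,
  history: ReadonlyArray<DemoMsg>,
  userText: string
): DemoMsg[] => [
  ...(systemPrompt ? [{ role: "system" as const, content: systemPrompt }] : []),
  ...history,
  { role: "user" as const, content: userText }
]
```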
+   */
+  readonly sendTurn: (
+    context: ConversationContext,
+    input: ConversationInput,
+    tools?: ReadonlyArray<ToolDefinition>
+  ) => Stream.Stream<ConversationEvent, ConversationError>
+}
+
+export interface StatefulConnection {
+  readonly _tag: "stateful"
+  /** Send text to the conversation */
+  readonly sendText: (text: string) => Effect.Effect<void, ConversationError>
+  /** Send an audio chunk */
+  readonly sendAudio: (chunk: Buffer) => Effect.Effect<void, ConversationError>
+  /** Send a tool result */
+  readonly sendToolResult: (id: string, result: unknown) => Effect.Effect<void, ConversationError>
+  /** Stream of events from the LLM */
+  readonly events: Stream.Stream<ConversationEvent, ConversationError>
+  /** Close the connection */
+  readonly close: Effect.Effect<void>
+}
+
+export type LlmConnection = StatelessConnection | StatefulConnection
+
+/**
+ * Transport service - provides a connection to an LLM.
+ * Implementations can be HTTP-based (stateless) or WebSocket-based (stateful).
+ */
+export class LlmTransport extends Context.Tag("@unified/LlmTransport")<
+  LlmTransport,
+  {
+    readonly connect: Effect.Effect<LlmConnection, ConversationError>
+  }
+>() {}
+
+// Conversation state
+
+export interface Message {
+  readonly role: "system" | "user" | "assistant" | "tool"
+  readonly content: string
+  readonly toolCallId?: string
+  readonly toolCalls?: Array<{ id: string; name: string; params: unknown }>
+}
+
+export interface PendingToolCall {
+  readonly id: string
+  readonly name: string
+  readonly params: unknown
+}
+
+export interface ConversationContext {
+  readonly messages: Array<Message>
+  readonly systemPrompt: Option.Option<string>
+  readonly turnNumber: number
+  readonly pendingToolCalls: Array<PendingToolCall>
+}
+
+export const emptyContext: ConversationContext = {
+  messages: [],
+  systemPrompt: Option.none(),
+  turnNumber: 0,
+  pendingToolCalls: []
+}
+
+export const addUserMessage = (ctx: ConversationContext, text: string): ConversationContext => ({
+  ...ctx,
+  messages: [...ctx.messages, { role: "user", content: text }]
+})
+
+export const addAssistantMessage = (
+  ctx: ConversationContext,
+  content: string,
+  toolCalls?: Array<{ id: string; name: string; params: unknown }>
+): ConversationContext => {
+  const message: Message = toolCalls && toolCalls.length > 0
+    ? { role: "assistant", content, toolCalls }
+    : { role: "assistant", content }
+  return {
+    ...ctx,
+    messages: [...ctx.messages, message],
+    pendingToolCalls: toolCalls ?? [],
+    turnNumber: ctx.turnNumber + 1
+  }
+}
+
+export const addToolResult = (
+  ctx: ConversationContext,
+  toolCallId: string,
+  result: unknown
+): ConversationContext => ({
+  ...ctx,
+  messages: [...ctx.messages, { role: "tool", content: JSON.stringify(result), toolCallId }],
+  pendingToolCalls: ctx.pendingToolCalls.filter((tc) => tc.id !== toolCallId)
+})
+
+export const setSystemPrompt = (ctx: ConversationContext, prompt: string): ConversationContext => ({
+  ...ctx,
+  systemPrompt: Option.some(prompt)
+})
+
+/**
+ * Unified session service.
+ * Wraps a transport and provides a consistent interface for both stateless and stateful modes.
+ */
+export class UnifiedSession extends Context.Tag("@unified/UnifiedSession")<
+  UnifiedSession,
+  {
+    /** Send a text message */
+    readonly sendText: (text: string) => Effect.Effect<void, ConversationError>
+    /** Send an audio chunk (only supported for stateful transports) */
+    readonly sendAudio: (chunk: Buffer) => Effect.Effect<void, ConversationError>
+    /** Send a tool result */
+    readonly sendToolResult: (id: string, result: unknown) => Effect.Effect<void, ConversationError>
+    /** Stream of all conversation events */
+    readonly events: Stream.Stream<ConversationEvent>
+    /** Get current conversation context */
+    readonly getContext: Effect.Effect<ConversationContext>
+    /** Set system prompt */
+    readonly setSystemPrompt: (prompt: string) => Effect.Effect<void>
+  }
+>() {}
+
+/**
+ * Configuration for the unified session
+ */
+export interface UnifiedSessionConfig {
+  readonly systemPrompt?: string
+  readonly tools?: ReadonlyArray<ToolDefinition>
+  readonly toolHandlers?: Record<string, ToolHandler>
+}
+
+/**
+ * Create a UnifiedSession from a transport.
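A minimal, synchronous sketch of the agent loop the session runs for stateless transports — process a turn, execute any tool calls, feed results back, and stop once a turn produces no tool calls (names are illustrative, and scripted turns stand in for the transport):

```typescript
// Sketch of the tool-calling loop: each turn either ends the exchange or
// requests tool executions whose results re-enter the context.
type DemoToolCall = { id: string; name: string }
type DemoTurn = { text: string; toolCalls: DemoToolCall[] }

const runDemoAgentLoop = (
  turns: DemoTurn[], // scripted model turns, standing in for the transport
  execute: (tc: DemoToolCall) => unknown
): string => {
  let transcript = ""
  for (const turn of turns) {
    transcript += turn.text
    if (turn.toolCalls.length === 0) break // model is done - no more tools requested
    for (const tc of turn.toolCalls) execute(tc) // results would be appended to context
  }
  return transcript
}
```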
+ */
+export const makeUnifiedSession = (
+  config?: UnifiedSessionConfig
+): Effect.Effect<
+  {
+    readonly sendText: (text: string) => Effect.Effect<void, ConversationError>
+    readonly sendAudio: (chunk: Buffer) => Effect.Effect<void, ConversationError>
+    readonly sendToolResult: (id: string, result: unknown) => Effect.Effect<void, ConversationError>
+    readonly events: Stream.Stream<ConversationEvent>
+    readonly getContext: Effect.Effect<ConversationContext>
+    readonly setSystemPrompt: (prompt: string) => Effect.Effect<void>
+  },
+  ConversationError,
+  LlmTransport
+> =>
+  Effect.gen(function*() {
+    const transport = yield* LlmTransport
+    const connection = yield* transport.connect
+
+    const initialContext: ConversationContext = config?.systemPrompt
+      ? setSystemPrompt(emptyContext, config.systemPrompt)
+      : emptyContext
+
+    const contextRef = yield* Ref.make(initialContext)
+    const eventQueue = yield* Queue.unbounded<ConversationEvent>()
+
+    const tools = config?.tools
+    const toolHandlers = config?.toolHandlers ?? {}
+
+    // Execute a tool and return the result
+    const executeToolCall = (
+      toolCall: PendingToolCall
+    ): Effect.Effect<{ id: string; result: unknown }, ConversationError> =>
+      Effect.gen(function*() {
+        const handler = toolHandlers[toolCall.name]
+        if (!handler) {
+          return { id: toolCall.id, result: { error: `Unknown tool: ${toolCall.name}` } }
+        }
+        const result = yield* handler(toolCall.params).pipe(
+          Effect.catchAll((e) => Effect.succeed({ error: e.message }))
+        )
+        return { id: toolCall.id, result }
+      })
+
+    // Process a single turn and return accumulated tool calls
+    const processTurn = (
+      ctx: ConversationContext,
+      input: ConversationInput
+    ): Effect.Effect<Array<PendingToolCall>, ConversationError> =>
+      Effect.gen(function*() {
+        if (connection._tag !== "stateless") return []
+
+        const responseStream = connection.sendTurn(ctx, input, tools)
+        let fullResponse = ""
+        const toolCalls: Array<PendingToolCall> = []
+
+        yield* responseStream.pipe(
+          Stream.tap((event) =>
+            Effect.gen(function*() {
+              yield* Queue.offer(eventQueue, event)
+              if (event._tag === "TextDelta") {
+                fullResponse += event.delta
+              } else if
(event._tag === "ToolCall") {
+                toolCalls.push({ id: event.id, name: event.name, params: event.params })
+              }
+            })
+          ),
+          Stream.runDrain
+        )
+
+        // Update context with assistant response (including any tool calls)
+        yield* Ref.update(contextRef, (c) =>
+          addAssistantMessage(
+            c,
+            fullResponse,
+            toolCalls.length > 0 ? toolCalls : undefined
+          ))
+
+        return toolCalls
+      })
+
+    // Run the agent loop: process turn, execute tools, repeat until no tool calls
+    const runAgentLoop = (input: ConversationInput): Effect.Effect<void, ConversationError> =>
+      Effect.gen(function*() {
+        let ctx = yield* Ref.get(contextRef)
+        let toolCalls = yield* processTurn(ctx, input)
+
+        while (toolCalls.length > 0) {
+          // Execute all tool calls
+          for (const toolCall of toolCalls) {
+            const { id, result } = yield* executeToolCall(toolCall)
+            // Emit tool result event
+            yield* Queue.offer(eventQueue, new ToolResult({ id, result }))
+            // Update context with tool result
+            yield* Ref.update(contextRef, (c) => addToolResult(c, id, result))
+          }
+
+          // Get updated context and send another turn
+          ctx = yield* Ref.get(contextRef)
+          toolCalls = yield* processTurn(ctx, new ToolResponseInput({ id: "", result: null }))
+        }
+      })
+
+    // For stateful connections, fork a fiber to pump events to queue
+    // and automatically execute tool calls
+    if (connection._tag === "stateful") {
+      yield* connection.events.pipe(
+        Stream.tap((event) =>
+          Effect.gen(function*() {
+            yield* Queue.offer(eventQueue, event)
+
+            // Auto-execute tool calls for stateful connections
+            if (event._tag === "ToolCall") {
+              const { id, result } = yield* executeToolCall({
+                id: event.id,
+                name: event.name,
+                params: event.params
+              })
+              yield* Queue.offer(eventQueue, new ToolResult({ id, result }))
+              yield* connection.sendToolResult(id, result)
+            }
+          })
+        ),
+        Stream.runDrain,
+        Effect.forkDaemon
+      )
+    }
+
+    const sendText = (text: string): Effect.Effect<void, ConversationError> =>
+      Effect.gen(function*() {
+        yield* Ref.update(contextRef, (ctx) => addUserMessage(ctx, text))
+
+        if
(connection._tag === "stateless") {
+          yield* runAgentLoop(new TextInput({ text }))
+        } else {
+          yield* connection.sendText(text)
+        }
+      })
+
+    const sendAudio = (chunk: Buffer): Effect.Effect<void, ConversationError> =>
+      Effect.gen(function*() {
+        if (connection._tag === "stateless") {
+          return yield* Effect.fail(
+            new ConversationError({
+              message: "Audio input not supported for stateless transport",
+              code: Option.some("UNSUPPORTED_OPERATION")
+            })
+          )
+        }
+        yield* connection.sendAudio(chunk)
+      })
+
+    const sendToolResult = (id: string, result: unknown): Effect.Effect<void, ConversationError> =>
+      Effect.gen(function*() {
+        yield* Ref.update(contextRef, (ctx) => addToolResult(ctx, id, result))
+
+        if (connection._tag === "stateless") {
+          yield* runAgentLoop(new ToolResponseInput({ id, result }))
+        } else {
+          yield* connection.sendToolResult(id, result)
+        }
+      })
+
+    const getContext = Ref.get(contextRef)
+
+    const setSystemPromptFn = (prompt: string) => Ref.update(contextRef, (ctx) => setSystemPrompt(ctx, prompt))
+
+    return {
+      sendText,
+      sendAudio,
+      sendToolResult,
+      events: Stream.fromQueue(eventQueue),
+      getContext,
+      setSystemPrompt: setSystemPromptFn
+    }
+  })
diff --git a/src/unified/http-transport.ts b/src/unified/http-transport.ts
new file mode 100644
index 0000000..6888105
--- /dev/null
+++ b/src/unified/http-transport.ts
@@ -0,0 +1,201 @@
+/**
+ * HTTP Transport for Unified Session
+ *
+ * Wraps OpenAI-compatible chat completions API as a stateless transport.
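A standalone sketch of the streamed tool-call accumulation this transport performs. OpenAI-style streams deliver tool calls as fragments keyed by `index`: the first fragment carries the id and name, later fragments append to the JSON `arguments` string. The fragment shape below is a simplified stand-in for the real chunk type:

```typescript
// Accumulate streamed tool-call fragments into complete calls with parsed params.
type Fragment = { index: number; id?: string; name?: string; args?: string }

const accumulateToolCalls = (
  fragments: ReadonlyArray<Fragment>
): Array<{ id: string; name: string; params: unknown }> => {
  const active: Record<number, { id: string; name: string; args: string }> = {}
  for (const f of fragments) {
    if (f.id && f.name) {
      // First fragment of a call: open a new accumulator at this index
      active[f.index] = { id: f.id, name: f.name, args: f.args ?? "" }
    } else if (active[f.index] && f.args) {
      // Later fragments append to the arguments string
      active[f.index].args += f.args
    }
  }
  return Object.values(active).map((t) => {
    try {
      return { id: t.id, name: t.name, params: t.args ? JSON.parse(t.args) : {} }
    } catch {
      return { id: t.id, name: t.name, params: {} } // malformed JSON falls back to {}
    }
  })
}
```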
+ */
+import { Effect, Layer, Option, Stream } from "effect"
+
+import { OpenAiChatClient } from "../openai-chat-completions-client.ts"
+import {
+  type ConversationContext,
+  ConversationError,
+  type ConversationInput,
+  LlmTransport,
+  type Message,
+  RawEvent,
+  type StatelessConnection,
+  TextDelta,
+  ToolCall,
+  type ToolDefinition,
+  TurnComplete
+} from "./domain.ts"
+
+interface HttpTransportConfig {
+  readonly model: string
+}
+
+/**
+ * Convert our Message format to OpenAI chat message format
+ */
+const messageToOpenAi = (msg: Message) => {
+  const base: {
+    role: "system" | "user" | "assistant" | "tool"
+    content: string | null
+    name?: string
+    tool_calls?: Array<{ id: string; type: "function"; function: { name: string; arguments: string } }>
+    tool_call_id?: string
+  } = {
+    role: msg.role,
+    content: msg.content
+  }
+
+  if (msg.toolCallId) {
+    base.tool_call_id = msg.toolCallId
+  }
+
+  if (msg.toolCalls && msg.toolCalls.length > 0) {
+    base.tool_calls = msg.toolCalls.map((tc) => ({
+      id: tc.id,
+      type: "function" as const,
+      function: {
+        name: tc.name,
+        arguments: JSON.stringify(tc.params)
+      }
+    }))
+  }
+
+  return base
+}
+
+/**
+ * Build messages array from conversation context
+ */
+const buildMessages = (ctx: ConversationContext) => {
+  const messages: Array<ReturnType<typeof messageToOpenAi>> = []
+
+  // Add system prompt if present
+  if (Option.isSome(ctx.systemPrompt)) {
+    messages.push({ role: "system", content: ctx.systemPrompt.value })
+  }
+
+  // Add conversation messages
+  for (const msg of ctx.messages) {
+    messages.push(messageToOpenAi(msg))
+  }
+
+  return messages
+}
+
+/**
+ * Convert tool definitions to OpenAI format
+ */
+const toolsToOpenAi = (tools: ReadonlyArray<ToolDefinition>) =>
+  tools.map((tool) => ({
+    type: "function" as const,
+    function: {
+      name: tool.name,
+      description: tool.description,
+      parameters: tool.parameters
+    }
+  }))
+
+/**
+ * Create HTTP transport layer
+ */
+export const HttpTransportLive = (
+  config: HttpTransportConfig
+): Layer.Layer<LlmTransport, never, OpenAiChatClient> =>
Layer.effect(
+    LlmTransport,
+    Effect.gen(function*() {
+      const client = yield* OpenAiChatClient
+
+      return {
+        connect: Effect.succeed({
+          _tag: "stateless",
+          sendTurn: (
+            ctx: ConversationContext,
+            _input: ConversationInput,
+            tools?: ReadonlyArray<ToolDefinition>
+          ) => {
+            const messages = buildMessages(ctx)
+
+            const request = {
+              model: config.model,
+              messages,
+              stream: true as const,
+              stream_options: { include_usage: true },
+              ...(tools && tools.length > 0 ? { tools: toolsToOpenAi(tools) } : {})
+            }
+
+            // Track tool calls being built across streaming chunks
+            const activeToolCalls: Record<number, { id: string; name: string; args: string }> = {}
+
+            return client.createChatCompletionStream(request).pipe(
+              Stream.mapEffect((chunk) =>
+                Effect.sync(() => {
+                  const events: Array<
+                    TextDelta | ToolCall | TurnComplete | ConversationError | RawEvent
+                  > = []
+
+                  // Log raw chunk
+                  events.push(new RawEvent({ type: "chat.completion.chunk", data: chunk }))
+
+                  const choice = chunk.choices[0]
+                  if (choice?.delta) {
+                    const delta = choice.delta
+
+                    // Text content
+                    if (delta.content && delta.content.length > 0) {
+                      events.push(new TextDelta({ delta: delta.content }))
+                    }
+
+                    // Tool calls - accumulate across chunks
+                    if (delta.tool_calls) {
+                      for (const tc of delta.tool_calls) {
+                        const idx = tc.index
+                        if (tc.id && tc.function?.name) {
+                          // New tool call starting
+                          activeToolCalls[idx] = {
+                            id: tc.id,
+                            name: tc.function.name,
+                            args: tc.function.arguments ?? ""
+                          }
+                        } else if (activeToolCalls[idx] && tc.function?.arguments) {
+                          // Continuing to accumulate arguments
+                          activeToolCalls[idx].args += tc.function.arguments
+                        }
+                      }
+                    }
+                  }
+
+                  // On finish_reason, emit accumulated tool calls
+                  if (choice?.finish_reason === "tool_calls") {
+                    for (const tc of Object.values(activeToolCalls)) {
+                      try {
+                        const params = tc.args ?
JSON.parse(tc.args) : {}
+                        events.push(new ToolCall({ id: tc.id, name: tc.name, params }))
+                      } catch {
+                        events.push(new ToolCall({ id: tc.id, name: tc.name, params: {} }))
+                      }
+                    }
+                  }
+
+                  // Finish event
+                  if (chunk.usage) {
+                    events.push(
+                      new TurnComplete({
+                        inputTokens: chunk.usage.prompt_tokens,
+                        outputTokens: chunk.usage.completion_tokens
+                      })
+                    )
+                  }
+
+                  return events
+                })
+              ),
+              Stream.flatMap((events) => Stream.fromIterable(events)),
+              Stream.catchAll((error) =>
+                Stream.succeed(
+                  new ConversationError({
+                    message: error.message,
+                    code: Option.some("HTTP_ERROR")
+                  })
+                )
+              )
+            )
+          }
+        })
+      }
+    })
+  )
diff --git a/src/unified/index.ts b/src/unified/index.ts
new file mode 100644
index 0000000..d938a4c
--- /dev/null
+++ b/src/unified/index.ts
@@ -0,0 +1,45 @@
+/**
+ * Unified LLM Abstraction
+ *
+ * Transport-agnostic conversation interface for text and voice LLMs.
+ */
+
+// Core domain types
+export {
+  // Context helpers
+  addAssistantMessage,
+  addToolResult,
+  addUserMessage,
+  // Event types
+  AudioComplete,
+  AudioDelta,
+  AudioInput,
+  type ConversationContext,
+  ConversationError,
+  ConversationEvent,
+  ConversationInput,
+  emptyContext,
+  // Transport types
+  type LlmConnection,
+  LlmTransport,
+  // Session
+  makeUnifiedSession,
+  type Message,
+  setSystemPrompt,
+  type StatefulConnection,
+  type StatelessConnection,
+  TextComplete,
+  TextDelta,
+  TextInput,
+  ToolCall,
+  ToolResponseInput,
+  ToolResult,
+  TurnComplete,
+  UnifiedSession,
+  type UnifiedSessionConfig,
+  UserTranscript
+} from "./domain.ts"
+
+// Transport implementations
+export { HttpTransportLive } from "./http-transport.ts"
+export { WsTransportLive } from "./ws-transport.ts"
diff --git a/src/unified/ws-transport.ts b/src/unified/ws-transport.ts
new file mode 100644
index 0000000..2b444f1
--- /dev/null
+++ b/src/unified/ws-transport.ts
@@ -0,0 +1,184 @@
+/**
+ * WebSocket Transport for Unified Session
+ *
+ * Wraps GrokVoiceClient as a stateful transport for voice
conversations.
+ */
+import { Effect, Layer, Option, Stream } from "effect"
+
+import { GrokVoiceClient, type GrokVoiceConnection } from "../voice/client.ts"
+import type { ToolDefinition as VoiceToolDefinition, VoiceSessionConfig } from "../voice/domain.ts"
+import {
+  AudioDelta,
+  ConversationError,
+  type ConversationEvent,
+  LlmTransport,
+  RawEvent,
+  ResponseStarted,
+  SessionReady,
+  SpeechStarted,
+  SpeechStopped,
+  type StatefulConnection,
+  TextDelta,
+  ToolCall,
+  TurnComplete,
+  UserTranscript
+} from "./domain.ts"
+
+interface WsTransportConfig {
+  readonly apiKey: string
+  readonly apiUrl?: string
+  readonly voice?: "ara" | "rex" | "sal" | "eve" | "leo"
+  readonly instructions?: string
+  readonly tools?: ReadonlyArray<VoiceToolDefinition>
+}
+
+/**
+ * Translate voice connection events to unified ConversationEvents
+ */
+const translateVoiceEvents = (
+  conn: GrokVoiceConnection
+): Stream.Stream<ConversationEvent> => {
+  // Merge all event streams into a single stream of ConversationEvents
+  const textEvents: Stream.Stream<ConversationEvent> = conn.transcripts.pipe(
+    Stream.map((delta): ConversationEvent => new TextDelta({ delta }))
+  )
+
+  const audioEvents: Stream.Stream<ConversationEvent> = conn.audioOutput.pipe(
+    Stream.map((chunk): ConversationEvent => new AudioDelta({ chunk }))
+  )
+
+  const userTranscriptEvents: Stream.Stream<ConversationEvent> = conn.userTranscripts.pipe(
+    Stream.map((transcript): ConversationEvent => new UserTranscript({ transcript }))
+  )
+
+  const toolCallEvents: Stream.Stream<ConversationEvent> = conn.toolCalls.pipe(
+    Stream.map((tc): ConversationEvent => new ToolCall({ id: tc.id, name: tc.name, params: tc.params }))
+  )
+
+  // Translate all raw events to typed ConversationEvents
+  const rawEventStream: Stream.Stream<ConversationEvent> = conn.events.pipe(
+    Stream.filter(
+      (event): event is { type: string; [key: string]: unknown } =>
+        typeof event === "object" && event !== null && "type" in event
+    ),
+    Stream.map((event): ConversationEvent => {
+      switch (event.type) {
+        case "session.updated":
+          return new SessionReady({})
+        case
"input_audio_buffer.speech_started":
+          return new SpeechStarted({})
+        case "input_audio_buffer.speech_stopped":
+          return new SpeechStopped({})
+        case "response.created":
+          return new ResponseStarted({})
+        case "response.done":
+          return new TurnComplete({})
+        default:
+          // Capture all other events as RawEvent
+          return new RawEvent({ type: event.type, data: event })
+      }
+    })
+  )
+
+  return Stream.mergeAll([textEvents, audioEvents, userTranscriptEvents, toolCallEvents, rawEventStream], {
+    concurrency: "unbounded"
+  }).pipe(
+    Stream.catchAll((error: unknown) =>
+      Stream.succeed(
+        new ConversationError({
+          message: error instanceof Error ? error.message : String(error),
+          code: Option.some("WS_ERROR")
+        })
+      )
+    )
+  )
+}
+
+/**
+ * Create WebSocket transport layer
+ */
+export const WsTransportLive = (
+  config: WsTransportConfig
+): Layer.Layer<LlmTransport, never, GrokVoiceClient> =>
+  Layer.effect(
+    LlmTransport,
+    Effect.gen(function*() {
+      const voiceClient = yield* GrokVoiceClient
+
+      return {
+        connect: Effect.gen(function*() {
+          const sessionConfig: VoiceSessionConfig = {
+            apiKey: config.apiKey,
+            apiUrl: config.apiUrl,
+            voice: config.voice,
+            instructions: config.instructions,
+            tools: config.tools
+          }
+
+          const conn = yield* voiceClient.connect(sessionConfig).pipe(
+            Effect.mapError(
+              (error) =>
+                new ConversationError({
+                  message: error.message,
+                  code: Option.some("CONNECTION_ERROR")
+                })
+            )
+          )
+
+          // Wait for connection to be ready
+          yield* conn.waitForReady.pipe(
+            Effect.mapError(
+              () =>
+                new ConversationError({
+                  message: "Connection timeout waiting for ready",
+                  code: Option.some("TIMEOUT")
+                })
+            )
+          )
+
+          const statefulConn: StatefulConnection = {
+            _tag: "stateful",
+
+            sendText: (text: string) =>
+              conn.sendText(text).pipe(
+                Effect.mapError(
+                  () =>
+                    new ConversationError({
+                      message: "Failed to send text",
+                      code: Option.some("SEND_ERROR")
+                    })
+                )
+              ),
+
+            sendAudio: (chunk: Buffer) =>
+              conn.send(chunk).pipe(
+                Effect.mapError(
+                  () =>
+                    new ConversationError({
message: "Failed to send audio",
+                      code: Option.some("SEND_ERROR")
+                    })
+                )
+              ),
+
+            sendToolResult: (id: string, result: unknown) =>
+              conn.sendToolResult(id, result).pipe(
+                Effect.mapError(
+                  () =>
+                    new ConversationError({
+                      message: "Failed to send tool result",
+                      code: Option.some("SEND_ERROR")
+                    })
+                )
+              ),
+
+            events: translateVoiceEvents(conn),
+
+            close: conn.close
+          }
+
+          return statefulConn
+        })
+      }
+    })
+  )
diff --git a/src/voice/audio-capture.ts b/src/voice/audio-capture.ts
new file mode 100644
index 0000000..7a18a52
--- /dev/null
+++ b/src/voice/audio-capture.ts
@@ -0,0 +1,93 @@
+/**
+ * Audio Capture Service
+ *
+ * Captures microphone audio using the sox CLI tool.
+ * Requires sox to be installed: brew install sox
+ */
+import type { CommandExecutor } from "@effect/platform"
+import { Command } from "@effect/platform"
+import { Chunk, Effect, Stream } from "effect"
+
+import { DEFAULT_SAMPLE_RATE } from "./domain.ts"
+
+export interface AudioCaptureConfig {
+  readonly sampleRate?: number
+  readonly chunkSize?: number
+}
+
+export class AudioCapture extends Effect.Service<AudioCapture>()("@lome/AudioCapture", {
+  effect: Effect.succeed({
+    /**
+     * Start capturing audio from the default microphone.
+     * Returns a stream of PCM 16-bit mono audio buffers.
+     *
+     * Uses sox CLI:
+     *   sox -d -t raw -r 24000 -e signed -b 16 -c 1 -
+     *   -d: default audio device (microphone)
+     *   -t raw: output raw PCM
+     *   -r 24000: sample rate
+     *   -e signed: signed integer encoding
+     *   -b 16: 16-bit
+     *   -c 1: mono
+     *   -: output to stdout
+     */
+    capture: (config?: AudioCaptureConfig): Stream.Stream<Buffer, Error, CommandExecutor.CommandExecutor> => {
+      const sampleRate = config?.sampleRate ?? DEFAULT_SAMPLE_RATE
+      const chunkSize = config?.chunkSize ??
4096
+
+      const command = Command.make(
+        "sox",
+        "-q", // quiet - suppress status output
+        "-d",
+        "-t",
+        "raw",
+        "-r",
+        String(sampleRate),
+        "-e",
+        "signed",
+        "-b",
+        "16",
+        "-c",
+        "1",
+        "-"
+      )
+
+      return Command.stream(command).pipe(
+        Stream.mapChunks((chunks) => {
+          const buffers: Array<Buffer> = []
+          let accumulated = Buffer.alloc(0)
+
+          for (const chunk of chunks) {
+            accumulated = Buffer.concat([accumulated, Buffer.from(chunk)])
+            while (accumulated.length >= chunkSize) {
+              buffers.push(accumulated.subarray(0, chunkSize))
+              accumulated = accumulated.subarray(chunkSize)
+            }
+          }
+
+          if (accumulated.length > 0) {
+            buffers.push(accumulated)
+          }
+
+          return Chunk.fromIterable(buffers)
+        }),
+        Stream.catchAll((error) =>
+          Stream.fail(
+            new Error(`Audio capture failed. Is sox installed? (brew install sox)\n${error}`)
+          )
+        )
+      )
+    },
+
+    /**
+     * Check if sox is available
+     */
+    checkSoxAvailable: Effect.gen(function*() {
+      const command = Command.make("which", "sox")
+      const result = yield* Command.string(command).pipe(
+        Effect.catchAll(() => Effect.succeed(""))
+      )
+      return result.trim().length > 0
+    })
+  })
+}) {}
diff --git a/src/voice/audio-playback.ts b/src/voice/audio-playback.ts
new file mode 100644
index 0000000..f390842
--- /dev/null
+++ b/src/voice/audio-playback.ts
@@ -0,0 +1,173 @@
+/**
+ * Audio Playback Service
+ *
+ * Plays audio through speakers using the sox CLI tool.
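A standalone sketch of the fixed-size re-chunking the capture service applies to raw PCM: bytes are accumulated and emitted as `chunkSize`-byte buffers, with the remainder carried over. At 24 kHz, 16-bit mono (48,000 bytes/s), the default 4096-byte chunk is roughly 85 ms of audio.

```typescript
// Accumulate PCM byte pieces and slice them into fixed-size chunks.
const rechunk = (
  pieces: ReadonlyArray<Buffer>,
  chunkSize: number
): { chunks: Buffer[]; rest: Buffer } => {
  const chunks: Buffer[] = []
  let acc = Buffer.alloc(0)
  for (const piece of pieces) {
    acc = Buffer.concat([acc, piece])
    while (acc.length >= chunkSize) {
      chunks.push(acc.subarray(0, chunkSize))
      acc = acc.subarray(chunkSize)
    }
  }
  return { chunks, rest: acc } // rest holds bytes not yet filling a full chunk
}
```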
+ * Requires sox to be installed: brew install sox
+ */
+import { Effect, type Fiber, Queue, Stream } from "effect"
+import { type ChildProcess, spawn } from "node:child_process"
+
+import { DEFAULT_SAMPLE_RATE } from "./domain.ts"
+
+export interface AudioPlaybackConfig {
+  readonly sampleRate?: number
+}
+
+export interface AudioPlayer {
+  readonly write: (audio: Buffer) => Effect.Effect<void>
+  readonly clear: Effect.Effect<void>
+  readonly close: Effect.Effect<void>
+  readonly writerFiber: Fiber.RuntimeFiber<void>
+}
+
+export class AudioPlayback extends Effect.Service<AudioPlayback>()("@lome/AudioPlayback", {
+  effect: Effect.succeed({
+    /**
+     * Create a playback sink that accepts audio buffers.
+     * Returns a function to write audio and a cleanup effect.
+     *
+     * Uses sox CLI:
+     *   sox -t raw -r 24000 -e signed -b 16 -c 1 - -d
+     *   -t raw: input raw PCM
+     *   -r 24000: sample rate
+     *   -e signed: signed integer encoding
+     *   -b 16: 16-bit
+     *   -c 1: mono
+     *   -: input from stdin
+     *   -d: output to default audio device (speakers)
+     */
+    createPlayer: (config?: AudioPlaybackConfig): Effect.Effect<AudioPlayer> =>
+      Effect.gen(function*() {
+        const sampleRate = config?.sampleRate ??
DEFAULT_SAMPLE_RATE
+
+        const audioQueue = yield* Queue.unbounded<Buffer>()
+        let soxProcess: ChildProcess | null = null
+        let isRunning = true
+
+        soxProcess = spawn("sox", [
+          "-q", // quiet - suppress status output
+          "-t",
+          "raw",
+          "-r",
+          String(sampleRate),
+          "-e",
+          "signed",
+          "-b",
+          "16",
+          "-c",
+          "1",
+          "-",
+          "-d"
+        ], {
+          stdio: ["pipe", "ignore", "ignore"]
+        })
+
+        soxProcess.on("error", (error) => {
+          Effect.runSync(Effect.logError(`Sox playback error: ${error.message}`))
+        })
+
+        soxProcess.on("close", () => {
+          isRunning = false
+        })
+
+        const writerFiber = yield* Effect.fork(
+          Stream.fromQueue(audioQueue).pipe(
+            Stream.runForEach((buffer) =>
+              Effect.sync(() => {
+                if (soxProcess?.stdin && !soxProcess.stdin.destroyed && isRunning) {
+                  soxProcess.stdin.write(buffer)
+                }
+              })
+            )
+          )
+        )
+
+        const write = (audio: Buffer): Effect.Effect<void> => Queue.offer(audioQueue, audio).pipe(Effect.asVoid)
+
+        // Clear queue and restart sox to enable barge-in
+        const clear: Effect.Effect<void> = Effect.gen(function*() {
+          // Drain queue (take all pending items and discard)
+          let cleared = 0
+          while (true) {
+            const item = yield* Queue.poll(audioQueue)
+            if (item._tag === "None") break
+            cleared++
+          }
+
+          // Kill old sox (don't wait for close event)
+          const oldProcess = soxProcess
+          if (oldProcess) {
+            oldProcess.removeAllListeners("close")
+            oldProcess.stdin?.end()
+            oldProcess.kill("SIGKILL")
+          }
+
+          // Start new sox process
+          soxProcess = spawn("sox", [
+            "-q", // quiet - suppress status output
+            "-t",
+            "raw",
+            "-r",
+            String(sampleRate),
+            "-e",
+            "signed",
+            "-b",
+            "16",
+            "-c",
+            "1",
+            "-",
+            "-d"
+          ], {
+            stdio: ["pipe", "ignore", "ignore"]
+          })
+
+          isRunning = true
+
+          soxProcess.on("error", (error) => {
+            Effect.runSync(Effect.logError(`Sox playback error: ${error.message}`))
+          })
+
+          soxProcess.on("close", () => {
+            isRunning = false
+          })
+
+          if (cleared > 0) {
+            yield* Effect.logDebug(`Barge-in: cleared ${cleared} audio chunks`)
+          }
+        })
+
+        const close: Effect.Effect<void> = Effect.gen(function*() {
+          isRunning = false
+          yield* Queue.shutdown(audioQueue)
+          if (soxProcess?.stdin) {
+            soxProcess.stdin.end()
+          }
+          if (soxProcess) {
+            soxProcess.kill()
+          }
+        })
+
+        return {
+          write,
+          clear,
+          close,
+          writerFiber
+        }
+      }),
+
+    /**
+     * Stream audio to speakers.
+     * Convenience method that handles player lifecycle.
+     */
+    play: (audioStream: Stream.Stream<Buffer>, config?: AudioPlaybackConfig) =>
+      Effect.gen(function*() {
+        const playback = yield* AudioPlayback
+        const player = yield* playback.createPlayer(config)
+        yield* audioStream.pipe(
+          Stream.runForEach((buffer) => player.write(buffer))
+        ).pipe(
+          Effect.ensuring(player.close)
+        )
+      })
+  })
+}) {}
diff --git a/src/voice/cli.ts b/src/voice/cli.ts
new file mode 100644
index 0000000..977ad41
--- /dev/null
+++ b/src/voice/cli.ts
@@ -0,0 +1,183 @@
+/**
+ * Voice CLI Command
+ *
+ * CLI interface for real-time voice conversations with Grok.
+ */
+import { Command, Options } from "@effect/cli"
+import { BunCommandExecutor } from "@effect/platform-bun"
+import { Config, Console, Effect, Fiber, Layer, Option, Redacted, Stream } from "effect"
+
+import { AudioCapture } from "./audio-capture.ts"
+import { AudioPlayback } from "./audio-playback.ts"
+import { GrokVoiceClient, type GrokVoiceConnection } from "./client.ts"
+import { DEFAULT_INSTRUCTIONS, DEFAULT_SAMPLE_RATE, DEFAULT_VOICE, type VoiceName } from "./domain.ts"
+
+const voiceOption = Options.choice("voice", ["ara", "rex", "sal", "eve", "leo"]).pipe(
+  Options.withAlias("v"),
+  Options.withDescription("Voice to use (ara, rex, sal, eve, leo)"),
+  Options.withDefault(DEFAULT_VOICE)
+)
+
+const sampleRateOption = Options.integer("sample-rate").pipe(
+  Options.withAlias("r"),
+  Options.withDescription("Audio sample rate in Hz"),
+  Options.withDefault(DEFAULT_SAMPLE_RATE)
+)
+
+const instructionsOption = Options.text("instructions").pipe(
+  Options.withAlias("i"),
+  Options.withDescription("System instructions for the assistant"),
+  Options.optional
+)
+
+const textModeOption = Options.boolean("text").pipe(
+  Options.withAlias("t"),
+  Options.withDescription("Text mode - type messages instead of speaking"),
+  Options.withDefault(false)
+)
+
+const VoiceLayer = Layer.mergeAll(
+  GrokVoiceClient.Default,
+  AudioCapture.Default,
+  AudioPlayback.Default,
+  BunCommandExecutor.layer
+)
+
+const runVoiceChat = (options: {
+  voice: string
+  sampleRate: number
+  instructions: Option.Option<string>
+  textMode: boolean
+}) =>
+  Effect.gen(function*() {
+    const apiKey = yield* Config.redacted("XAI_API_KEY").pipe(
+      Effect.map((r) => Redacted.value(r)),
+      Effect.catchAll(() => Effect.fail(new Error("XAI_API_KEY environment variable is required")))
+    )
+
+    const capture = yield* AudioCapture
+    const playback = yield* AudioPlayback
+    const voiceClient = yield* GrokVoiceClient
+
+    const soxAvailable = yield* capture.checkSoxAvailable
+    if (!soxAvailable && !options.textMode) {
+      yield* Console.error("sox is not installed. Please run: brew install sox")
+      yield* Console.error("Or use --text mode to type messages instead.")
+      return
+    }
+
+    yield* Console.log("╔════════════════════════════════════════════╗")
+    yield* Console.log("║ Grok Voice Chat ║")
+    yield* Console.log("╠════════════════════════════════════════════╣")
+    yield* Console.log(`║ Voice: ${options.voice.padEnd(36)}║`)
+    yield* Console.log(`║ Sample Rate: ${String(options.sampleRate).padEnd(30)}║`)
+    yield* Console.log(`║ Mode: ${(options.textMode ? "Text" : "Voice").padEnd(37)}║`)
+    yield* Console.log("╚════════════════════════════════════════════╝")
+    yield* Console.log("")
+
+    if (!options.textMode) {
+      yield* Console.log("Speak into your microphone. Press Ctrl+C to exit.")
+    } else {
+      yield* Console.log("Type your message and press Enter.
Press Ctrl+C to exit.") + } + yield* Console.log("") + + const connection = yield* voiceClient.connect({ + apiKey, + voice: options.voice as VoiceName, + sampleRate: options.sampleRate, + instructions: Option.isSome(options.instructions) + ? options.instructions.value + : DEFAULT_INSTRUCTIONS + }) + + yield* connection.waitForReady + + yield* Console.log("Connected! Starting conversation...") + yield* Console.log("") + + const player = yield* playback.createPlayer({ sampleRate: options.sampleRate }) + + const audioPlaybackFiber = yield* connection.audioOutput.pipe( + Stream.runForEach((buffer) => player.write(buffer)), + Effect.fork + ) + + const transcriptFiber = yield* connection.transcripts.pipe( + Stream.runForEach((delta) => Effect.sync(() => process.stdout.write(`\x1b[36m${delta}\x1b[0m`))), + Effect.fork + ) + + const userTranscriptFiber = yield* connection.userTranscripts.pipe( + Stream.runForEach((transcript) => Console.log(`\n\x1b[33mYou: ${transcript}\x1b[0m`)), + Effect.fork + ) + + if (options.textMode) { + yield* runTextMode(connection) + } else { + const micStream = capture.capture({ sampleRate: options.sampleRate }) + yield* micStream.pipe( + Stream.runForEach((buffer) => connection.send(buffer)) + ) + } + + yield* Fiber.interrupt(audioPlaybackFiber) + yield* Fiber.interrupt(transcriptFiber) + yield* Fiber.interrupt(userTranscriptFiber) + yield* player.close + yield* connection.close + }).pipe( + Effect.provide(VoiceLayer), + Effect.catchAll((error) => Console.error(`Error: ${error instanceof Error ? 
error.message : String(error)}`))
+  )
+
+const runTextMode = (connection: GrokVoiceConnection) =>
+  Effect.gen(function*() {
+    const readline = yield* Effect.promise(() => import("node:readline"))
+
+    const rl = readline.createInterface({
+      input: process.stdin,
+      output: process.stdout
+    })
+
+    yield* Effect.async<void>((resume) => {
+      const prompt = () => {
+        rl.question("\x1b[33mYou: \x1b[0m", (answer) => {
+          if (answer.trim()) {
+            Effect.runSync(connection.sendText(answer.trim()))
+          }
+          prompt()
+        })
+      }
+
+      prompt()
+
+      rl.on("close", () => {
+        resume(Effect.void)
+      })
+
+      return Effect.sync(() => {
+        rl.close()
+      })
+    })
+  })
+
+export const voiceCommand = Command.make(
+  "voice",
+  {
+    voice: voiceOption,
+    sampleRate: sampleRateOption,
+    instructions: instructionsOption,
+    textMode: textModeOption
+  },
+  ({ instructions, sampleRate, textMode, voice }) =>
+    runVoiceChat({
+      instructions,
+      sampleRate,
+      textMode,
+      voice
+    })
+).pipe(
+  Command.withDescription("Real-time voice conversation with Grok AI")
+)
diff --git a/src/voice/client.ts b/src/voice/client.ts
new file mode 100644
index 0000000..c63eefb
--- /dev/null
+++ b/src/voice/client.ts
@@ -0,0 +1,328 @@
+/**
+ * Grok Voice Client
+ *
+ * WebSocket client for XAI's realtime voice API.
+ */
+import { Effect, Queue, Schema, Stream } from "effect"
+import WebSocket from "ws"
+
+import {
+  ConversationCreatedEvent,
+  ConversationItemCreateMessage,
+  DEFAULT_API_URL,
+  DEFAULT_INSTRUCTIONS,
+  DEFAULT_SAMPLE_RATE,
+  DEFAULT_VOICE,
+  ErrorEvent,
+  InputAudioBufferSpeechStartedEvent,
+  InputAudioTranscriptionCompletedEvent,
+  ResponseCreateMessage,
+  ResponseDoneEvent,
+  ResponseOutputAudioDeltaEvent,
+  ResponseOutputAudioTranscriptDeltaEvent,
+  SessionUpdatedEvent,
+  type VoiceSessionConfig
+} from "./domain.ts"
+
+export interface ToolCallEvent {
+  readonly id: string
+  readonly name: string
+  readonly params: unknown
+}
+
+export interface GrokVoiceConnection {
+  readonly send: (audio: Buffer) => Effect.Effect<void>
+  readonly sendText: (text: string) => Effect.Effect<void>
+  readonly sendToolResult: (callId: string, result: unknown) => Effect.Effect<void>
+  readonly audioOutput: Stream.Stream<Buffer>
+  readonly transcripts: Stream.Stream<string>
+  readonly userTranscripts: Stream.Stream<string>
+  readonly toolCalls: Stream.Stream<ToolCallEvent>
+  readonly events: Stream.Stream<unknown>
+  readonly close: Effect.Effect<void>
+  readonly waitForReady: Effect.Effect<void>
+}
+
+interface WsHolder {
+  ws: WebSocket | null
+}
+
+export class GrokVoiceClient extends Effect.Service<GrokVoiceClient>()("@lome/GrokVoiceClient", {
+  effect: Effect.succeed({
+    connect: (config: VoiceSessionConfig): Effect.Effect<GrokVoiceConnection, Error> =>
+      Effect.gen(function*() {
+        const apiUrl = config.apiUrl ?? DEFAULT_API_URL
+        const voice = config.voice ?? DEFAULT_VOICE
+        const instructions = config.instructions ?? 
DEFAULT_INSTRUCTIONS
+
+        yield* Effect.log(`Connecting to ${apiUrl}`)
+
+        const audioQueue = yield* Queue.unbounded<Buffer>()
+        const transcriptQueue = yield* Queue.unbounded<string>()
+        const userTranscriptQueue = yield* Queue.unbounded<string>()
+        const toolCallQueue = yield* Queue.unbounded<ToolCallEvent>()
+        const eventQueue = yield* Queue.unbounded<unknown>()
+        const readyQueue = yield* Queue.bounded<void>(1)
+
+        // Track active function calls being built
+        const activeFunctionCalls: Record<string, { name: string; args: string }> = {}
+
+        const holder: WsHolder = { ws: null }
+        let isConfigured = false
+
+        const sendSessionConfig = (socket: WebSocket) => {
+          const sampleRate = config.sampleRate ?? DEFAULT_SAMPLE_RATE
+          const sessionConfig: {
+            type: string
+            session: {
+              instructions: string
+              voice: string
+              audio: {
+                input: { format: { type: string; rate: number } }
+                output: { format: { type: string; rate: number } }
+              }
+              input_audio_transcription: { model: string }
+              turn_detection: {
+                type: string
+                threshold: number
+                prefix_padding_ms: number
+                silence_duration_ms: number
+                create_response: boolean
+              }
+              tools?: Array<{ type: string; name: string; description: string; parameters: unknown }>
+            }
+          } = {
+            type: "session.update",
+            session: {
+              instructions,
+              voice,
+              audio: {
+                input: {
+                  format: {
+                    type: "audio/pcm",
+                    rate: sampleRate
+                  }
+                },
+                output: {
+                  format: {
+                    type: "audio/pcm",
+                    rate: sampleRate
+                  }
+                }
+              },
+              input_audio_transcription: { model: "whisper-large-v3-turbo" },
+              turn_detection: {
+                type: "server_vad",
+                threshold: 0.5,
+                prefix_padding_ms: 300,
+                silence_duration_ms: 500,
+                create_response: true
+              }
+            }
+          }
+
+          // Add tools if provided
+          if (config.tools && config.tools.length > 0) {
+            sessionConfig.session.tools = config.tools.map((t) => ({
+              type: "function",
+              name: t.name,
+              description: t.description,
+              parameters: t.parameters
+            }))
+          }
+
+          socket.send(JSON.stringify(sessionConfig))
+          isConfigured = true
+        }
+
+        const handleMessage = (data: WebSocket.Data) => {
+          try {
+            const message = 
JSON.parse(data.toString()) as { type: string; [key: string]: unknown } + const eventType = message.type + + Effect.runSync(Queue.offer(eventQueue, message)) + + if (eventType === "conversation.created") { + Schema.decodeUnknownSync(ConversationCreatedEvent)(message) + if (!isConfigured && holder.ws) { + Effect.runSync(Effect.log("Configuring session...")) + sendSessionConfig(holder.ws) + } + } else if (eventType === "session.updated") { + Schema.decodeUnknownSync(SessionUpdatedEvent)(message) + Effect.runSync(Effect.log("Session configured, ready for voice")) + Effect.runSync(Queue.offer(readyQueue, void 0)) + } else if (eventType === "response.output_audio.delta") { + const evt = Schema.decodeUnknownSync(ResponseOutputAudioDeltaEvent)(message) + const audioBuffer = Buffer.from(evt.delta, "base64") + Effect.runSync(Queue.offer(audioQueue, audioBuffer)) + } else if (eventType === "response.output_audio_transcript.delta") { + const evt = Schema.decodeUnknownSync(ResponseOutputAudioTranscriptDeltaEvent)(message) + Effect.runSync(Queue.offer(transcriptQueue, evt.delta)) + } else if (eventType === "conversation.item.input_audio_transcription.completed") { + const evt = Schema.decodeUnknownSync(InputAudioTranscriptionCompletedEvent)(message) + if (evt.transcript) { + Effect.runSync(Queue.offer(userTranscriptQueue, evt.transcript)) + } + } else if (eventType === "input_audio_buffer.speech_started") { + Schema.decodeUnknownSync(InputAudioBufferSpeechStartedEvent)(message) + Effect.runSync(Effect.log("Speech detected")) + } else if (eventType === "response.created") { + Effect.runSync(Effect.log("Response started")) + } else if (eventType === "response.done") { + Schema.decodeUnknownSync(ResponseDoneEvent)(message) + Effect.runSync(Effect.log("Response complete")) + } else if (eventType === "response.output_item.added") { + // Check if this is a function call item + const item = (message as { item?: { type?: string; call_id?: string; name?: string } }).item + if 
(item?.type === "function_call" && item.call_id && item.name) { + activeFunctionCalls[item.call_id] = { name: item.name, args: "" } + Effect.runSync(Effect.log(`Function call started: ${item.name}`)) + } + } else if (eventType === "response.function_call_arguments.delta") { + const msg = message as { call_id?: string; delta?: string } + const callId = msg.call_id + if (callId && msg.delta) { + const fc = activeFunctionCalls[callId] + if (fc) { + fc.args += msg.delta + } + } + } else if (eventType === "response.function_call_arguments.done") { + const msg = message as { call_id?: string; arguments?: string } + const callId = msg.call_id + if (callId) { + const fc = activeFunctionCalls[callId] + if (fc) { + const args = msg.arguments ?? fc.args + try { + const params = args ? JSON.parse(args) : {} + Effect.runSync(Queue.offer(toolCallQueue, { id: callId, name: fc.name, params })) + Effect.runSync(Effect.log(`Function call complete: ${fc.name}`)) + } catch { + Effect.runSync(Queue.offer(toolCallQueue, { id: callId, name: fc.name, params: {} })) + } + delete activeFunctionCalls[callId] + } + } + } else if (eventType === "error") { + const evt = Schema.decodeUnknownSync(ErrorEvent)(message) + Effect.runSync(Effect.logError(`XAI Error: ${evt.error?.message ?? 
"Unknown error"}`))
+            }
+          } catch (e) {
+            Effect.runSync(Effect.logDebug(`Failed to parse message: ${e}`))
+          }
+        }
+
+        yield* Effect.async<void, Error>((resume) => {
+          const ws = new WebSocket(apiUrl, {
+            headers: {
+              Authorization: `Bearer ${config.apiKey}`,
+              "Content-Type": "application/json"
+            }
+          })
+          holder.ws = ws
+
+          ws.on("open", () => {
+            Effect.runSync(Effect.log("WebSocket connected"))
+            resume(Effect.void)
+          })
+
+          ws.on("message", handleMessage)
+
+          ws.on("error", (error) => {
+            Effect.runSync(Effect.logError(`WebSocket error: ${error.message}`))
+            resume(Effect.fail(error as Error))
+          })
+
+          ws.on("close", (code, reason) => {
+            Effect.runSync(Effect.log(`WebSocket closed: ${code} ${reason.toString()}`))
+            Effect.runSync(Queue.shutdown(audioQueue))
+            Effect.runSync(Queue.shutdown(transcriptQueue))
+            Effect.runSync(Queue.shutdown(userTranscriptQueue))
+            Effect.runSync(Queue.shutdown(toolCallQueue))
+            Effect.runSync(Queue.shutdown(eventQueue))
+          })
+
+          return Effect.sync(() => {
+            ws.close()
+          })
+        })
+
+        const send = (audio: Buffer): Effect.Effect<void> =>
+          Effect.sync(() => {
+            const ws = holder.ws
+            if (ws && ws.readyState === WebSocket.OPEN) {
+              const base64Audio = audio.toString("base64")
+              const msg = { type: "input_audio_buffer.append", audio: base64Audio }
+              ws.send(JSON.stringify(msg))
+            }
+          })
+
+        const sendText = (text: string): Effect.Effect<void> =>
+          Effect.sync(() => {
+            const ws = holder.ws
+            if (ws && ws.readyState === WebSocket.OPEN) {
+              const itemMessage = new ConversationItemCreateMessage({
+                item: {
+                  type: "message",
+                  role: "user",
+                  content: [{ type: "input_text", text }]
+                }
+              })
+              const encodedItem = Schema.encodeSync(ConversationItemCreateMessage)(itemMessage)
+              ws.send(JSON.stringify({ type: "conversation.item.create", ...encodedItem }))
+
+              const responseMessage = new ResponseCreateMessage({})
+              const encodedResponse = Schema.encodeSync(ResponseCreateMessage)(responseMessage)
+              ws.send(JSON.stringify({ type: "response.create", ...encodedResponse 
}))
+            }
+          })
+
+        const sendToolResult = (callId: string, result: unknown): Effect.Effect<void> =>
+          Effect.sync(() => {
+            const ws = holder.ws
+            if (ws && ws.readyState === WebSocket.OPEN) {
+              // Send function call output
+              const outputMsg = {
+                type: "conversation.item.create",
+                item: {
+                  type: "function_call_output",
+                  call_id: callId,
+                  output: JSON.stringify(result)
+                }
+              }
+              ws.send(JSON.stringify(outputMsg))
+
+              // Trigger response generation
+              const responseMessage = new ResponseCreateMessage({})
+              const encodedResponse = Schema.encodeSync(ResponseCreateMessage)(responseMessage)
+              ws.send(JSON.stringify({ type: "response.create", ...encodedResponse }))
+            }
+          })
+
+        const close = Effect.sync(() => {
+          const ws = holder.ws
+          if (ws) {
+            ws.close()
+            holder.ws = null
+          }
+        })
+
+        const waitForReady = Queue.take(readyQueue)
+
+        return {
+          send,
+          sendText,
+          sendToolResult,
+          audioOutput: Stream.fromQueue(audioQueue),
+          transcripts: Stream.fromQueue(transcriptQueue),
+          userTranscripts: Stream.fromQueue(userTranscriptQueue),
+          toolCalls: Stream.fromQueue(toolCallQueue),
+          events: Stream.fromQueue(eventQueue),
+          close,
+          waitForReady
+        }
+      })
+  })
+}) {}
diff --git a/src/voice/domain.ts b/src/voice/domain.ts
new file mode 100644
index 0000000..2c650a8
--- /dev/null
+++ b/src/voice/domain.ts
@@ -0,0 +1,171 @@
+/**
+ * Voice Domain Types
+ *
+ * Types for the Grok realtime voice API integration.
+ */
+import { Schema } from "effect"
+
+export const VoiceName = Schema.Literal("ara", "rex", "sal", "eve", "leo")
+export type VoiceName = typeof VoiceName.Type
+
+export const AudioFormat = Schema.Struct({
+  type: Schema.Literal("audio/pcm", "audio/pcmu"),
+  rate: Schema.optional(Schema.Number)
+})
+export type AudioFormat = typeof AudioFormat.Type
+
+export const TurnDetection = Schema.Struct({
+  type: Schema.Literal("server_vad")
+})
+
+export const SessionConfig = Schema.Struct({
+  instructions: Schema.optional(Schema.String),
+  voice: Schema.optional(VoiceName),
+  audio: Schema.optional(Schema.Struct({
+    input: Schema.optional(Schema.Struct({ format: AudioFormat })),
+    output: Schema.optional(Schema.Struct({ format: AudioFormat }))
+  })),
+  turn_detection: Schema.optional(TurnDetection)
+})
+export type SessionConfig = typeof SessionConfig.Type
+
+export class SessionUpdateMessage extends Schema.TaggedClass<SessionUpdateMessage>()("session.update", {
+  session: SessionConfig
+}) {}
+
+export class InputAudioBufferAppendMessage extends Schema.TaggedClass<InputAudioBufferAppendMessage>()(
+  "input_audio_buffer.append",
+  { audio: Schema.String }
+) {}
+
+export class InputAudioBufferCommitMessage extends Schema.TaggedClass<InputAudioBufferCommitMessage>()(
+  "input_audio_buffer.commit",
+  {}
+) {}
+
+export class ResponseCreateMessage extends Schema.TaggedClass<ResponseCreateMessage>()("response.create", {}) {}
+
+export const ConversationItemContent = Schema.Struct({
+  type: Schema.Literal("input_text"),
+  text: Schema.String
+})
+
+export class ConversationItemCreateMessage extends Schema.TaggedClass<ConversationItemCreateMessage>()(
+  "conversation.item.create",
+  {
+    item: Schema.Struct({
+      type: Schema.Literal("message"),
+      role: Schema.Literal("user", "assistant"),
+      content: Schema.Array(ConversationItemContent)
+    })
+  }
+) {}
+
+export const OutboundMessage = Schema.Union(
+  SessionUpdateMessage,
+  InputAudioBufferAppendMessage,
+  InputAudioBufferCommitMessage,
+  ResponseCreateMessage,
+  ConversationItemCreateMessage
+)
+export type OutboundMessage = typeof OutboundMessage.Type
+
+export 
const ConversationCreatedEvent = Schema.Struct({ + type: Schema.Literal("conversation.created"), + conversation: Schema.optional(Schema.Struct({ + id: Schema.optional(Schema.String) + })) +}) +export type ConversationCreatedEvent = typeof ConversationCreatedEvent.Type + +export const SessionUpdatedEvent = Schema.Struct({ + type: Schema.Literal("session.updated"), + session: Schema.optional(Schema.Unknown) +}) +export type SessionUpdatedEvent = typeof SessionUpdatedEvent.Type + +export const ResponseCreatedEvent = Schema.Struct({ + type: Schema.Literal("response.created") +}) +export type ResponseCreatedEvent = typeof ResponseCreatedEvent.Type + +export const ResponseDoneEvent = Schema.Struct({ + type: Schema.Literal("response.done") +}) +export type ResponseDoneEvent = typeof ResponseDoneEvent.Type + +export const ResponseOutputAudioDeltaEvent = Schema.Struct({ + type: Schema.Literal("response.output_audio.delta"), + delta: Schema.String +}) +export type ResponseOutputAudioDeltaEvent = typeof ResponseOutputAudioDeltaEvent.Type + +export const ResponseOutputAudioTranscriptDeltaEvent = Schema.Struct({ + type: Schema.Literal("response.output_audio_transcript.delta"), + delta: Schema.String +}) +export type ResponseOutputAudioTranscriptDeltaEvent = typeof ResponseOutputAudioTranscriptDeltaEvent.Type + +export const InputAudioTranscriptionCompletedEvent = Schema.Struct({ + type: Schema.Literal("conversation.item.input_audio_transcription.completed"), + transcript: Schema.optional(Schema.String) +}) +export type InputAudioTranscriptionCompletedEvent = typeof InputAudioTranscriptionCompletedEvent.Type + +export const InputAudioBufferSpeechStartedEvent = Schema.Struct({ + type: Schema.Literal("input_audio_buffer.speech_started") +}) +export type InputAudioBufferSpeechStartedEvent = typeof InputAudioBufferSpeechStartedEvent.Type + +export const InputAudioBufferSpeechStoppedEvent = Schema.Struct({ + type: Schema.Literal("input_audio_buffer.speech_stopped") +}) +export type 
InputAudioBufferSpeechStoppedEvent = typeof InputAudioBufferSpeechStoppedEvent.Type + +export const ErrorEvent = Schema.Struct({ + type: Schema.Literal("error"), + error: Schema.optional(Schema.Struct({ + message: Schema.optional(Schema.String), + type: Schema.optional(Schema.String), + code: Schema.optional(Schema.String) + })) +}) +export type ErrorEvent = typeof ErrorEvent.Type + +export const InboundEvent = Schema.Union( + ConversationCreatedEvent, + SessionUpdatedEvent, + ResponseCreatedEvent, + ResponseDoneEvent, + ResponseOutputAudioDeltaEvent, + ResponseOutputAudioTranscriptDeltaEvent, + InputAudioTranscriptionCompletedEvent, + InputAudioBufferSpeechStartedEvent, + InputAudioBufferSpeechStoppedEvent, + ErrorEvent +) +export type InboundEvent = typeof InboundEvent.Type + +export const ToolDefinition = Schema.Struct({ + type: Schema.Literal("function"), + name: Schema.String, + description: Schema.String, + parameters: Schema.Unknown +}) +export type ToolDefinition = typeof ToolDefinition.Type + +export const VoiceSessionConfig = Schema.Struct({ + apiKey: Schema.String, + apiUrl: Schema.optional(Schema.String), + voice: Schema.optional(VoiceName), + sampleRate: Schema.optional(Schema.Number), + instructions: Schema.optional(Schema.String), + tools: Schema.optional(Schema.Array(ToolDefinition)) +}) +export type VoiceSessionConfig = typeof VoiceSessionConfig.Type + +export const DEFAULT_API_URL = "wss://api.x.ai/v1/realtime" +export const DEFAULT_SAMPLE_RATE = 48000 // Highest quality supported +export const DEFAULT_VOICE: VoiceName = "ara" +export const DEFAULT_INSTRUCTIONS = + "You are a helpful voice assistant. Keep your responses conversational and concise since they will be spoken aloud." diff --git a/src/voice/index.ts b/src/voice/index.ts new file mode 100644 index 0000000..92d1aa3 --- /dev/null +++ b/src/voice/index.ts @@ -0,0 +1,11 @@ +/** + * Voice Module + * + * Real-time voice conversation with Grok AI. 
+ */ + +export * from "./audio-capture.ts" +export * from "./audio-playback.ts" +export { voiceCommand } from "./cli.ts" +export * from "./client.ts" +export * from "./domain.ts" diff --git a/unified-demo-http-2026-01-08T13-19-09-396Z.yaml b/unified-demo-http-2026-01-08T13-19-09-396Z.yaml new file mode 100644 index 0000000..37cfd0b --- /dev/null +++ b/unified-demo-http-2026-01-08T13-19-09-396Z.yaml @@ -0,0 +1,222 @@ +meta: + mode: http + provider: openrouter + model: gpt-4o-mini + startTime: 2026-01-08T13:19:07.825Z + endTime: 2026-01-08T13:19:08.089Z + eventCount: 16 +events: + - timestamp: 2026-01-08T13:19:07.825Z + direction: recv + message: + id: gen-1767878347-4Be8tXkNhV8HGyhsWD6I + object: chat.completion.chunk + created: 1767878347 + model: openai/gpt-4o-mini + choices: + - index: 0 + delta: + role: assistant + content: "" + finish_reason: null + - timestamp: 2026-01-08T13:19:07.862Z + direction: recv + message: + id: gen-1767878347-4Be8tXkNhV8HGyhsWD6I + object: chat.completion.chunk + created: 1767878347 + model: openai/gpt-4o-mini + choices: + - index: 0 + delta: + role: assistant + content: Why + finish_reason: null + - timestamp: 2026-01-08T13:19:07.862Z + direction: recv + message: + id: gen-1767878347-4Be8tXkNhV8HGyhsWD6I + object: chat.completion.chunk + created: 1767878347 + model: openai/gpt-4o-mini + choices: + - index: 0 + delta: + role: assistant + content: " don't" + finish_reason: null + - timestamp: 2026-01-08T13:19:07.876Z + direction: recv + message: + id: gen-1767878347-4Be8tXkNhV8HGyhsWD6I + object: chat.completion.chunk + created: 1767878347 + model: openai/gpt-4o-mini + choices: + - index: 0 + delta: + role: assistant + content: " scientists" + finish_reason: null + - timestamp: 2026-01-08T13:19:07.876Z + direction: recv + message: + id: gen-1767878347-4Be8tXkNhV8HGyhsWD6I + object: chat.completion.chunk + created: 1767878347 + model: openai/gpt-4o-mini + choices: + - index: 0 + delta: + role: assistant + content: " trust" + 
finish_reason: null + - timestamp: 2026-01-08T13:19:07.932Z + direction: recv + message: + id: gen-1767878347-4Be8tXkNhV8HGyhsWD6I + object: chat.completion.chunk + created: 1767878347 + model: openai/gpt-4o-mini + choices: + - index: 0 + delta: + role: assistant + content: " atoms" + finish_reason: null + - timestamp: 2026-01-08T13:19:07.932Z + direction: recv + message: + id: gen-1767878347-4Be8tXkNhV8HGyhsWD6I + object: chat.completion.chunk + created: 1767878347 + model: openai/gpt-4o-mini + choices: + - index: 0 + delta: + role: assistant + content: "?" + finish_reason: null + - timestamp: 2026-01-08T13:19:07.991Z + direction: recv + message: + id: gen-1767878347-4Be8tXkNhV8HGyhsWD6I + object: chat.completion.chunk + created: 1767878347 + model: openai/gpt-4o-mini + choices: + - index: 0 + delta: + role: assistant + content: |+ + + + finish_reason: null + - timestamp: 2026-01-08T13:19:07.991Z + direction: recv + message: + id: gen-1767878347-4Be8tXkNhV8HGyhsWD6I + object: chat.completion.chunk + created: 1767878347 + model: openai/gpt-4o-mini + choices: + - index: 0 + delta: + role: assistant + content: Because + finish_reason: null + - timestamp: 2026-01-08T13:19:08.028Z + direction: recv + message: + id: gen-1767878347-4Be8tXkNhV8HGyhsWD6I + object: chat.completion.chunk + created: 1767878347 + model: openai/gpt-4o-mini + choices: + - index: 0 + delta: + role: assistant + content: " they" + finish_reason: null + - timestamp: 2026-01-08T13:19:08.028Z + direction: recv + message: + id: gen-1767878347-4Be8tXkNhV8HGyhsWD6I + object: chat.completion.chunk + created: 1767878347 + model: openai/gpt-4o-mini + choices: + - index: 0 + delta: + role: assistant + content: " make" + finish_reason: null + - timestamp: 2026-01-08T13:19:08.064Z + direction: recv + message: + id: gen-1767878347-4Be8tXkNhV8HGyhsWD6I + object: chat.completion.chunk + created: 1767878347 + model: openai/gpt-4o-mini + choices: + - index: 0 + delta: + role: assistant + content: " up" + 
finish_reason: null + - timestamp: 2026-01-08T13:19:08.064Z + direction: recv + message: + id: gen-1767878347-4Be8tXkNhV8HGyhsWD6I + object: chat.completion.chunk + created: 1767878347 + model: openai/gpt-4o-mini + choices: + - index: 0 + delta: + role: assistant + content: " everything" + finish_reason: null + - timestamp: 2026-01-08T13:19:08.072Z + direction: recv + message: + id: gen-1767878347-4Be8tXkNhV8HGyhsWD6I + object: chat.completion.chunk + created: 1767878347 + model: openai/gpt-4o-mini + choices: + - index: 0 + delta: + role: assistant + content: "!" + finish_reason: null + - timestamp: 2026-01-08T13:19:08.072Z + direction: recv + message: + id: gen-1767878347-4Be8tXkNhV8HGyhsWD6I + object: chat.completion.chunk + created: 1767878347 + model: openai/gpt-4o-mini + choices: + - index: 0 + delta: + role: assistant + content: "" + finish_reason: stop + - timestamp: 2026-01-08T13:19:08.089Z + direction: recv + message: + id: gen-1767878347-4Be8tXkNhV8HGyhsWD6I + object: chat.completion.chunk + created: 1767878347 + model: openai/gpt-4o-mini + choices: + - index: 0 + delta: + role: assistant + content: "" + finish_reason: null + usage: + prompt_tokens: 24 + completion_tokens: 13 + total_tokens: 37 diff --git a/unified-demo-voice-2026-01-08T13-19-50-260Z.yaml b/unified-demo-voice-2026-01-08T13-19-50-260Z.yaml new file mode 100644 index 0000000..bdf12d2 --- /dev/null +++ b/unified-demo-voice-2026-01-08T13-19-50-260Z.yaml @@ -0,0 +1,784 @@ +meta: + mode: voice + provider: openrouter + model: anthropic/claude-sonnet-4 + startTime: 2026-01-08T13:19:24.534Z + endTime: 2026-01-08T13:19:48.111Z + eventCount: 65 +events: + - timestamp: 2026-01-08T13:19:24.534Z + direction: recv + message: + type: conversation.created + event_id: 57fb6f5e-81bb-4675-8b1f-fceec1c722cd + conversation: + id: cdc3d8f8-8576-4310-916f-bb9135d38faa + object: realtime.conversation + previous_item_id: null + - timestamp: 2026-01-08T13:19:24.534Z + direction: recv + message: + type: ping + 
event_id: cffff4a6-1c11-46ec-836c-148f8b80e834 + timestamp: 1767878364347 + previous_item_id: null + - timestamp: 2026-01-08T13:19:27.473Z + direction: recv + message: + type: input_audio_buffer.committed + event_id: ad0aec5e-ae77-4225-aaca-a19f0462d165 + item_id: 60a28476-381e-46b7-83fd-5d4d6f2ecef3 + previous_item_id: null + - timestamp: 2026-01-08T13:19:27.546Z + direction: recv + message: + type: conversation.item.added + event_id: d84d76ba-0b28-48c8-a02d-29c3f9f97c49 + item: + id: 60a28476-381e-46b7-83fd-5d4d6f2ecef3 + object: realtime.item + type: message + status: completed + role: user + content: + - type: input_audio + transcript: " Hey there, what's up banana?" + previous_item_id: null + - timestamp: 2026-01-08T13:19:27.599Z + direction: recv + message: + type: conversation.item.input_audio_transcription.completed + event_id: 4b71586d-1251-4c15-8978-e61f905fc848 + item_id: 60a28476-381e-46b7-83fd-5d4d6f2ecef3 + transcript: " Hey there, what's up banana?" + content_index: 0 + status: completed + previous_item_id: null + - timestamp: 2026-01-08T13:19:27.744Z + direction: recv + message: + type: response.output_item.added + event_id: 363ba50e-2b2c-4105-9378-f0aa7356ee0e + item: + id: d9b6c456-1e4f-4ed9-9bff-7f6ed94b9af5 + object: realtime.item + type: message + status: in_progress + role: assistant + response_id: 99c2ba29-f1f0-4eb8-ae29-b3d309537571 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:27.800Z + direction: recv + message: + type: conversation.item.added + event_id: cff653cc-1c27-48f6-ab65-b287b112387e + item: + id: d9b6c456-1e4f-4ed9-9bff-7f6ed94b9af5 + object: realtime.item + type: message + status: in_progress + role: assistant + content: [] + previous_item_id: null + - timestamp: 2026-01-08T13:19:27.800Z + direction: recv + message: + type: response.content_part.added + event_id: b037a81b-f179-49a3-a35b-a2eed5a5648c + item_id: d9b6c456-1e4f-4ed9-9bff-7f6ed94b9af5 + previous_item_id: "0" + part: + type: audio + 
transcript: "" + response_id: 99c2ba29-f1f0-4eb8-ae29-b3d309537571 + content_index: 0 + output_index: 0 + - timestamp: 2026-01-08T13:19:28.083Z + direction: recv + message: + type: response.output_audio_transcript.delta + event_id: 5786439e-029f-48c1-b2e7-c76d5ca15f68 + item_id: d9b6c456-1e4f-4ed9-9bff-7f6ed94b9af5 + response_id: 99c2ba29-f1f0-4eb8-ae29-b3d309537571 + delta: Hey, not much—just chilling. What's up + content_index: 0 + output_index: 0 + start_time: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:28.254Z + direction: recv + message: + type: response.output_audio.delta + event_id: 19fdc067-750f-43b9-b174-e30b3ba1eeff + item_id: d9b6c456-1e4f-4ed9-9bff-7f6ed94b9af5 + response_id: 99c2ba29-f1f0-4eb8-ae29-b3d309537571 + delta: "" + content_index: 0 + output_index: 0 + rid: a8a048b4ec9411f0866649c67042cd92 + latency: "0.66" + ts: 1767878368024 + previous_item_id: null + - timestamp: 2026-01-08T13:19:28.339Z + direction: recv + message: + type: response.output_audio.delta + event_id: 249fdb1e-ad63-4b2c-84f2-72f6ebcbedbf + item_id: d9b6c456-1e4f-4ed9-9bff-7f6ed94b9af5 + response_id: 99c2ba29-f1f0-4eb8-ae29-b3d309537571 + delta: "" + content_index: 0 + output_index: 0 + rid: a8a048b4ec9411f0866649c67042cd92 + ts: 1767878368132 + previous_item_id: null + - timestamp: 2026-01-08T13:19:28.425Z + direction: recv + message: + type: response.output_audio.delta + event_id: 009adcc8-8ea9-4fa0-9d97-516ed7b961c6 + item_id: d9b6c456-1e4f-4ed9-9bff-7f6ed94b9af5 + response_id: 99c2ba29-f1f0-4eb8-ae29-b3d309537571 + delta: "" + content_index: 0 + output_index: 0 + rid: a8a048b4ec9411f0866649c67042cd92 + ts: 1767878368284 + previous_item_id: null + - timestamp: 2026-01-08T13:19:28.426Z + direction: recv + message: + type: response.output_audio_transcript.delta + event_id: e332547d-7b3f-472a-893b-3445e734892c + item_id: d9b6c456-1e4f-4ed9-9bff-7f6ed94b9af5 + response_id: 99c2ba29-f1f0-4eb8-ae29-b3d309537571 + delta: " with you, banana?" 
+ content_index: 0 + output_index: 0 + start_time: 2.7 + previous_item_id: null + - timestamp: 2026-01-08T13:19:28.694Z + direction: recv + message: + type: response.output_audio.delta + event_id: 669a6b77-3649-45d8-8812-b8b2a86f88cb + item_id: d9b6c456-1e4f-4ed9-9bff-7f6ed94b9af5 + response_id: 99c2ba29-f1f0-4eb8-ae29-b3d309537571 + delta: "" + content_index: 0 + output_index: 0 + rid: a8fd9c6cec9411f0866649c67042cd92 + ts: 1767878368619 + previous_item_id: null + - timestamp: 2026-01-08T13:19:28.789Z + direction: recv + message: + type: response.output_audio.delta + event_id: 71984322-c6f7-4813-b070-8230c99710b5 + item_id: d9b6c456-1e4f-4ed9-9bff-7f6ed94b9af5 + response_id: 99c2ba29-f1f0-4eb8-ae29-b3d309537571 + delta: "" + content_index: 0 + output_index: 0 + rid: a8fd9c6cec9411f0866649c67042cd92 + ts: 1767878368724 + previous_item_id: null + - timestamp: 2026-01-08T13:19:28.797Z + direction: recv + message: + type: response.output_audio.delta + event_id: 44c8af9e-eea4-4610-a1c2-f0ed371263ba + item_id: d9b6c456-1e4f-4ed9-9bff-7f6ed94b9af5 + response_id: 99c2ba29-f1f0-4eb8-ae29-b3d309537571 + delta: "" + content_index: 0 + output_index: 0 + rid: a8fd9c6cec9411f0866649c67042cd92 + ts: 1767878368724 + previous_item_id: null + - timestamp: 2026-01-08T13:19:28.798Z + direction: recv + message: + type: response.output_audio_transcript.done + event_id: 98dd0418-35b6-4c38-86d9-0d67114ac8d0 + item_id: d9b6c456-1e4f-4ed9-9bff-7f6ed94b9af5 + transcript: Hey, not much—just chilling. What's up with you, banana? 
+ response_id: 99c2ba29-f1f0-4eb8-ae29-b3d309537571 + content_index: 0 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:28.798Z + direction: recv + message: + type: response.content_part.done + event_id: ec742e3b-1913-4c70-b64a-d6bb9e60176d + item_id: d9b6c456-1e4f-4ed9-9bff-7f6ed94b9af5 + response_id: 99c2ba29-f1f0-4eb8-ae29-b3d309537571 + content_index: 0 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:28.798Z + direction: recv + message: + type: response.output_audio.done + event_id: c602ae89-909f-432d-88d2-4e309ff5f89e + item_id: d9b6c456-1e4f-4ed9-9bff-7f6ed94b9af5 + response_id: 99c2ba29-f1f0-4eb8-ae29-b3d309537571 + content_index: 0 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:28.798Z + direction: recv + message: + type: response.output_item.done + event_id: b466a75a-c0d8-4c4f-bec1-155557aabd06 + item: + id: d9b6c456-1e4f-4ed9-9bff-7f6ed94b9af5 + object: realtime.item + type: message + status: completed + role: assistant + response_id: 99c2ba29-f1f0-4eb8-ae29-b3d309537571 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:34.276Z + direction: recv + message: + type: input_audio_buffer.committed + event_id: 03d70ddd-beca-4be5-a34b-64bfd9046de9 + item_id: 8978c5d9-0120-4e06-b187-7f9696d44ab5 + previous_item_id: 60a28476-381e-46b7-83fd-5d4d6f2ecef3 + - timestamp: 2026-01-08T13:19:34.370Z + direction: recv + message: + type: conversation.item.added + event_id: 4aab2865-aef5-4340-8d82-7ec080db5a8c + item: + id: 8978c5d9-0120-4e06-b187-7f9696d44ab5 + object: realtime.item + type: message + status: completed + role: user + content: + - type: input_audio + transcript: " Not much. See you later." 
+ previous_item_id: null + - timestamp: 2026-01-08T13:19:34.383Z + direction: recv + message: + type: conversation.item.input_audio_transcription.completed + event_id: 9a1a01aa-7fa4-4391-b1aa-7404a9b5f836 + item_id: 8978c5d9-0120-4e06-b187-7f9696d44ab5 + transcript: " Not much. See you later." + content_index: 0 + status: completed + previous_item_id: null + - timestamp: 2026-01-08T13:19:34.403Z + direction: recv + message: + type: ping + event_id: fe0fff49-fe88-4d08-8ea5-44bb3d12f0df + timestamp: 1767878374346 + previous_item_id: null + - timestamp: 2026-01-08T13:19:34.544Z + direction: recv + message: + type: response.output_item.added + event_id: c398549f-6a1c-4972-9ff5-92442c079cd7 + item: + id: 4883cdad-291c-4ec8-adaf-e5b42bfdec61 + object: realtime.item + type: message + status: in_progress + role: assistant + response_id: 687c240c-58ff-4141-ab52-5163aca06d99 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:34.558Z + direction: recv + message: + type: conversation.item.added + event_id: 7b12755c-ca8c-48fb-8597-9c343fa50911 + item: + id: 4883cdad-291c-4ec8-adaf-e5b42bfdec61 + object: realtime.item + type: message + status: in_progress + role: assistant + content: [] + previous_item_id: null + - timestamp: 2026-01-08T13:19:34.558Z + direction: recv + message: + type: response.content_part.added + event_id: 0cec50b8-9614-43d5-8926-9064bb8d9cd0 + item_id: 4883cdad-291c-4ec8-adaf-e5b42bfdec61 + previous_item_id: "0" + part: + type: audio + transcript: "" + response_id: 687c240c-58ff-4141-ab52-5163aca06d99 + content_index: 0 + output_index: 0 + - timestamp: 2026-01-08T13:19:34.865Z + direction: recv + message: + type: response.output_audio_transcript.delta + event_id: 966fdf5f-b673-40f7-ac58-93736865c758 + item_id: 4883cdad-291c-4ec8-adaf-e5b42bfdec61 + response_id: 687c240c-58ff-4141-ab52-5163aca06d99 + delta: See you later! Take care. 
+ content_index: 0 + output_index: 0 + start_time: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:34.880Z + direction: recv + message: + type: response.output_audio.delta + event_id: 6958121d-2fbd-4075-aad2-c37333d13b88 + item_id: 4883cdad-291c-4ec8-adaf-e5b42bfdec61 + response_id: 687c240c-58ff-4141-ab52-5163aca06d99 + delta: "" + content_index: 0 + output_index: 0 + rid: acae01c6ec9411f0866649c67042cd92 + latency: "0.60" + ts: 1767878374808 + previous_item_id: null + - timestamp: 2026-01-08T13:19:34.982Z + direction: recv + message: + type: response.output_audio.delta + event_id: 5e095cf3-8fd9-49de-9791-09eb841a2d2c + item_id: 4883cdad-291c-4ec8-adaf-e5b42bfdec61 + response_id: 687c240c-58ff-4141-ab52-5163aca06d99 + delta: "" + content_index: 0 + output_index: 0 + rid: acae01c6ec9411f0866649c67042cd92 + ts: 1767878374919 + previous_item_id: null + - timestamp: 2026-01-08T13:19:35.163Z + direction: recv + message: + type: response.output_audio.delta + event_id: 24d001c9-d3d4-4e5c-9676-1014e82bcdc2 + item_id: 4883cdad-291c-4ec8-adaf-e5b42bfdec61 + response_id: 687c240c-58ff-4141-ab52-5163aca06d99 + delta: "" + content_index: 0 + output_index: 0 + rid: acae01c6ec9411f0866649c67042cd92 + ts: 1767878375087 + previous_item_id: null + - timestamp: 2026-01-08T13:19:35.164Z + direction: recv + message: + type: response.output_audio.delta + event_id: 390b3a50-314b-43e8-b5a5-32d69bd0b12d + item_id: 4883cdad-291c-4ec8-adaf-e5b42bfdec61 + response_id: 687c240c-58ff-4141-ab52-5163aca06d99 + delta: "" + content_index: 0 + output_index: 0 + rid: acae01c6ec9411f0866649c67042cd92 + ts: 1767878375088 + previous_item_id: null + - timestamp: 2026-01-08T13:19:35.164Z + direction: recv + message: + type: response.output_audio_transcript.done + event_id: 9da16a15-f652-4f0c-aa87-006f2fbec980 + item_id: 4883cdad-291c-4ec8-adaf-e5b42bfdec61 + transcript: See you later! Take care. 
+ response_id: 687c240c-58ff-4141-ab52-5163aca06d99 + content_index: 0 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:35.164Z + direction: recv + message: + type: response.content_part.done + event_id: 7f1627e8-5f4d-455c-a409-3e3bd519ac79 + item_id: 4883cdad-291c-4ec8-adaf-e5b42bfdec61 + response_id: 687c240c-58ff-4141-ab52-5163aca06d99 + content_index: 0 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:35.164Z + direction: recv + message: + type: response.output_audio.done + event_id: c32e7148-f4de-4d3d-8239-168fd259a594 + item_id: 4883cdad-291c-4ec8-adaf-e5b42bfdec61 + response_id: 687c240c-58ff-4141-ab52-5163aca06d99 + content_index: 0 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:35.164Z + direction: recv + message: + type: response.output_item.done + event_id: da730c47-1f66-4e07-ba59-7091a8e7b2d8 + item: + id: 4883cdad-291c-4ec8-adaf-e5b42bfdec61 + object: realtime.item + type: message + status: completed + role: assistant + response_id: 687c240c-58ff-4141-ab52-5163aca06d99 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:44.397Z + direction: recv + message: + type: ping + event_id: 142b36d2-ef15-4087-9c65-c0ccece0bf23 + timestamp: 1767878384347 + previous_item_id: null + - timestamp: 2026-01-08T13:19:45.171Z + direction: recv + message: + type: input_audio_buffer.committed + event_id: 94a93fde-a0cb-453e-8c5b-73ff3dc00f27 + item_id: e1c86a59-a881-4b62-9089-5556766a5486 + previous_item_id: 8978c5d9-0120-4e06-b187-7f9696d44ab5 + - timestamp: 2026-01-08T13:19:45.281Z + direction: recv + message: + type: conversation.item.added + event_id: 8d1b18b4-ff61-487c-b05d-2b9574b37587 + item: + id: e1c86a59-a881-4b62-9089-5556766a5486 + object: realtime.item + type: message + status: completed + role: user + content: + - type: input_audio + transcript: " Can you clean up the, um, output of this, um..." 
+ previous_item_id: null + - timestamp: 2026-01-08T13:19:45.322Z + direction: recv + message: + type: conversation.item.input_audio_transcription.completed + event_id: a13ddbcf-508e-4be3-aad5-868c5ea44d55 + item_id: e1c86a59-a881-4b62-9089-5556766a5486 + transcript: " Can you clean up the, um, output of this, um..." + content_index: 0 + status: completed + previous_item_id: null + - timestamp: 2026-01-08T13:19:45.467Z + direction: recv + message: + type: response.output_item.added + event_id: 388ad935-7428-4d40-8721-fccf313cfcd7 + item: + id: 1bd8ddf3-fc2b-4352-b131-d97cd1e9d9a1 + object: realtime.item + type: message + status: in_progress + role: assistant + response_id: 3a3a9354-8635-45e5-9a81-e45c435fbb52 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:45.479Z + direction: recv + message: + type: conversation.item.added + event_id: 10c2d901-7a1f-4c08-9925-b7ec023000b7 + item: + id: 1bd8ddf3-fc2b-4352-b131-d97cd1e9d9a1 + object: realtime.item + type: message + status: in_progress + role: assistant + content: [] + previous_item_id: null + - timestamp: 2026-01-08T13:19:45.479Z + direction: recv + message: + type: response.content_part.added + event_id: 8d8be5b3-6a23-46ae-91fa-02c0b8e750fe + item_id: 1bd8ddf3-fc2b-4352-b131-d97cd1e9d9a1 + previous_item_id: "0" + part: + type: audio + transcript: "" + response_id: 3a3a9354-8635-45e5-9a81-e45c435fbb52 + content_index: 0 + output_index: 0 + - timestamp: 2026-01-08T13:19:45.805Z + direction: recv + message: + type: response.output_audio_transcript.delta + event_id: 990887f6-f9db-4a24-a829-85307ba8a487 + item_id: 1bd8ddf3-fc2b-4352-b131-d97cd1e9d9a1 + response_id: 3a3a9354-8635-45e5-9a81-e45c435fbb52 + delta: Sure, I'd be happy to clean that + content_index: 0 + output_index: 0 + start_time: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:45.809Z + direction: recv + message: + type: response.output_audio.delta + event_id: 98242298-aa53-49aa-9f08-6963eef3c771 + item_id: 
1bd8ddf3-fc2b-4352-b131-d97cd1e9d9a1 + response_id: 3a3a9354-8635-45e5-9a81-e45c435fbb52 + delta: "" + content_index: 0 + output_index: 0 + rid: b3318ae0ec9411f0866649c67042cd92 + latency: "0.63" + ts: 1767878385743 + previous_item_id: null + - timestamp: 2026-01-08T13:19:45.866Z + direction: recv + message: + type: response.output_audio_transcript.delta + event_id: 51461f5b-798d-4273-b46c-10dfadf019fc + item_id: 1bd8ddf3-fc2b-4352-b131-d97cd1e9d9a1 + response_id: 3a3a9354-8635-45e5-9a81-e45c435fbb52 + delta: " up for you. What exactly do you need help with?" + content_index: 0 + output_index: 0 + start_time: 1.6 + previous_item_id: null + - timestamp: 2026-01-08T13:19:45.906Z + direction: recv + message: + type: response.output_audio.delta + event_id: 929dd1a7-83fd-4a30-9169-e34e3fa0ed91 + item_id: 1bd8ddf3-fc2b-4352-b131-d97cd1e9d9a1 + response_id: 3a3a9354-8635-45e5-9a81-e45c435fbb52 + delta: "" + content_index: 0 + output_index: 0 + rid: b3318ae0ec9411f0866649c67042cd92 + ts: 1767878385848 + previous_item_id: null + - timestamp: 2026-01-08T13:19:46.688Z + direction: recv + message: + type: input_audio_buffer.committed + event_id: be9200f7-e28b-4705-8e6b-b9e85e314702 + item_id: 1b63139c-9315-4e94-a135-66f82de8d3d0 + previous_item_id: e1c86a59-a881-4b62-9089-5556766a5486 + - timestamp: 2026-01-08T13:19:46.732Z + direction: recv + message: + type: conversation.item.added + event_id: 51acd18f-12b8-415f-83ca-303789aac3e5 + item: + id: 1b63139c-9315-4e94-a135-66f82de8d3d0 + object: realtime.item + type: message + status: completed + role: user + content: + - type: input_audio + transcript: " stuff here." + previous_item_id: null + - timestamp: 2026-01-08T13:19:46.733Z + direction: recv + message: + type: conversation.item.input_audio_transcription.completed + event_id: 175c88ef-8496-49ff-bbbd-1cb6560680f2 + item_id: 1b63139c-9315-4e94-a135-66f82de8d3d0 + transcript: " stuff here." 
+ content_index: 0 + status: completed + previous_item_id: null + - timestamp: 2026-01-08T13:19:46.896Z + direction: recv + message: + type: response.output_item.added + event_id: 9e8a3a97-aa96-4746-bf4b-781b9580d799 + item: + id: 494247b4-88df-45ae-9933-6311a84fda18 + object: realtime.item + type: message + status: in_progress + role: assistant + response_id: 0561e0ed-f2d0-4af8-8aeb-b78347e262ec + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:46.909Z + direction: recv + message: + type: conversation.item.added + event_id: 1177bd74-5566-44dd-b8eb-702c3512dfcc + item: + id: 494247b4-88df-45ae-9933-6311a84fda18 + object: realtime.item + type: message + status: in_progress + role: assistant + content: [] + previous_item_id: null + - timestamp: 2026-01-08T13:19:46.910Z + direction: recv + message: + type: response.content_part.added + event_id: b42a71d4-a982-47a3-87ca-f1fb97f93b36 + item_id: 494247b4-88df-45ae-9933-6311a84fda18 + previous_item_id: "0" + part: + type: audio + transcript: "" + response_id: 0561e0ed-f2d0-4af8-8aeb-b78347e262ec + content_index: 0 + output_index: 0 + - timestamp: 2026-01-08T13:19:47.212Z + direction: recv + message: + type: response.output_audio_transcript.delta + event_id: 847d7bf6-5073-4b92-8a4e-629a86a69ce0 + item_id: 494247b4-88df-45ae-9933-6311a84fda18 + response_id: 0561e0ed-f2d0-4af8-8aeb-b78347e262ec + delta: Sure, I'd be happy to clean up + content_index: 0 + output_index: 0 + start_time: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:47.226Z + direction: recv + message: + type: response.output_audio.delta + event_id: 0e30e4af-4c78-4b16-9685-592b6fe04cfc + item_id: 494247b4-88df-45ae-9933-6311a84fda18 + response_id: 0561e0ed-f2d0-4af8-8aeb-b78347e262ec + delta: "" + content_index: 0 + output_index: 0 + rid: b40b95d2ec9411f0866649c67042cd92 + latency: "0.59" + ts: 1767878387163 + previous_item_id: null + - timestamp: 2026-01-08T13:19:47.303Z + direction: recv + message: + type: 
response.output_audio_transcript.delta + event_id: 35838198-6249-4b17-8a87-a6d82aeb81b0 + item_id: 494247b4-88df-45ae-9933-6311a84fda18 + response_id: 0561e0ed-f2d0-4af8-8aeb-b78347e262ec + delta: " that output for you. What exactly do you have?" + content_index: 0 + output_index: 0 + start_time: 1.76 + previous_item_id: null + - timestamp: 2026-01-08T13:19:47.340Z + direction: recv + message: + type: response.output_audio.delta + event_id: 5b2235e2-f20a-41de-bba0-4c2f302a5b82 + item_id: 494247b4-88df-45ae-9933-6311a84fda18 + response_id: 0561e0ed-f2d0-4af8-8aeb-b78347e262ec + delta: "" + content_index: 0 + output_index: 0 + rid: b40b95d2ec9411f0866649c67042cd92 + ts: 1767878387278 + previous_item_id: null + - timestamp: 2026-01-08T13:19:47.440Z + direction: recv + message: + type: response.output_audio.delta + event_id: 840f9233-7fab-4aa6-b608-c4d39a2db8aa + item_id: 494247b4-88df-45ae-9933-6311a84fda18 + response_id: 0561e0ed-f2d0-4af8-8aeb-b78347e262ec + delta: "" + content_index: 0 + output_index: 0 + rid: b40b95d2ec9411f0866649c67042cd92 + ts: 1767878387379 + previous_item_id: null + - timestamp: 2026-01-08T13:19:47.721Z + direction: recv + message: + type: response.output_audio.delta + event_id: cebb1785-6683-480c-a293-603a1cf389e3 + item_id: 494247b4-88df-45ae-9933-6311a84fda18 + response_id: 0561e0ed-f2d0-4af8-8aeb-b78347e262ec + delta: "" + content_index: 0 + output_index: 0 + rid: b44a3eeaec9411f0866649c67042cd92 + ts: 1767878387652 + previous_item_id: null + - timestamp: 2026-01-08T13:19:48.110Z + direction: recv + message: + type: response.output_audio.delta + event_id: 740e2a38-bdd2-465a-9bc5-5b31d388b4fb + item_id: 494247b4-88df-45ae-9933-6311a84fda18 + response_id: 0561e0ed-f2d0-4af8-8aeb-b78347e262ec + delta: "" + content_index: 0 + output_index: 0 + rid: b44a3eeaec9411f0866649c67042cd92 + ts: 1767878387967 + previous_item_id: null + - timestamp: 2026-01-08T13:19:48.111Z + direction: recv + message: + type: response.output_audio.delta + event_id: 
d317e43a-8735-441a-9d91-9b2bb29ffd91 + item_id: 494247b4-88df-45ae-9933-6311a84fda18 + response_id: 0561e0ed-f2d0-4af8-8aeb-b78347e262ec + delta: "" + content_index: 0 + output_index: 0 + rid: b44a3eeaec9411f0866649c67042cd92 + ts: 1767878387967 + previous_item_id: null + - timestamp: 2026-01-08T13:19:48.111Z + direction: recv + message: + type: response.output_audio_transcript.done + event_id: 4bec8645-c237-4355-9a7a-7c71b14ffc71 + item_id: 494247b4-88df-45ae-9933-6311a84fda18 + transcript: Sure, I'd be happy to clean up that output for you. What exactly do + you have? + response_id: 0561e0ed-f2d0-4af8-8aeb-b78347e262ec + content_index: 0 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:48.111Z + direction: recv + message: + type: response.content_part.done + event_id: 56bd5745-0aad-4b8e-8efa-130586b3e8c0 + item_id: 494247b4-88df-45ae-9933-6311a84fda18 + response_id: 0561e0ed-f2d0-4af8-8aeb-b78347e262ec + content_index: 0 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:48.111Z + direction: recv + message: + type: response.output_audio.done + event_id: de5d02bb-4300-4b96-a891-1a1d398cb36a + item_id: 494247b4-88df-45ae-9933-6311a84fda18 + response_id: 0561e0ed-f2d0-4af8-8aeb-b78347e262ec + content_index: 0 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:19:48.111Z + direction: recv + message: + type: response.output_item.done + event_id: 43ef7d14-5a76-4716-b8aa-e5703dec79a5 + item: + id: 494247b4-88df-45ae-9933-6311a84fda18 + object: realtime.item + type: message + status: completed + role: assistant + response_id: 0561e0ed-f2d0-4af8-8aeb-b78347e262ec + output_index: 0 + previous_item_id: null diff --git a/unified-demo-voice-2026-01-08T13-21-13-135Z.yaml b/unified-demo-voice-2026-01-08T13-21-13-135Z.yaml new file mode 100644 index 0000000..03a1449 --- /dev/null +++ b/unified-demo-voice-2026-01-08T13-21-13-135Z.yaml @@ -0,0 +1,244 @@ +meta: + mode: voice + provider: openrouter + model: 
anthropic/claude-sonnet-4 + startTime: 2026-01-08T13:21:05.887Z + endTime: 2026-01-08T13:21:10.171Z + eventCount: 20 +events: + - timestamp: 2026-01-08T13:21:05.887Z + direction: recv + message: + type: conversation.created + event_id: e1acf194-5306-4e16-ba1b-043421f39d4d + conversation: + id: f8adad66-f8ea-411f-b392-e3114e8a08ce + object: realtime.conversation + previous_item_id: null + - timestamp: 2026-01-08T13:21:05.887Z + direction: recv + message: + type: ping + event_id: fdeed758-3e2e-4aaa-b970-4a40ec924d18 + timestamp: 1767878465728 + previous_item_id: null + - timestamp: 2026-01-08T13:21:08.639Z + direction: recv + message: + type: input_audio_buffer.committed + event_id: 0285f500-c16e-454a-bc5a-e1cf57784a84 + item_id: 0fa6e850-f94c-48b7-a73c-2abd16f7e3b4 + previous_item_id: null + - timestamp: 2026-01-08T13:21:08.770Z + direction: recv + message: + type: conversation.item.added + event_id: f7673877-9357-4738-8b4e-d7ba06e9bc40 + item: + id: 0fa6e850-f94c-48b7-a73c-2abd16f7e3b4 + object: realtime.item + type: message + status: completed + role: user + content: + - type: input_audio + transcript: " Hi there, what's going on?" + previous_item_id: null + - timestamp: 2026-01-08T13:21:08.831Z + direction: recv + message: + type: conversation.item.input_audio_transcription.completed + event_id: 53fd72f3-4a1e-47f2-afcf-dfec27ca905f + item_id: 0fa6e850-f94c-48b7-a73c-2abd16f7e3b4 + transcript: " Hi there, what's going on?" 
+ content_index: 0 + status: completed + previous_item_id: null + - timestamp: 2026-01-08T13:21:08.963Z + direction: recv + message: + type: response.output_item.added + event_id: 5e6adbf0-5da0-4fe4-a1b3-77ec02694fcb + item: + id: 98d3546b-0740-4b41-81d1-d48366c552bb + object: realtime.item + type: message + status: in_progress + role: assistant + response_id: 54bb3804-2fb3-44ca-9486-35d87f8024c5 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:21:08.983Z + direction: recv + message: + type: conversation.item.added + event_id: 867260bc-269e-44db-a5c3-b93ed8cf053f + item: + id: 98d3546b-0740-4b41-81d1-d48366c552bb + object: realtime.item + type: message + status: in_progress + role: assistant + content: [] + previous_item_id: null + - timestamp: 2026-01-08T13:21:08.983Z + direction: recv + message: + type: response.content_part.added + event_id: 716ebb45-3f61-4c80-a3b5-ee900585b5d0 + item_id: 98d3546b-0740-4b41-81d1-d48366c552bb + previous_item_id: "0" + part: + type: audio + transcript: "" + response_id: 54bb3804-2fb3-44ca-9486-35d87f8024c5 + content_index: 0 + output_index: 0 + - timestamp: 2026-01-08T13:21:09.299Z + direction: recv + message: + type: response.output_audio_transcript.delta + event_id: fd4d92ce-bfbc-45b2-9168-dabc146474b0 + item_id: 98d3546b-0740-4b41-81d1-d48366c552bb + response_id: 54bb3804-2fb3-44ca-9486-35d87f8024c5 + delta: Hey, not much—just here and ready + content_index: 0 + output_index: 0 + start_time: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:21:09.495Z + direction: recv + message: + type: response.output_audio.delta + event_id: 22adc87f-2d13-4a3a-af85-587b7e6808fb + item_id: 98d3546b-0740-4b41-81d1-d48366c552bb + response_id: 54bb3804-2fb3-44ca-9486-35d87f8024c5 + delta: "" + content_index: 0 + output_index: 0 + rid: e4f84884ec9411f087bdf35f9e0d44a5 + latency: "0.67" + ts: 1767878469264 + previous_item_id: null + - timestamp: 2026-01-08T13:21:09.591Z + direction: recv + message: + type: 
response.output_audio.delta + event_id: 6bbf154f-edef-41cd-826f-1a2631fe2605 + item_id: 98d3546b-0740-4b41-81d1-d48366c552bb + response_id: 54bb3804-2fb3-44ca-9486-35d87f8024c5 + delta: "" + content_index: 0 + output_index: 0 + rid: e4f84884ec9411f087bdf35f9e0d44a5 + ts: 1767878469372 + previous_item_id: null + - timestamp: 2026-01-08T13:21:09.640Z + direction: recv + message: + type: response.output_audio.delta + event_id: 87acbfbe-a69c-4f11-aa04-8fd8176a75d1 + item_id: 98d3546b-0740-4b41-81d1-d48366c552bb + response_id: 54bb3804-2fb3-44ca-9486-35d87f8024c5 + delta: "" + content_index: 0 + output_index: 0 + rid: e4f84884ec9411f087bdf35f9e0d44a5 + ts: 1767878469524 + previous_item_id: null + - timestamp: 2026-01-08T13:21:09.641Z + direction: recv + message: + type: response.output_audio_transcript.delta + event_id: b83c68da-172e-41d3-84e8-dd1ffcd0cce8 + item_id: 98d3546b-0740-4b41-81d1-d48366c552bb + response_id: 54bb3804-2fb3-44ca-9486-35d87f8024c5 + delta: " to help. What's on your mind?" 
+ content_index: 0 + output_index: 0 + start_time: 2.92 + previous_item_id: null + - timestamp: 2026-01-08T13:21:09.912Z + direction: recv + message: + type: response.output_audio.delta + event_id: e2b78388-6d94-466f-9101-ea04cf6816f3 + item_id: 98d3546b-0740-4b41-81d1-d48366c552bb + response_id: 54bb3804-2fb3-44ca-9486-35d87f8024c5 + delta: "" + content_index: 0 + output_index: 0 + rid: e55c37e0ec9411f087bdf35f9e0d44a5 + ts: 1767878469861 + previous_item_id: null + - timestamp: 2026-01-08T13:21:10.170Z + direction: recv + message: + type: response.output_audio.delta + event_id: 5bf626bd-d708-48e0-90c9-c8201a1fe9b8 + item_id: 98d3546b-0740-4b41-81d1-d48366c552bb + response_id: 54bb3804-2fb3-44ca-9486-35d87f8024c5 + delta: "" + content_index: 0 + output_index: 0 + rid: e55c37e0ec9411f087bdf35f9e0d44a5 + ts: 1767878470095 + previous_item_id: null + - timestamp: 2026-01-08T13:21:10.170Z + direction: recv + message: + type: response.output_audio.delta + event_id: f285b2d2-bf46-46a8-ac0b-47304c244e95 + item_id: 98d3546b-0740-4b41-81d1-d48366c552bb + response_id: 54bb3804-2fb3-44ca-9486-35d87f8024c5 + delta: "" + content_index: 0 + output_index: 0 + rid: e55c37e0ec9411f087bdf35f9e0d44a5 + ts: 1767878470096 + previous_item_id: null + - timestamp: 2026-01-08T13:21:10.170Z + direction: recv + message: + type: response.output_audio_transcript.done + event_id: 686672cf-2fe5-40ea-b4bd-6c7af1d0f57e + item_id: 98d3546b-0740-4b41-81d1-d48366c552bb + transcript: Hey, not much—just here and ready to help. What's on your mind? 
+ response_id: 54bb3804-2fb3-44ca-9486-35d87f8024c5 + content_index: 0 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:21:10.170Z + direction: recv + message: + type: response.content_part.done + event_id: 2173d62b-39ce-4cfe-ab9a-5b902f48a8a2 + item_id: 98d3546b-0740-4b41-81d1-d48366c552bb + response_id: 54bb3804-2fb3-44ca-9486-35d87f8024c5 + content_index: 0 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:21:10.171Z + direction: recv + message: + type: response.output_audio.done + event_id: 0a8e6b77-b56b-447d-adbc-11f64aeeca41 + item_id: 98d3546b-0740-4b41-81d1-d48366c552bb + response_id: 54bb3804-2fb3-44ca-9486-35d87f8024c5 + content_index: 0 + output_index: 0 + previous_item_id: null + - timestamp: 2026-01-08T13:21:10.171Z + direction: recv + message: + type: response.output_item.done + event_id: 6f58c749-06c0-4b1b-8228-30e0c49057d5 + item: + id: 98d3546b-0740-4b41-81d1-d48366c552bb + object: realtime.item + type: message + status: completed + role: assistant + response_id: 54bb3804-2fb3-44ca-9486-35d87f8024c5 + output_index: 0 + previous_item_id: null