@Roei-Bracha
Contributor

Conversation Recorder Extension

This PR adds the conversation_recorder extension, which records both user and agent audio during a session, mixes the two streams into a single WAV file, and saves that file to one of several storage backends.

Changes Overview

The PR introduces a new extension located at ai_agents/agents/ten_packages/extension/conversation_recorder/. Below is a breakdown of the included files:

| File | Description |
| --- | --- |
| extension.py | Core extension logic. Manages the recording lifecycle, listens for on_user_joined/on_user_left events, and coordinates audio mixing and storage. |
| audio_mixer.py | Handles mixing multiple PCM audio streams into one. Includes support for real-time resampling (e.g., matching 16kHz user audio with 24kHz agent audio). |
| storage.py | Provides an abstraction for storage backends. Supports Local Filesystem, Google Cloud Storage (GCS), and S3-compatible storage. |
| README.md | Comprehensive documentation on configuration, features, and graph integration. |
| manifest.json | Extension metadata and definition of all configuration properties. |
| requirements.txt | Lists dependencies: numpy, scipy, google-cloud-storage, and boto3. |
| addon.py / __init__.py | Necessary boilerplate for TEN Agent extension registration and loading. |
| property.json | Default values for the extension's properties. |
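
To make the mixing step concrete, here is a minimal sketch of bringing a 16 kHz stream up to 24 kHz, summing it with a 24 kHz stream, and packing the result as WAV. It is not the code in audio_mixer.py: the PR lists numpy and scipy as dependencies, while this sketch uses only the standard library and naive linear interpolation so it stays self-contained. The function names `resample_linear`, `mix`, and `to_wav_bytes` are hypothetical.

```python
import io
import sys
import wave
from array import array


def resample_linear(samples, src_rate, dst_rate):
    """Resample 16-bit mono PCM by linear interpolation (illustrative only)."""
    if src_rate == dst_rate or len(samples) == 0:
        return array("h", samples)
    out_len = len(samples) * dst_rate // src_rate
    last = len(samples) - 1
    out = array("h")
    for i in range(out_len):
        pos = i * src_rate / dst_rate  # fractional index into the source
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(round(samples[lo] * (1.0 - frac) + samples[hi] * frac))
    return out


def mix(a, b):
    """Sum two same-rate streams sample by sample, clipping to the int16 range."""
    out = array("h")
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        out.append(max(-32768, min(32767, s)))
    return out


def to_wav_bytes(samples, rate):
    """Wrap mono 16-bit samples in a WAV container and return the bytes."""
    data = array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()  # WAV PCM is little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(data.tobytes())
    return buf.getvalue()


user = array("h", [1000] * 160)    # 10 ms of user audio at 16 kHz
agent = array("h", [32000] * 240)  # 10 ms of agent audio at 24 kHz
mixed = mix(resample_linear(user, 16000, 24000), agent)
wav_bytes = to_wav_bytes(mixed, 24000)
```

Clipping after summation keeps loud overlapping speech from wrapping around; a production mixer would more likely use a proper polyphase or FFT resampler than linear interpolation.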

Key Features

  • Multi-Source Audio Mixing: Combines user and agent audio frames into a single WAV recording, resampling where the sources use different sample rates.
  • Graceful Shutdown: Implements signal handlers (SIGTERM, SIGINT) and atexit hooks to ensure recordings are correctly flushed and closed even if the agent process is interrupted.
  • Pluggable Storage: Native support for uploading recordings directly to GCS or S3 buckets, or saving them locally.
  • Event-Driven Recording: Can be configured to start automatically on agent startup or be triggered by specific commands like on_user_joined.

How to use

Integrate the extension into your graph by:

  1. Connecting the pcm_frame outputs from audio sources (e.g., streamid_adapter and v2v) to the conversation_recorder node.
  2. Connecting the on_user_joined and on_user_left commands from your RTC extension to the recorder.
  3. Configuring the storage_type and relevant credentials in the graph properties.
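
As a rough illustration of the three steps above, the snippet below builds a graph fragment as a Python dict and prints it as JSON. The layout (`nodes`, `connections`, `audio_frame`, `cmd`, `dest`) is modeled on typical TEN graph definitions and is an assumption here; the extension name `rtc` and the value `"local"` for storage_type are placeholders, not values confirmed by this PR. The extension's README.md and manifest.json remain the reference for the real property names and accepted values.

```python
import json

RECORDER = "conversation_recorder"


def audio_route(source, dest=RECORDER):
    """Route a source extension's pcm_frame output to the recorder."""
    return {
        "extension": source,
        "audio_frame": [{"name": "pcm_frame", "dest": [{"extension": dest}]}],
    }


def cmd_route(source, names, dest=RECORDER):
    """Forward the named commands from a source extension to the recorder."""
    return {
        "extension": source,
        "cmd": [{"name": n, "dest": [{"extension": dest}]} for n in names],
    }


graph = {
    "nodes": [
        {
            "type": "extension",
            "name": RECORDER,
            "addon": RECORDER,
            # "local" is an assumed value; check manifest.json for the real options.
            "property": {"storage_type": "local"},
        },
    ],
    "connections": [
        audio_route("streamid_adapter"),  # step 1: user audio
        audio_route("v2v"),               # step 1: agent audio
        # step 2: "rtc" is a placeholder for whichever RTC extension the graph uses
        cmd_route("rtc", ["on_user_joined", "on_user_left"]),
    ],
}

graph_json = json.dumps(graph, indent=2)
print(graph_json)
```

In a real graph these entries would be merged into the existing node and connection lists rather than defined on their own, and cloud backends would need their credentials added under the recorder's properties.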
