---
title: "AICFilter"
description: "Speech improvement using ai-coustics"
description: "Speech enhancement using ai-coustics' SDK"
---

## Overview

`AICFilter` is an audio processor that enhances user speech by reducing background noise and improving speech clarity. It inherits from `BaseAudioFilter` and processes audio frames in real time using ai-coustics' speech enhancement technology.

To use AIC, you need a license key. Get started at [ai-coustics.com](https://ai-coustics.com/pipecat).

<Note>
This documentation covers **aic-sdk v2.x**. If you're using aic-sdk v1.x, please see the [Migration Guide](#migration-guide-v1-to-v2) section below for upgrading instructions.
</Note>

## Installation

The AIC filter requires additional dependencies:

```bash
pip install "pipecat-ai[aic]"
```

## Constructor Parameters

<ParamField path="license_key" type="str" default="">
AIC license key
<ParamField path="license_key" type="str" required>
ai-coustics license key for authentication. Get your key at [developers.ai-coustics.io](https://developers.ai-coustics.io).
</ParamField>

<ParamField path="model_type" type="int" default="0">
Model
<ParamField path="model_id" type="Optional[str]" default="None">
Model identifier to download from CDN. Required if `model_path` is not provided.
See [artifacts.ai-coustics.io](https://artifacts.ai-coustics.io/) for available models, and the [models documentation](https://docs.ai-coustics.com/guides/models) for detailed information about each model.

Examples: `"quail-vf-l-16khz"`, `"quail-s-16khz"`, `"quail-l-8khz"`
</ParamField>

<ParamField path="enhancement_level" type="float" default="1.0">
Enhancement level
<ParamField path="model_path" type="Optional[str]" default="None">
Path to a local `.aicmodel` file. If provided, `model_id` is ignored and no download occurs.
Useful for offline deployments or custom models.
</ParamField>

<ParamField path="voice_gain" type="float" default="1.0">
Voice gain
<ParamField path="model_download_dir" type="Optional[Path]" default="None">
Directory for downloading and caching models. Defaults to a cache directory in the user's home folder.
</ParamField>

<ParamField path="noise_gate_enable" type="bool" default="True">
Enable noise gate
## Methods

### create_vad_analyzer

Creates an `AICVADAnalyzer` that uses the AIC model's built-in voice activity detection.

```python
def create_vad_analyzer(
    *,
    speech_hold_duration: Optional[float] = None,
    minimum_speech_duration: Optional[float] = None,
    sensitivity: Optional[float] = None,
) -> AICVADAnalyzer
```

#### VAD Parameters
<ParamField path="speech_hold_duration" type="Optional[float]" default="None">
Controls how long the VAD continues to detect speech after the audio signal no longer contains speech (in seconds).
Range: `0.0` to `100x model window length`, Default (in SDK): `0.05s`
</ParamField>

<ParamField path="minimum_speech_duration" type="Optional[float]" default="None">
Controls how long speech must be present in the audio signal before the VAD considers it speech (in seconds).
Range: `0.0` to `1.0`, Default (in SDK): `0.0s`
</ParamField>

<ParamField path="sensitivity" type="Optional[float]" default="None">
Controls the sensitivity of the VAD. This value determines the energy threshold that an audio signal must exceed to be considered speech.
Formula: `Energy threshold = 10 ** (-sensitivity)`
Range: `1.0` to `15.0`, Default (in SDK): `6.0`
</ParamField>
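
To illustrate the formula above, here is a standalone sketch (not part of the Pipecat or ai-coustics API):

```python
def vad_energy_threshold(sensitivity: float) -> float:
    """Energy threshold the VAD compares against, per the formula above."""
    return 10 ** (-sensitivity)


# Higher sensitivity means a lower threshold, so speech is detected more readily.
print(vad_energy_threshold(6.0))  # 1e-06 (SDK default)
print(vad_energy_threshold(8.0))  # 1e-08 (more sensitive)
```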

### get_vad_context

Returns the VAD context once the processor is initialized. Can be used to dynamically adjust VAD parameters at runtime.

```python
vad_ctx = aic_filter.get_vad_context()
# VadParameter comes from the ai-coustics SDK.
vad_ctx.set_parameter(VadParameter.Sensitivity, 8.0)
```

## Input Frames

<ParamField path="FilterEnableFrame" type="Frame">
Enable or disable speech enhancement at runtime:

```python
from pipecat.frames.frames import FilterEnableFrame

# Disable speech enhancement
await task.queue_frame(FilterEnableFrame(False))

# Re-enable speech enhancement
await task.queue_frame(FilterEnableFrame(True))
```

</ParamField>

## Usage Examples

### Basic Usage with AIC VAD

The recommended approach is to use `AICFilter` with its built-in VAD analyzer:

```python
import os

from pipecat.audio.filters.aic_filter import AICFilter
from pipecat.transports.services.daily import DailyTransport, DailyParams

# Create the AIC filter
aic_filter = AICFilter(
    license_key=os.environ["AIC_SDK_LICENSE"],
    model_id="quail-vf-l-16khz",
)

# Use AIC's integrated VAD
transport = DailyTransport(
    room_url,
    token,
    "Bot",
    DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        audio_in_filter=aic_filter,
        vad_analyzer=aic_filter.create_vad_analyzer(
            speech_hold_duration=0.05,
            minimum_speech_duration=0.0,
            sensitivity=6.0,
        ),
    ),
)
```

### Using a Local Model

For offline deployments or when you want to manage model files yourself:

```python
import os

from pipecat.audio.filters.aic_filter import AICFilter

aic_filter = AICFilter(
    license_key=os.environ["AIC_SDK_LICENSE"],
    model_path="/path/to/your/model.aicmodel",
)
```

### Custom Cache Directory

Specify a custom directory for model downloads:

```python
import os

from pipecat.audio.filters.aic_filter import AICFilter

aic_filter = AICFilter(
    license_key=os.environ["AIC_SDK_LICENSE"],
    model_id="quail-s-16khz",
    model_download_dir="/opt/aic-models",
)
```

### With Other Transports

The AIC filter works with any Pipecat transport:

```python
import os

from pipecat.audio.filters.aic_filter import AICFilter
from pipecat.transports.websocket import FastAPIWebsocketTransport, FastAPIWebsocketParams

aic_filter = AICFilter(
    license_key=os.environ["AIC_SDK_LICENSE"],
    model_id="quail-vf-l-16khz",
)

transport = FastAPIWebsocketTransport(
    params=FastAPIWebsocketParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        audio_in_filter=aic_filter,
        vad_analyzer=aic_filter.create_vad_analyzer(
            speech_hold_duration=0.05,
            sensitivity=6.0,
        ),
    ),
)
```

<Info>
See the [AIC filter example](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/07zd-interruptible-aicoustics.py) for a complete working example.
</Info>

## Models

For detailed information about the available models, see the [Models documentation](https://docs.ai-coustics.com/guides/models).

## Audio Flow

```mermaid
graph TD
A[AudioRawFrame] --> B[AICFilter]
B --> C[AICVADAnalyzer]
C --> D[STT]
```

The AIC filter enhances audio before it reaches the VAD and STT stages, improving transcription accuracy in noisy environments.
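
For orientation, here is a minimal pipeline sketch; the `stt`, `llm`, and `tts` services are placeholders for whichever services you configure:

```python
from pipecat.pipeline.pipeline import Pipeline

# Audio entering transport.input() is enhanced by AICFilter and analyzed by
# AICVADAnalyzer before any downstream processor sees it.
pipeline = Pipeline(
    [
        transport.input(),  # AICFilter + AICVADAnalyzer run here
        stt,                # receives enhanced audio
        llm,
        tts,
        transport.output(),
    ]
)
```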

## Migration Guide (v1 to v2)

<Note>
For the complete aic-sdk migration guide including all API changes, see the official [Python 1.3 to 2.0 Migration Guide](https://docs.ai-coustics.com/guides/migrations/python-1-3-to-2-0#quick-migration-checklist).
</Note>

### Migration Steps

1. Update Pipecat to the latest version (aic-sdk v2.x is included automatically).
2. Remove deprecated constructor parameters (`model_type`, `enhancement_level`, `voice_gain`, `noise_gate_enable`).
3. Add `model_id` parameter with an appropriate model (e.g., `"quail-vf-l-16khz"`).
4. Update any runtime VAD adjustments to use the new VAD context API.
5. We recommend using `aic_filter.create_vad_analyzer()` for improved accuracy (see the before/after sketch below).
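
As a quick before/after sketch of the constructor change (parameter values are illustrative, taken from the table below and the examples above):

```python
import os

from pipecat.audio.filters.aic_filter import AICFilter

# Before (aic-sdk v1.x): numeric model selection and tuning parameters
aic_filter = AICFilter(
    license_key=os.environ["AIC_SDK_LICENSE"],
    model_type=0,
    enhancement_level=1.0,
    voice_gain=1.0,
    noise_gate_enable=True,
)

# After (aic-sdk v2.x): string-based model selection; tuning parameters removed
aic_filter = AICFilter(
    license_key=os.environ["AIC_SDK_LICENSE"],
    model_id="quail-vf-l-16khz",
)
```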

### Breaking Changes

| v1 Parameter | v2 Replacement |
|--------------|----------------|
| `model_type` | `model_id` (string-based model selection) |
| `enhancement_level` | Removed (model-specific behavior) |
| `voice_gain` | Removed |
| `noise_gate_enable` | Removed |

## Notes

- Requires an ai-coustics license key (get one at [developers.ai-coustics.io](https://developers.ai-coustics.io))
- Models are automatically downloaded and cached on first use
- Supports real-time audio processing with low latency
- Handles PCM_16 audio format (int16 samples)
- Thread-safe for pipeline processing
- Maintains audio quality while enhancing speech and reducing noise
- Can be dynamically enabled/disabled via `FilterEnableFrame`
- The integrated VAD provides better accuracy than a standalone VAD when enhancement is enabled
- For available models, visit [artifacts.ai-coustics.io](https://artifacts.ai-coustics.io/)