Realtime Voice AI Agent

A web application that enables real-time voice conversations with an AI agent using Azure's Realtime API (Microsoft Foundry). The agent can use MCP tools to access Microsoft Learn documentation, supports interruptions, and maintains natural conversation flow.

Features

  • Web-based Interface: Accessible from a web browser; no client installation required
  • Real-time Voice: Bidirectional audio streaming via WebSocket
  • Interruptions: User can interrupt agent mid-speech
  • MCP Tool Integration: Agent can access Microsoft Learn documentation via native Azure Realtime API MCP support
  • Natural Flow: Conversation continues until explicit goodbye
  • Comprehensive Logging: All interactions logged with timestamps
  • Browser Audio: Uses Web Audio API for microphone capture and audio playback
  • Audio Visualization: Real-time visual feedback for both user and agent speech
  • Error Handling: Graceful error handling and recovery
  • Session Management: Support for multiple concurrent sessions

Prerequisites

  • Python 3.9 or higher
  • Azure OpenAI account with Realtime API access
  • Azure OpenAI resource with gpt-realtime model deployed

Quick Start

  1. Clone the repository

    git clone <repository-url>
    cd realtime-speech
  2. Create virtual environment

    python -m venv venv
    # Windows
    venv\Scripts\activate
    # Linux/Mac
    source venv/bin/activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Configure environment

    copy .env.example .env
    # Linux/Mac: cp .env.example .env
    # Edit .env with your credentials
  5. Verify environment

    python verify_env.py
  6. Start the server

    python app.py
  7. Access the web interface

    • Open browser to http://localhost:8000
    • Grant microphone permissions when prompted
    • Click "Start Conversation" button

Detailed Setup Instructions

1. Prerequisites

Python Installation

  • Ensure Python 3.9+ is installed
  • Verify with: python --version

Azure OpenAI Setup

  1. Create an Azure OpenAI resource in the Azure portal
  2. Deploy the gpt-realtime model in your resource
  3. Get your endpoint URL and API key from the Azure portal
  4. Note your deployment name (typically gpt-realtime)

MCP Server (Optional)

The Microsoft Learn MCP server is pre-configured and requires no setup. It provides free access to official Microsoft documentation.

To use a different MCP server, update these settings in .env:

  • MCP_SERVER_URL: URL of your MCP server
  • MCP_SERVER_LABEL: Label for the MCP server
  • MCP_REQUIRE_APPROVAL: Approval mode (never, always, or auto)

The agent has access to the following tools via the Microsoft Learn MCP Server:

Tool                           Description
microsoft_docs_search          Semantic search against Microsoft official technical documentation
microsoft_docs_fetch           Fetch and convert a Microsoft documentation page into markdown
microsoft_code_sample_search   Search for official Microsoft/Azure code snippets and examples
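
Under the hood, the MCP server is passed to the Realtime API as a native tool definition rather than handled manually. The sketch below illustrates what that session configuration might look like; the field names mirror the MCP_* settings above, but the exact payload assembled in realtime_client.py may differ.

import os

# Sketch of a native MCP tool entry in the Realtime session configuration.
# Field names mirror the MCP_* settings above; the payload actually built
# in realtime_client.py may differ.
mcp_tool = {
    "type": "mcp",
    "server_label": os.getenv("MCP_SERVER_LABEL", "microsoft-learn"),
    "server_url": os.getenv("MCP_SERVER_URL", "https://learn.microsoft.com/api/mcp"),
    "require_approval": os.getenv("MCP_REQUIRE_APPROVAL", "never"),
}

session_update = {
    "type": "session.update",
    "session": {"tools": [mcp_tool]},
}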

2. Installation Steps

Step 1: Clone/Download Repository

git clone <repository-url>
cd realtime-speech

Step 2: Create Virtual Environment

python -m venv venv

Windows:

venv\Scripts\activate

Linux/Mac:

source venv/bin/activate

Step 3: Install Dependencies

pip install -r requirements.txt

Step 4: Configure Environment

  1. Copy the example environment file:

    copy .env.example .env
    # Linux/Mac: cp .env.example .env
  2. Edit .env and fill in your values (a sample sketch follows this list):

    • AZURE_OPENAI_ENDPOINT: Your Azure OpenAI endpoint URL
    • AZURE_OPENAI_API_KEY: Your Azure OpenAI API key
    • AZURE_OPENAI_DEPLOYMENT_NAME: Your deployment name (usually gpt-realtime)
    • AZURE_OPENAI_API_VERSION: API version (e.g., 2025-08-28)
    • MCP_SERVER_URL: MCP server URL (optional; defaults to https://learn.microsoft.com/api/mcp)
    • MCP_SERVER_LABEL: Label for the MCP server (optional; e.g., microsoft-learn)
    • MCP_REQUIRE_APPROVAL: Tool approval mode (optional; never, always, or auto)
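
A filled-in .env might look like the sketch below (all values are placeholders; use your own endpoint, key, and deployment name):

AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-realtime
AZURE_OPENAI_API_VERSION=2025-08-28
MCP_SERVER_URL=https://learn.microsoft.com/api/mcp
MCP_SERVER_LABEL=microsoft-learn
MCP_REQUIRE_APPROVAL=never
LOG_LEVEL=INFO
LOG_FILE=logs/conversation.log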

Step 5: Verify Environment

python verify_env.py

This will check:

  • ✓ .env file exists
  • ✓ All required environment variables are set
  • ✓ MCP server configuration is present
  • ✓ Optional: Test Azure API connection (with --test-api)
  • ✓ Optional: Test MCP server (with --test-mcp)
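
A simplified sketch of the required-variable part of that check, assuming python-dotenv for .env loading (the real verify_env.py does more, including the optional API and MCP tests):

import os
from dotenv import load_dotenv

# Simplified sketch of the required-variable check; the real verify_env.py
# also offers the optional --test-api and --test-mcp connectivity tests.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_API_VERSION",
]

def check_required_vars() -> bool:
    load_dotenv()  # pull values from .env into the process environment
    missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
    for name in missing:
        print(f"Missing required variable: {name}")
    return not missing

if __name__ == "__main__":
    print("Environment OK" if check_required_vars() else "Environment incomplete")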

3. Running the Application

Start the Server

Option 1: Using the app.py script

python app.py

Option 2: Using uvicorn directly

uvicorn app:app --host 0.0.0.0 --port 8000

Access the Web Interface

  1. Open your browser to http://localhost:8000
  2. You should see the conversation interface
  3. Click "Start Conversation" to begin
  4. Grant microphone permissions when prompted

Using the Application

  • Start Conversation: Click the "Start Conversation" button
  • Stop Conversation: Click "Stop Conversation" to end the session
  • Interrupt Agent: Simply start speaking while the agent is talking
  • End Conversation: Say "goodbye", "bye", "exit", or "quit"
  • View Transcripts: See real-time transcripts in the transcript area
  • Audio Visualization: Watch the visual feedback for both your voice and the agent's voice
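
Goodbye detection is a plain transcript check on the server side. A hypothetical sketch is shown below; the actual logic lives in conversation_manager.py and may differ.

# Hypothetical sketch of explicit-goodbye detection on the user transcript.
# The actual implementation in conversation_manager.py may differ.
GOODBYE_PHRASES = {"goodbye", "bye", "exit", "quit"}

def is_goodbye(transcript: str) -> bool:
    words = [word.strip(".,!?") for word in transcript.lower().split()]
    return any(word in GOODBYE_PHRASES for word in words)

assert is_goodbye("Okay, goodbye!")
assert not is_goodbye("Tell me about Azure Container Apps")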

Configuration Reference

Environment Variables

Variable                      Required  Description                                    Example
AZURE_OPENAI_ENDPOINT         Yes       Azure OpenAI endpoint URL                      https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY          Yes       Azure OpenAI API key                           your-api-key
AZURE_OPENAI_DEPLOYMENT_NAME  Yes       Deployment name                                gpt-realtime
AZURE_OPENAI_API_VERSION      Yes       API version                                    2025-08-28
MCP_SERVER_URL                No        MCP server URL (Azure native support)          https://learn.microsoft.com/api/mcp
MCP_SERVER_LABEL              No        Label for the MCP server                       microsoft-learn
MCP_REQUIRE_APPROVAL          No        Tool approval mode (never, always, or auto)    never
LOG_LEVEL                     No        Logging level (DEBUG, INFO, WARNING, ERROR)    INFO
LOG_FILE                      No        Log file path                                  logs/conversation.log
WEB_HOST                      No        Web server host                                0.0.0.0
WEB_PORT                      No        Web server port                                8000

Default Values

  • LOG_LEVEL: INFO
  • LOG_FILE: logs/conversation.log
  • WEB_HOST: 0.0.0.0
  • WEB_PORT: 8000
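
For reference, the sketch below shows how these variables and defaults can be read in Python, assuming python-dotenv for .env loading; the project's config.py is the source of truth and may organize this differently.

import os
from dotenv import load_dotenv

# Minimal sketch of environment loading with the defaults listed above;
# the project's config.py may organize this differently.
load_dotenv()

AZURE_OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]                # required
AZURE_OPENAI_API_KEY = os.environ["AZURE_OPENAI_API_KEY"]                  # required
AZURE_OPENAI_DEPLOYMENT_NAME = os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"]  # required

LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
LOG_FILE = os.getenv("LOG_FILE", "logs/conversation.log")
WEB_HOST = os.getenv("WEB_HOST", "0.0.0.0")
WEB_PORT = int(os.getenv("WEB_PORT", "8000"))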

Troubleshooting

Common Issues

Audio Permission Denied

Problem: Browser blocks microphone access

Solution:

  • Check browser settings for microphone permissions
  • Ensure you're using HTTPS or localhost (required for getUserMedia)
  • Try a different browser (Chrome, Firefox, Edge recommended)

WebSocket Connection Failed

Problem: Cannot connect to server

Solution:

  • Verify server is running on correct port
  • Check firewall settings
  • Ensure WEB_HOST and WEB_PORT are correct
  • Check browser console for errors

Azure API Errors

Problem: 401 Unauthorized or 404 Not Found

Solution:

  • Verify AZURE_OPENAI_ENDPOINT is correct
  • Check AZURE_OPENAI_API_KEY is valid
  • Ensure AZURE_OPENAI_DEPLOYMENT_NAME matches your deployment
  • Verify gpt-realtime model is deployed in your resource

MCP Tools Not Working

Problem: MCP tools not being called or returning errors

Solution:

  • Verify MCP_SERVER_URL is correct (default: https://learn.microsoft.com/api/mcp)
  • Check network connectivity to the MCP server
  • Verify Azure Realtime API version supports MCP (2025-08-28 or later recommended)
  • Check logs for MCP-related errors
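
If you suspect a network problem, the snippet below is a quick reachability check for the configured MCP server host; any HTTP response, even an error status, means the network path works. It does not exercise the MCP protocol itself.

import os
import urllib.error
import urllib.request

# Quick reachability check for the configured MCP server host.
# Any HTTP response (even 4xx/5xx) means the network path works;
# this does not exercise the MCP protocol itself.
url = os.getenv("MCP_SERVER_URL", "https://learn.microsoft.com/api/mcp")
try:
    with urllib.request.urlopen(url, timeout=10) as response:
        print(f"Reachable: HTTP {response.status}")
except urllib.error.HTTPError as exc:
    print(f"Reachable: HTTP {exc.code} (server responded)")
except OSError as exc:
    print(f"Not reachable: {exc}")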

No Audio Visualization

Problem: Audio visualizations not showing

Solution:

  • Check browser console for JavaScript errors
  • Ensure microphone permissions are granted
  • Try refreshing the page
  • Check browser compatibility (Chrome, Firefox, Edge recommended)

Browser Compatibility

  • Chrome/Edge: Full support (recommended)
  • Firefox: Full support
  • Safari: Limited support (may have audio issues)
  • Mobile browsers: Limited support

Development

Running in Development Mode

uvicorn app:app --host 0.0.0.0 --port 8000 --reload

Enabling Debug Logging

Set in .env:

LOG_LEVEL=DEBUG
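
Internally, LOG_LEVEL and LOG_FILE map onto standard Python logging. The sketch below is illustrative only; logger.py may configure handlers and formats differently.

import logging
import os
from pathlib import Path

# Illustrative logging setup driven by LOG_LEVEL and LOG_FILE;
# the project's logger.py may configure handlers and formats differently.
log_file = Path(os.getenv("LOG_FILE", "logs/conversation.log"))
log_file.parent.mkdir(parents=True, exist_ok=True)

logging.basicConfig(
    level=os.getenv("LOG_LEVEL", "INFO"),
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    handlers=[logging.FileHandler(log_file), logging.StreamHandler()],
)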

Project Structure

realtime-speech/
├── app.py                  # FastAPI application entry point
├── config.py               # Configuration management
├── websocket_handler.py    # WebSocket connection management
├── realtime_client.py      # Azure Realtime API client
├── mcp_client.py           # MCP protocol client
├── tool_handler.py         # Tool registration and execution
├── conversation_manager.py # Conversation state management
├── logger.py               # Logging configuration
├── verify_env.py           # Environment verification
├── static/                 # Static files
│   ├── index.html          # Web interface
│   ├── app.js              # Frontend JavaScript
│   └── styles.css          # CSS styles
├── logs/                   # Log files
├── requirements.txt        # Python dependencies
├── .env.example            # Example configuration
└── README.md               # This file
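
To make the layout concrete, the sketch below shows how app.py could wire these pieces together with FastAPI. Route names and handler interfaces are assumptions; the real application delegates the WebSocket traffic to websocket_handler.py and realtime_client.py.

import uvicorn
from fastapi import FastAPI, WebSocket
from fastapi.staticfiles import StaticFiles

# Minimal wiring sketch; route names and handler interfaces are assumptions,
# and the real app.py is more complete.
app = FastAPI()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    # In the real app, websocket_handler.py bridges browser audio to the
    # Azure Realtime API via realtime_client.py.
    async for message in websocket.iter_text():
        await websocket.send_text(message)  # placeholder echo

# Serve index.html, app.js, and styles.css from static/
app.mount("/", StaticFiles(directory="static", html=True), name="static")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)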

Examples and Use Cases

Example Conversations

User: "What are the Azure CLI commands to create an Azure Container App?"

Agent: [Calls microsoft_docs_search tool] "According to the official Microsoft documentation, you can create an Azure Container App using..."

User: "Show me how to implement IHttpClientFactory in .NET 8"

Agent: [Calls microsoft_code_sample_search tool] "Here's an example from the official Microsoft documentation..."

MCP Tool Integration

The agent automatically uses Microsoft Learn MCP tools when relevant:

  • microsoft_docs_search: Semantic search against Microsoft official documentation
  • microsoft_docs_fetch: Fetch full content from a specific documentation page
  • microsoft_code_sample_search: Search for official code samples and examples

These tools are executed automatically by the Azure Realtime API; no manual tool handling is required.

License

[Add your license here]

Support

For issues and questions:

  1. Check the troubleshooting section above
  2. Review the logs in logs/conversation.log
  3. Run python verify_env.py to verify configuration
  4. Open an issue in the repository
