A web application that enables real-time voice conversations with an AI agent using the Azure OpenAI Realtime API (Microsoft Foundry). The agent can use MCP tools to access Microsoft Learn documentation, supports interruptions, and maintains a natural conversation flow.
- Web-based Interface: Accessible via web browser, no installation required
- Real-time Voice: Bidirectional audio streaming via WebSocket
- Interruptions: User can interrupt agent mid-speech
- MCP Tool Integration: Agent can access Microsoft Learn documentation via native Azure Realtime API MCP support
- Natural Flow: Conversation continues until explicit goodbye
- Comprehensive Logging: All interactions logged with timestamps
- Browser Audio: Uses Web Audio API for microphone capture and audio playback
- Audio Visualization: Real-time visual feedback for both user and agent speech
- Error Handling: Graceful error handling and recovery
- Session Management: Support for multiple concurrent sessions
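The session-management feature above can be sketched as a small registry that tracks concurrent conversations. This is a hypothetical simplification (the names `Session` and `SessionManager` are assumptions, not the actual classes in `conversation_manager.py`):

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Session:
    """One browser conversation: an id plus its running transcript."""
    session_id: str
    transcript: list = field(default_factory=list)
    active: bool = True

class SessionManager:
    """Tracks multiple concurrent sessions keyed by a random id."""
    def __init__(self):
        self._sessions = {}

    def create(self) -> Session:
        session = Session(session_id=uuid.uuid4().hex)
        self._sessions[session.session_id] = session
        return session

    def close(self, session_id: str) -> None:
        # Mark inactive and drop the session so resources can be released.
        session = self._sessions.pop(session_id, None)
        if session:
            session.active = False

    def count(self) -> int:
        return len(self._sessions)
```

Each WebSocket connection would create one session on connect and close it on disconnect.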
- Python 3.9 or higher
- Azure OpenAI account with Realtime API access
- Azure OpenAI resource with the `gpt-realtime` model deployed
- Clone the repository

  ```bash
  git clone <repository-url>
  cd realtime-speech
  ```

- Create virtual environment

  ```bash
  python -m venv venv
  # Windows
  venv\Scripts\activate
  # Linux/Mac
  source venv/bin/activate
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Configure environment

  ```bash
  copy .env.example .env   # Linux/Mac: cp .env.example .env
  ```

  Edit `.env` with your credentials.

- Verify environment

  ```bash
  python verify_env.py
  ```

- Start the server

  ```bash
  python app.py
  ```

- Access the web interface

  - Open browser to `http://localhost:8000`
  - Grant microphone permissions when prompted
  - Click "Start Conversation" button
- Ensure Python 3.9+ is installed
- Verify with: `python --version`
- Create an Azure OpenAI resource in the Azure portal
- Deploy the `gpt-realtime` model in your resource
- Get your endpoint URL and API key from the Azure portal
- Note your deployment name (typically `gpt-realtime`)
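The endpoint, deployment name, and API version collected above are combined into the Realtime WebSocket URL. A hedged sketch of that construction, assuming the documented Azure pattern `wss://<resource>.openai.azure.com/openai/realtime?api-version=…&deployment=…` (verify against your API version):

```python
from urllib.parse import urlencode, urlparse

def realtime_ws_url(endpoint: str, deployment: str, api_version: str) -> str:
    """Build the Azure OpenAI Realtime WebSocket URL from portal values.

    The path and query parameters follow Azure's published pattern; treat
    them as an assumption and confirm for your API version.
    """
    host = urlparse(endpoint).netloc
    query = urlencode({"api-version": api_version, "deployment": deployment})
    return f"wss://{host}/openai/realtime?{query}"
```

For example, `realtime_ws_url("https://my-resource.openai.azure.com/", "gpt-realtime", "2025-08-28")` yields a `wss://my-resource.openai.azure.com/openai/realtime?...` URL.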
The Microsoft Learn MCP server is pre-configured and requires no setup. It provides free access to official Microsoft documentation.
To use a different MCP server, update these settings in .env:
- `MCP_SERVER_URL`: URL of your MCP server
- `MCP_SERVER_LABEL`: Label for the MCP server
- `MCP_REQUIRE_APPROVAL`: Approval mode (`never`, `always`, or `auto`)
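For instance, pointing at a custom server could look like this in `.env` (the URL and label below are placeholder values, not a real server):

```ini
MCP_SERVER_URL=https://example.com/api/mcp
MCP_SERVER_LABEL=my-mcp-server
MCP_REQUIRE_APPROVAL=never
```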
The agent has access to the following tools via the Microsoft Learn MCP Server:
| Tool | Description |
|---|---|
| `microsoft_docs_search` | Semantic search against Microsoft official technical documentation |
| `microsoft_docs_fetch` | Fetch and convert a Microsoft documentation page into markdown |
| `microsoft_code_sample_search` | Search for official Microsoft/Azure code snippets and examples |
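These tools reach the model because the MCP server is declared in the Realtime session configuration. A hedged sketch of that payload, assuming field names (`server_label`, `server_url`, `require_approval`) from the OpenAI/Azure MCP tool schema; confirm the exact shape for your API version:

```python
import json

def mcp_tool_config(server_url: str, label: str, require_approval: str = "never") -> dict:
    """Build one MCP tool entry; field names are assumed from the MCP tool schema."""
    return {
        "type": "mcp",
        "server_label": label,
        "server_url": server_url,
        "require_approval": require_approval,
    }

# A session.update event advertising the Microsoft Learn MCP server to the model.
session_update = {
    "type": "session.update",
    "session": {
        "tools": [mcp_tool_config("https://learn.microsoft.com/api/mcp", "microsoft-learn")],
    },
}
print(json.dumps(session_update, indent=2))
```

Once the session carries this configuration, the Realtime API invokes the tools itself; the application only relays the audio and events.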
- Clone the repository and enter it:

  ```bash
  git clone <repository-url>
  cd realtime-speech
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  # Windows
  venv\Scripts\activate
  # Linux/Mac
  source venv/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Copy the example environment file:

  ```bash
  copy .env.example .env   # Linux/Mac: cp .env.example .env
  ```

- Edit `.env` and fill in your values:

  - `AZURE_OPENAI_ENDPOINT`: Your Azure OpenAI endpoint URL
  - `AZURE_OPENAI_API_KEY`: Your Azure OpenAI API key
  - `AZURE_OPENAI_DEPLOYMENT_NAME`: Your deployment name (usually `gpt-realtime`)
  - `AZURE_OPENAI_API_VERSION`: API version (default: `2024-05-01-preview`)
  - `NEWSAPI_API_KEY`: Your NewsAPI key (optional)
  - `MCP_SERVER_PATH`: Path to NewsAPI MCP server directory
Run the verification script:

```bash
python verify_env.py
```

This will check:

- ✓ `.env` file exists
- ✓ All required environment variables are set
- ✓ MCP server path is accessible
- ✓ Optional: Test Azure API connection (with `--test-api`)
- ✓ Optional: Test MCP server (with `--test-mcp`)
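The core of the "required variables are set" check can be sketched in a few lines. This is a hypothetical simplification of `verify_env.py`, not its actual code:

```python
import os

# Required variables, mirrored from the configuration section of this README.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
]

def missing_vars(env=None) -> list:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

A verification script would report each name returned by `missing_vars()` and exit non-zero if the list is non-empty.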
Option 1: Using the app.py script

```bash
python app.py
```

Option 2: Using uvicorn directly

```bash
uvicorn app:app --host 0.0.0.0 --port 8000
```

Then:

- Open your browser to `http://localhost:8000`
- You should see the conversation interface
- Click "Start Conversation" to begin
- Grant microphone permissions when prompted
- Start Conversation: Click the "Start Conversation" button
- Stop Conversation: Click "Stop Conversation" to end the session
- Interrupt Agent: Simply start speaking while the agent is talking
- End Conversation: Say "goodbye", "bye", "exit", or "quit"
- View Transcripts: See real-time transcripts in the transcript area
- Audio Visualization: Watch the visual feedback for both your voice and the agent's voice
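The "End Conversation" behavior above implies a check on the user's transcript for a farewell word. A minimal sketch of such a check (the function name and word list are illustrative, not the project's actual code):

```python
import re

# Words that end the conversation, per the usage list above.
FAREWELLS = {"goodbye", "bye", "exit", "quit"}

def is_farewell(transcript: str) -> bool:
    """True if the user's utterance contains a conversation-ending word."""
    words = re.findall(r"[a-z]+", transcript.lower())
    return any(word in FAREWELLS for word in words)
```

The server would run this on each final user transcript and close the session when it returns `True`.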
| Variable | Required | Description | Example |
|---|---|---|---|
| `AZURE_OPENAI_ENDPOINT` | Yes | Azure OpenAI endpoint URL | `https://your-resource.openai.azure.com/` |
| `AZURE_OPENAI_API_KEY` | Yes | Azure OpenAI API key | `your-api-key` |
| `AZURE_OPENAI_DEPLOYMENT_NAME` | Yes | Deployment name | `gpt-realtime` |
| `AZURE_OPENAI_API_VERSION` | Yes | API version | `2025-08-28` |
| `MCP_SERVER_URL` | No | MCP server URL (Azure native support) | `https://learn.microsoft.com/api/mcp` |
| `MCP_SERVER_LABEL` | No | Label for the MCP server | `microsoft-learn` |
| `MCP_REQUIRE_APPROVAL` | No | Tool approval mode | `never` (options: `never`, `always`, `auto`) |
| `LOG_LEVEL` | No | Logging level | `INFO` (options: `DEBUG`, `INFO`, `WARNING`, `ERROR`) |
| `LOG_FILE` | No | Log file path | `logs/conversation.log` |
| `WEB_HOST` | No | Web server host | `0.0.0.0` |
| `WEB_PORT` | No | Web server port | `8000` |
Defaults:

- `LOG_LEVEL`: `INFO`
- `LOG_FILE`: `logs/conversation.log`
- `WEB_HOST`: `0.0.0.0`
- `WEB_PORT`: `8000`
Problem: Browser blocks microphone access

Solution:
- Check browser settings for microphone permissions
- Ensure you're using HTTPS or localhost (required for getUserMedia)
- Try a different browser (Chrome, Firefox, Edge recommended)
Problem: Cannot connect to server

Solution:

- Verify server is running on correct port
- Check firewall settings
- Ensure `WEB_HOST` and `WEB_PORT` are correct
- Check browser console for errors
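The first item on that list can be checked without a browser. A small stdlib-only probe (illustrative, not part of the project):

```python
import socket

def server_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP listener answers on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `server_reachable("localhost", 8000)` distinguishes "server not running" from a browser-side problem such as blocked permissions.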
Problem: 401 Unauthorized or 404 Not Found

Solution:

- Verify `AZURE_OPENAI_ENDPOINT` is correct
- Check `AZURE_OPENAI_API_KEY` is valid
- Ensure `AZURE_OPENAI_DEPLOYMENT_NAME` matches your deployment
- Verify the `gpt-realtime` model is deployed in your resource
Problem: MCP tools not being called or returning errors

Solution:

- Verify `MCP_SERVER_URL` is correct (default: `https://learn.microsoft.com/api/mcp`)
- Check network connectivity to the MCP server
- Verify the Azure Realtime API version supports MCP (`2025-08-28` or later recommended)
- Check logs for MCP-related errors
Problem: Audio visualizations not showing

Solution:
- Check browser console for JavaScript errors
- Ensure microphone permissions are granted
- Try refreshing the page
- Check browser compatibility (Chrome, Firefox, Edge recommended)
- Chrome/Edge: Full support (recommended)
- Firefox: Full support
- Safari: Limited support (may have audio issues)
- Mobile browsers: Limited support
Run with auto-reload for development:

```bash
uvicorn app:app --host 0.0.0.0 --port 8000 --reload
```

Set in `.env`:

```ini
LOG_LEVEL=DEBUG
```
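The timestamped logging the app relies on can be sketched with the standard `logging` module. This is a hypothetical simplification of `logger.py` (a `FileHandler` pointed at `LOG_FILE` would be added the same way as the console handler):

```python
import logging

def setup_logging(level: str = "INFO") -> logging.Logger:
    """Configure a timestamped console logger for the app."""
    logger = logging.getLogger("realtime-speech")
    logger.setLevel(getattr(logging, level.upper(), logging.INFO))
    if not logger.handlers:  # avoid duplicate handlers on re-import
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With `LOG_LEVEL=DEBUG`, the same setup call would emit every event received over the WebSocket.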
```
realtime-speech/
├── app.py                   # FastAPI application entry point
├── config.py                # Configuration management
├── websocket_handler.py     # WebSocket connection management
├── realtime_client.py       # Azure Realtime API client
├── mcp_client.py            # MCP protocol client
├── tool_handler.py          # Tool registration and execution
├── conversation_manager.py  # Conversation state management
├── logger.py                # Logging configuration
├── verify_env.py            # Environment verification
├── static/                  # Static files
│   ├── index.html           # Web interface
│   ├── app.js               # Frontend JavaScript
│   └── styles.css           # CSS styles
├── logs/                    # Log files
├── requirements.txt         # Python dependencies
├── .env.example             # Example configuration
└── README.md                # This file
```
User: "What are the Azure CLI commands to create an Azure Container App?"
Agent: [Calls microsoft_docs_search tool] "According to the official Microsoft documentation, you can create an Azure Container App using..."
User: "Show me how to implement IHttpClientFactory in .NET 8"
Agent: [Calls microsoft_code_sample_search tool] "Here's an example from the official Microsoft documentation..."
The agent automatically uses Microsoft Learn MCP tools when relevant:
- `microsoft_docs_search`: Semantic search against Microsoft official documentation
- `microsoft_docs_fetch`: Fetch full content from a specific documentation page
- `microsoft_code_sample_search`: Search for official code samples and examples
These tools are executed automatically by the Azure Realtime API; no manual tool handling is required.
- Azure OpenAI Realtime API Documentation
- Azure Realtime API MCP Support
- Microsoft Learn MCP Server
- Microsoft Learn MCP Server GitHub
- MCP Protocol Documentation
- Web Audio API Documentation
[Add your license here]
For issues and questions:
- Check the troubleshooting section above
- Review the logs in `logs/conversation.log`
- Run `python verify_env.py` to verify configuration
- Open an issue in the repository