Voice agent (OpenAI Realtime API) that orchestrates coding agents (Claude Code) and browser agents (Gemini Computer Use)
See this codebase in action here
A unified voice-controlled orchestrator that coordinates three types of AI agents:
- OpenAI Realtime Voice Agent - Natural voice interactions and orchestration
- Claude Code Agentic Coder - Software development and file operations
- Gemini Browser Agent - Web automation and validation
- Python 3.11+
- Astral uv - Fast Python package installer and runner
- API Keys: OpenAI, Anthropic (Claude), Google (Gemini)
- Playwright: For browser automation (run playwright install after setup)
Install uv if you don't have it:
curl -LsSf https://astral.sh/uv/install.sh | sh
cd apps/realtime-poc
Copy .env.sample to .env and fill in required values:
# Required API Keys
OPENAI_API_KEY=sk-... # For voice orchestration
ANTHROPIC_API_KEY=sk-ant-... # For Claude Code agents
GEMINI_API_KEY=... # For browser automation
# Optional API Keys (for future extensibility)
GROQ_API_KEY=
DEEPSEEK_API_KEY=
ELEVENLABS_API_KEY= # For advanced TTS
# Configuration
ENGINEER_NAME=Dan # Your name (for agent interactions)
AGENT_WORKING_DIRECTORY= # Leave empty to use default (apps/content-gen)
playwright install
# Voice mode (recommended for full experience)
uv run big_three_realtime_agents.py --voice
# Text mode (for testing)
uv run big_three_realtime_agents.py --input text --output text
# Auto-dispatch with prompt
uv run big_three_realtime_agents.py --prompt "Create a new claude code agent, and have it list all the files in the working directory"
# Use mini model (faster/cheaper)
uv run big_three_realtime_agents.py --mini --voice
graph TD
User[User Voice/Text Input] --> OAI[OpenAI Realtime Voice Agent]
OAI -->|Create Agent| CREATE[create_agent Tool]
OAI -->|Send Instructions| CMD[command_agent Tool]
OAI -->|Query Status| LIST[list_agents Tool]
OAI -->|Direct Browser| BU[browser_use Tool]
OAI -->|File Access| RF[read_file/open_file]
CREATE -->|Claude Code| CC[Claude Code Agent]
CREATE -->|Gemini| GB[Gemini Browser Agent]
CMD -->|Instructions| CC
CMD -->|Instructions| GB
CC -->|Write Code| WD[Working Directory: apps/content-gen]
CC -->|Store Sessions| REG1[Registry: agents/claude_code/]
GB -->|Browse/Validate| BR[Playwright Browser]
GB -->|Store Sessions| REG2[Registry: agents/gemini/]
BU -->|Direct Control| BR
BR -->|Screenshots| SS[output_logs/screenshots/]
WD -->|Agents Work Here| FS[Backend/Frontend Code]
OAI -->|Response| User
style OAI fill:#ff9,stroke:#333
style CC fill:#9cf,stroke:#333
style GB fill:#c9f,stroke:#333
style WD fill:#9f9,stroke:#333
big-3-super-agent/
├── .env.sample # Environment template
├── apps/
│ ├── content-gen/ # Agent working directory (default - you can change this to any directory you want)
│ │ ├── agents/ # Agent session registries
│ │ │ ├── claude_code/ # Claude Code agent sessions
│ │ │ └── gemini/ # Gemini agent sessions
│ │ ├── backend/ # Backend code (agents work here)
│ │ ├── frontend/ # Frontend code (agents work here)
│ │ ├── specs/ # Project specifications
│ │ └── logs/ # Agent execution logs
│ └── realtime-poc/ # Main orchestrator
│   ├── big_three_realtime_agents.py # Main entry point
│   ├── prompts/ # System prompts for agents
│   │ └── super_agent/ # Orchestrator prompts
│   └── output_logs/ # Voice agent logs & screenshots
- big_three_realtime_agents.py: Main orchestrator script (3000+ lines)
  - Line 184-616: GeminiBrowserAgent class
  - Line 617-1540: ClaudeCodeAgenticCoder class
  - Line 1541-2900: OpenAIRealtimeVoiceAgent class
- Working Directory: apps/content-gen/ (configurable via AGENT_WORKING_DIRECTORY)
  - All Claude Code agents operate with this as their cwd
  - Agents create/modify files relative to this directory
  - Registries stored in agents/ subdirectory
The OpenAI Realtime Voice Agent acts as the main orchestrator:
- Listens to user voice/text input
- Decides which agent type to use
- Dispatches tasks via tool calls
- Manages agent lifecycle (create, resume, list, delete)
Agents are pointed to a specific working directory:
AGENT_WORKING_DIRECTORY = Path(__file__).parent.parent / "content-gen"
- Claude Code agents: Work in this directory with full file access
- Gemini agents: Store browser session data here
- Registries: Each agent type has a registry file tracking active sessions
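The default and the .env override described above could be combined roughly as follows. This is a minimal sketch, not the script's actual code: the helper name resolve_working_directory is hypothetical, and it assumes an empty or unset AGENT_WORKING_DIRECTORY means "use the default".

```python
from pathlib import Path


def resolve_working_directory(env_value, script_path):
    """Return the configured working directory, or the content-gen default.

    env_value   -- raw value of AGENT_WORKING_DIRECTORY (may be None or empty)
    script_path -- path of the orchestrator script, e.g. __file__
    """
    # Default mirrors the expression shown above: <script dir>/../content-gen
    default = Path(script_path).parent.parent / "content-gen"
    value = (env_value or "").strip()
    # Assumption: a blank value falls back to the default directory.
    return Path(value).expanduser() if value else default
```

Inside the script this would be called as resolve_working_directory(os.environ.get("AGENT_WORKING_DIRECTORY"), __file__).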
The orchestrator exposes these tools to the voice agent:
- list_agents() - Query all active agents and their status
- create_agent(tool, type, agent_name) - Create a new agent (Claude Code or Gemini)
- command_agent(agent_name, prompt) - Send instructions to an existing agent
- delete_agent(agent_name) - Remove an agent session
- check_agent_result(agent_name, operator_file_name) - Check agent execution results
- browser_use(task, url) - Direct browser automation task
- open_file(file_path) - Open a file in the default application
- read_file(file_path) - Read file contents
- report_costs() - Get API usage and cost information
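One way to picture how a tool call from the voice agent reaches one of these functions is a name-to-handler table. The sketch below is illustrative only: the handler bodies are stand-ins, and TOOL_HANDLERS and dispatch_tool_call are hypothetical names, not taken from big_three_realtime_agents.py. It assumes the tool arguments arrive as a JSON string.

```python
import json


def list_agents():
    # Stand-in body; the real tool would read the agent registries.
    return {"ok": True, "agents": []}


def read_file(file_path):
    # Stand-in body; the real tool would return the file contents.
    return {"ok": True, "file_path": file_path}


# Hypothetical lookup table: tool name -> Python callable.
TOOL_HANDLERS = {"list_agents": list_agents, "read_file": read_file}


def dispatch_tool_call(name, arguments_json):
    """Route one tool call to its handler and always return a dict."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        kwargs = json.loads(arguments_json) if arguments_json else {}
        return handler(**kwargs)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or wrong argument names become an error result
        # instead of crashing the voice session.
        return {"ok": False, "error": str(exc)}
```

Returning an error dict rather than raising lets the orchestrator report the failure back to the user by voice and keep the session alive.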
- Each agent gets a unique session ID
- Sessions stored in registry JSON files
- Sessions can be resumed across voice interactions
- Operator files created for each coding task
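A JSON registry that supports resuming sessions could look roughly like the sketch below. The file layout (agent name mapped to an object with a session_id field) and the function names are assumptions made for illustration; the actual registry schema may differ.

```python
import json
import uuid
from pathlib import Path


def load_registry(path):
    """Read the registry file, returning an empty dict if it does not exist."""
    path = Path(path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def register_agent(path, agent_name):
    """Return the agent's session ID, creating and persisting one if needed."""
    registry = load_registry(path)
    if agent_name not in registry:
        # Hypothetical schema: one entry per agent with a unique session ID.
        registry[agent_name] = {"session_id": uuid.uuid4().hex}
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        Path(path).write_text(json.dumps(registry, indent=2), encoding="utf-8")
    return registry[agent_name]["session_id"]
```

Because the ID is written to disk on first use, a later call with the same agent name returns the same session, which is what makes resuming across voice interactions possible.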
This project includes built-in observability for tracking all agent activities in real-time. The system uses Claude Code Hooks Multi-Agent Observability to stream events from all agents to a centralized dashboard.
Zero setup required - just turn it on and agents automatically ping out events:
- Claude Code Hooks (.claude/settings.json): Automatically triggered on tool use, notifications, session stops, etc.
- Send Event Hook (.claude/hooks/send_event.py): Forwards hook events to the observability server with AI-generated summaries
- OpenAI Agent Integration: The _send_observability_event tool in big_three_realtime_agents.py sends custom events from the voice orchestrator
- Real-time event stream: Every tool call, agent creation, file operation, and browser action
- AI-generated summaries: Automatic context-aware descriptions of what's happening
- Session tracking: Follow multiple agent sessions across the entire lifecycle
- Cost monitoring: Track API usage via the report_costs() tool
- Chat transcripts: Full conversation history included on session stops
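To make the idea of an event stream concrete, here is a sketch of what a single event might carry before it is forwarded to the dashboard. Every field name below is an assumption chosen for illustration, and build_event and encode_event are hypothetical helpers; check send_event.py and the observability server for the real schema and endpoint.

```python
import json
import time


def build_event(source_app, session_id, event_type, payload):
    """Assemble one event as a plain dict (hypothetical field names)."""
    return {
        "source_app": source_app,      # which agent or app produced the event
        "session_id": session_id,      # ties the event to one agent session
        "event_type": event_type,      # e.g. "PreToolUse" or "PostToolUse"
        "payload": payload,            # event-specific details
        "timestamp": int(time.time() * 1000),  # milliseconds since the epoch
    }


def encode_event(event):
    """Serialize the event to JSON, ready to be sent over HTTP."""
    return json.dumps(event)
```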
- Clone and run the observability server:
  git clone https://github.com/disler/claude-code-hooks-multi-agent-observability
  cd claude-code-hooks-multi-agent-observability
  npm install && npm run dev
- Start the agents (observability is already configured):
  uv run big_three_realtime_agents.py --voice
- Open the dashboard at http://localhost:3000
Events automatically flow from:
- Claude Code agent tool calls (PreToolUse, PostToolUse)
- Voice orchestrator decisions
- Gemini browser actions
- Session lifecycle events
No configuration needed - it just works!
This project is powered by cutting-edge AI technologies:
- Gemini 2.5 Computer Use - Browser automation with vision and action planning
- OpenAI Realtime API - Natural voice interactions and orchestration
- OpenAI Sora API - Video generation capabilities
- Claude Code - Agentic software development
- Astral uv - Fast Python package management and script execution
- Tactical Agentic Coding - Agentic coding patterns and best practices
- Break up large single agent: apps/realtime-poc/big_three_realtime_agents.py is over 3000 lines of code, and although it's organized, it's still difficult to understand and maintain.
- Error handling: Add better recovery from API failures
- Session persistence: Store voice conversation history
- Agent isolation: Sandbox agent file operations more strictly
- Cost tracking: Monitor API usage per agent/session
- Better logging: Structured logging with trace IDs
- Testing: Add unit tests for each agent class
- Configuration: Move hardcoded values to config file
- Agent interruption: Enable agents to be interrupted and redirected by the voice agent
- UI dashboard: Web interface to monitor agent activities
- Multi-modal input: Support image/video inputs for richer context
- Agent templates: Pre-configured agent profiles for common tasks
- Tool extensions: Plugin system for custom agent tools
- Collaborative coding: Multiple engineers working with same agents
- Browser recording: Record agent actions for debugging
- Voice customization: Train custom voice models
- Mobile support: iOS/Android companion apps
- Cloud deployment: Containerized deployment with Redis for state
- Agent marketplace: Share and discover agent configurations
"OPENAI_API_KEY not set": Ensure .env file exists and contains valid API key
"playwright not found": Run playwright install after initial setup
Agents not working in correct directory: Check AGENT_WORKING_DIRECTORY in .env
Voice not working: Ensure microphone permissions are granted and pyaudio is installed
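A quick preflight check can catch the missing-key case before anything starts. The sketch below assumes the three required keys listed in the .env section; missing_keys is a hypothetical helper, not part of the orchestrator script.

```python
# The three keys marked as required in .env.sample.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GEMINI_API_KEY")


def missing_keys(env):
    """Return the required keys that are absent or blank in the given mapping."""
    return [key for key in REQUIRED_KEYS if not (env.get(key) or "").strip()]
```

Calling missing_keys(os.environ) after the .env file has been loaded returns an empty list when everything is set, and otherwise names exactly which keys still need values.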
And prepare for the future of software engineering
Learn tactical agentic coding patterns with Tactical Agentic Coding
Follow the IndyDevDan YouTube channel to improve your agentic coding advantage.
