AI SDK v6 and Google ADK integration demonstrating SSE and WebSocket streaming implementation.
This project is under active development and contains experimental features with known issues.
✅ Stable Features
- Gemini Direct mode (AI SDK v6 only)
- ADK SSE streaming with tool calling
- Complete E2E test infrastructure (Frontend, Backend, Playwright)
🚧 Experimental Features
- ADK BIDI (WebSocket) streaming - See known issues below
Critical: ADK BIDI Mode Limitations
BIDI mode (`run_live()`) has two significant issues:
1. Tool Confirmation Not Working 🔴
   - Tools with `require_confirmation=True` do not trigger the approval UI
   - Root cause: a TODO in ADK `FunctionTool._call_live()` ("tool confirmation not yet supported for live mode")
   - Status: known ADK limitation, awaiting an upstream fix
   - Workaround: use SSE mode for tools requiring confirmation
2. Missing Text Responses After Tool Execution 🟡
   - Tools execute successfully, but the AI generates no explanatory text
   - Only the raw JSON output is shown to the user
   - Status: under investigation
   - Workaround: use SSE mode for full tool support
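Until the upstream fix lands, a client can route around the limitation by selecting the streaming mode based on whether any tool in the request requires confirmation. A minimal sketch; the `ToolSpec` shape and the `pickStreamingMode` helper are illustrative, not this project's actual API:

```typescript
// Hypothetical tool descriptor; field names are illustrative.
interface ToolSpec {
  name: string;
  requireConfirmation: boolean;
}

type StreamingMode = "adk-sse" | "adk-bidi";

// Prefer the requested mode, but fall back to SSE whenever any tool
// needs the approval UI, since confirmation is not supported in
// run_live() (BIDI) yet.
function pickStreamingMode(
  tools: ToolSpec[],
  preferred: StreamingMode,
): StreamingMode {
  const needsConfirmation = tools.some((t) => t.requireConfirmation);
  return needsConfirmation ? "adk-sse" : preferred;
}
```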
Recent Fixes
- ✅ Fixed infinite loop in tool confirmation auto-send logic (2025-12-17)
This project demonstrates the integration between:
- Frontend: Next.js 16 with AI SDK v6 beta
- Backend: Google ADK with FastAPI
Three chat modes are supported:
- Gemini Direct - Direct Gemini API via AI SDK (stable)
- ADK SSE - ADK backend with Server-Sent Events (stable)
- ADK BIDI ⚡ - ADK backend with WebSocket bidirectional streaming (experimental)
Key Insight: All three modes use the same AI SDK v6 Data Stream Protocol format, ensuring consistent frontend behavior regardless of backend implementation.
Streaming support by mode:
- Gemini Direct: Built-in AI SDK v6 streaming support
- ADK SSE: Token-by-token streaming via Server-Sent Events
- ADK BIDI: Bidirectional WebSocket streaming for voice agents
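Because all three modes emit the same Data Stream Protocol format, the frontend can parse every backend with one code path. A hedged sketch of what that shared parsing might look like; the chunk shapes are simplified stand-ins, not the full protocol:

```typescript
// Simplified stand-ins for Data Stream Protocol chunk types; the real
// protocol defines many more (tool calls, data-* custom events, etc.).
type StreamChunk =
  | { type: "text-delta"; delta: string }
  | { type: "finish" };

// Parse the JSON payload out of a single SSE `data:` line. All three
// backends emit this same format (BIDI sends it over WebSocket), so
// one parser serves every mode. Returns null for non-data lines.
function parseSseLine(line: string): StreamChunk | null {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length).trim();
  if (payload === "[DONE]") return { type: "finish" }; // end-of-stream marker
  return JSON.parse(payload) as StreamChunk;
}
```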
- Text I/O: Token-by-token streaming with AI SDK v6
- Image Input/Output: PNG, JPEG, WebP via `data-image` custom events
- Audio Input: Microphone recording (16kHz PCM) with CMD-key push-to-talk
- Audio Output: PCM streaming (24kHz) with WAV playback
- Audio Transcription: Input and output speech-to-text with native-audio models
- Tool Calling: ADK integration with user approval flow (SSE mode)
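As one concrete multimodal detail, microphone input must be converted from the Web Audio API's Float32 samples into 16-bit PCM before streaming. A sketch of that conversion; it mirrors what an AudioWorklet processor typically does, and is not this project's actual worklet code:

```typescript
// Convert Float32 samples ([-1, 1], as produced by the Web Audio API)
// into signed 16-bit PCM, the wire format for 16kHz microphone input.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```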
- `StreamProtocolConverter`: Converts ADK events to the AI SDK v6 Data Stream Protocol
- SSE format over WebSocket: The backend sends SSE-formatted frames over the WebSocket in BIDI mode
- Frontend Transparency: The same `useChat` hook works across all three modes
- Custom Transport: `WebSocketChatTransport` adds AI SDK v6 WebSocket support
- Tool Approval Flow: Frontend-delegated execution with AI SDK v6 approval APIs
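The SSE-format-over-WebSocket idea can be illustrated with a small adapter: the socket's frames are re-exposed as a stream of SSE lines, so the same downstream parsing serves both transports. This is a conceptual sketch, not the project's `WebSocketChatTransport`; `SocketLike` is a stand-in for a real WebSocket:

```typescript
// Minimal WebSocket-shaped interface so the adapter is testable
// without a live socket.
interface SocketLike {
  onmessage: ((ev: { data: string }) => void) | null;
  onclose: (() => void) | null;
}

// Re-expose socket frames as a ReadableStream of SSE lines. Each
// frame may carry one or more newline-separated lines; the stream
// closes when the socket does.
function socketToLineStream(socket: SocketLike): ReadableStream<string> {
  return new ReadableStream<string>({
    start(controller) {
      socket.onmessage = (ev) => {
        for (const line of ev.data.split("\n")) {
          if (line.length > 0) controller.enqueue(line);
        }
      };
      socket.onclose = () => controller.close();
    },
  });
}
```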
Frontend:
- Next.js 16 (App Router)
- React 19
- AI SDK v6 beta (`ai`, `@ai-sdk/react`, `@ai-sdk/google`)
- TypeScript 5.7
Backend:
- Python 3.13
- Google ADK >=1.20.0
- FastAPI >=0.115.0
- Pydantic v2
Development Tools:
- pnpm (Node.js packages)
- uv (Python packages)
- just (task automation)
- Python 3.13+
- Node.js 18+
- pnpm, uv, just
```bash
# Install all dependencies
just install

# Or manually:
uv sync
pnpm install
```

Copy the example file:

```bash
cp .env.example .env.local
```

Edit `.env.local`:
For Gemini Direct:

```
GOOGLE_GENERATIVE_AI_API_KEY=your_api_key_here
BACKEND_MODE=gemini
NEXT_PUBLIC_BACKEND_MODE=gemini
```

For ADK SSE/BIDI:

```
GOOGLE_API_KEY=your_api_key_here
BACKEND_MODE=adk-sse
NEXT_PUBLIC_BACKEND_MODE=adk-sse
ADK_BACKEND_URL=http://localhost:8000
NEXT_PUBLIC_ADK_BACKEND_URL=http://localhost:8000
```

Gemini Direct (frontend only):
```bash
pnpm dev
```

ADK SSE/BIDI (backend + frontend):

```bash
# Run both concurrently:
just dev

# Or separately:
just server  # Backend on :8000
pnpm dev     # Frontend on :3000
```

For all available commands:

```bash
just --list
```

Python Backend Tests:
```bash
just test-python
# Expected: ~200 passed (unit + integration + e2e)
```

TypeScript Frontend Tests:

```bash
pnpm test:lib
# Expected: ~565 passed (unit + integration + e2e)
```

Playwright E2E Tests:

```bash
just test-e2e-clean  # Recommended: clean server restart
just test-e2e-ui     # Interactive UI mode
```

Code Quality:

```bash
just format  # Format code
just lint    # Run linters
just check   # Run type checks
```

Complete documentation is available in the `docs/` directory:
- Getting Started Guide - Detailed setup, usage, troubleshooting, AI SDK v6 migration
- Glossary - Key terms, concepts, and patterns
- Architecture Overview - Complete system architecture
- AudioWorklet PCM Streaming
- Tool Approval Flow (Frontend Delegation Pattern)
- Per-Connection State Management
- Multimodal Support Architecture
- Protocol Implementation - ADK ↔ AI SDK v6 protocol
- Event/Part field mapping
- Implementation status
- Custom extensions (`data-pcm`, `data-image`, etc.)
- Result Type Pattern - `Ok(value)` / `Error(value)` error handling
- Library Structure - `lib/` organization and module dependencies
- React Optimization - Memoization and performance patterns
- Vitest Tests - `lib/tests/` structure (unit, integration, e2e)
- Testing Strategy - Overall test architecture (pytest, Vitest, Playwright)
- E2E Testing Guide - Complete E2E testing documentation
- Backend E2E (pytest golden files)
- Frontend E2E (Vitest browser tests)
- Fixtures management
- Chunk Logger debugging
- Coverage Audit - Test coverage verification
- ADR-0001 - Per-Connection State Management
- ADR-0002 - Tool Approval Architecture
- ADR-0003 - SSE vs BIDI Confirmation Protocol
- ADR-0004 - Multi-Tool Response Timing
- ADR-0005 - Frontend Execute Pattern and [DONE] Timing
- ADR-0006 - sendAutomaticallyWhen Decision Logic Order
- ADR-0007 - Approval Value Independence
- ADR-0008 - SSE Mode Pattern A Only
- ADR-0009 - Phase 12 Blocking Mode
- ADR-0010 - BIDI Confirmation Chunk Generation
- Experiments - Research notes, protocol investigations, multimodal experiments
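The Result Type Pattern listed above (`Ok(value)` / `Error(value)`) can be sketched in a few lines of TypeScript. The constructor is named `Err` here only to avoid shadowing the global `Error`, and the `safeParse` helper is illustrative rather than taken from the codebase:

```typescript
// Discriminated union for explicit error handling without throwing.
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

const Ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const Err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// Example: parse JSON without letting exceptions leak to callers.
function safeParse(text: string): Result<unknown, string> {
  try {
    return Ok(JSON.parse(text));
  } catch (e) {
    return Err(String(e));
  }
}
```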
All experiment notes and architectural investigations are documented in experiments/:
- Bidirectional protocol investigations
- Multimodal support (images, audio, video)
- Tool approval flow implementations
- Test coverage investigations
- ADK field mapping completeness
See experiments/README.md for the complete experiment index and results.
MIT License. See LICENSE file for details.
Last Updated: 2025-12-29