AI SDK v6 and Google ADK integration demonstrating SSE and WebSocket streaming implementation.
This project is under active development and contains experimental features with known issues.
✅ Stable Features
- Gemini Direct mode (AI SDK v6 only)
- ADK SSE streaming with tool calling
- Complete E2E test infrastructure (Frontend, Backend, Playwright)
🚧 Experimental Features
- ADK BIDI (WebSocket) streaming - See known issues below
Critical: ADK BIDI Mode Limitations
BIDI mode (`run_live()`) has two significant issues:
1. Tool Confirmation Not Working 🔴
   - Tools with `require_confirmation=True` do not trigger the approval UI
   - Root cause: a TODO in ADK `FunctionTool._call_live()` ("tool confirmation not yet supported for live mode")
   - Status: known ADK limitation, awaiting an upstream fix
   - Workaround: use SSE mode for tools requiring confirmation
2. Missing Text Responses After Tool Execution 🟡
   - Tools execute successfully, but the AI generates no explanatory text
   - Only the raw JSON output is shown to the user
   - Status: under investigation
   - Workaround: use SSE mode for full tool support
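Until the upstream fix lands, a client can route around the limitation by selecting the streaming mode based on whether any tool in the request requires confirmation. A minimal sketch; the `ToolSpec` shape and the `pickStreamingMode` helper are illustrative, not this project's actual API:

```typescript
// Hypothetical tool descriptor; field names are illustrative.
interface ToolSpec {
  name: string;
  requireConfirmation: boolean;
}

type StreamingMode = "adk-sse" | "adk-bidi";

// Prefer the requested mode, but fall back to SSE whenever any tool
// needs the approval UI, since confirmation is not supported in
// run_live() (BIDI) yet.
function pickStreamingMode(
  tools: ToolSpec[],
  preferred: StreamingMode,
): StreamingMode {
  const needsConfirmation = tools.some((t) => t.requireConfirmation);
  return needsConfirmation ? "adk-sse" : preferred;
}
```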
Recent Fixes
- ✅ Fixed infinite loop in tool confirmation auto-send logic (2025-12-17)
This project demonstrates the integration between:
- Frontend: Next.js 16 with AI SDK v6 beta
- Backend: Google ADK with FastAPI
Three chat modes are supported:
- Gemini Direct - Direct Gemini API via AI SDK (stable)
- ADK SSE - ADK backend with Server-Sent Events (stable)
- ADK BIDI ⚡ - ADK backend with WebSocket bidirectional streaming (experimental)
Key Insight: All three modes use the same AI SDK v6 Data Stream Protocol format, ensuring consistent frontend behavior regardless of backend implementation.
Streaming support by mode:
- Gemini Direct: Built-in AI SDK v6 streaming support
- ADK SSE: Token-by-token streaming via Server-Sent Events
- ADK BIDI: Bidirectional WebSocket streaming for voice agents
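Because all three modes emit the same Data Stream Protocol format, the frontend can parse every backend with one code path. A hedged sketch of what that shared parsing might look like; the chunk shapes are simplified stand-ins, not the full protocol:

```typescript
// Simplified stand-ins for Data Stream Protocol chunk types; the real
// protocol defines many more (tool calls, data-* custom events, etc.).
type StreamChunk =
  | { type: "text-delta"; delta: string }
  | { type: "finish" };

// Parse the JSON payload out of a single SSE `data:` line. All three
// backends emit this same format (BIDI sends it over WebSocket), so
// one parser serves every mode. Returns null for non-data lines.
function parseSseLine(line: string): StreamChunk | null {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length).trim();
  if (payload === "[DONE]") return { type: "finish" }; // end-of-stream marker
  return JSON.parse(payload) as StreamChunk;
}
```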
- Text I/O: Token-by-token streaming with AI SDK v6
- Image Input/Output: PNG, JPEG, WebP via `data-image` custom events
- Audio Input: Microphone recording (16kHz PCM) with CMD-key push-to-talk
- Audio Output: PCM streaming (24kHz) with WAV playback
- Audio Transcription: Input and output speech-to-text with native-audio models
- Tool Calling: ADK integration with user approval flow (SSE mode)
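As one concrete multimodal detail, microphone input must be converted from the Web Audio API's Float32 samples into 16-bit PCM before streaming. A sketch of that conversion; it mirrors what an AudioWorklet processor typically does, and is not this project's actual worklet code:

```typescript
// Convert Float32 samples ([-1, 1], as produced by the Web Audio API)
// into signed 16-bit PCM, the wire format for 16kHz microphone input.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```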
- `StreamProtocolConverter`: Converts ADK events to the AI SDK v6 Data Stream Protocol
- SSE format over WebSocket: The backend sends SSE-formatted frames over the WebSocket in BIDI mode
- Frontend Transparency: The same `useChat` hook works across all three modes
- Custom Transport: `WebSocketChatTransport` adds AI SDK v6 WebSocket support
- Tool Approval Flow: Frontend-delegated execution with AI SDK v6 approval APIs
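The SSE-format-over-WebSocket idea can be illustrated with a small adapter: the socket's frames are re-exposed as a stream of SSE lines, so the same downstream parsing serves both transports. This is a conceptual sketch, not the project's `WebSocketChatTransport`; `SocketLike` is a stand-in for a real WebSocket:

```typescript
// Minimal WebSocket-shaped interface so the adapter is testable
// without a live socket.
interface SocketLike {
  onmessage: ((ev: { data: string }) => void) | null;
  onclose: (() => void) | null;
}

// Re-expose socket frames as a ReadableStream of SSE lines. Each
// frame may carry one or more newline-separated lines; the stream
// closes when the socket does.
function socketToLineStream(socket: SocketLike): ReadableStream<string> {
  return new ReadableStream<string>({
    start(controller) {
      socket.onmessage = (ev) => {
        for (const line of ev.data.split("\n")) {
          if (line.length > 0) controller.enqueue(line);
        }
      };
      socket.onclose = () => controller.close();
    },
  });
}
```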
Frontend:
- Next.js 16 (App Router)
- React 19
- AI SDK v6 beta (`ai`, `@ai-sdk/react`, `@ai-sdk/google`)
- TypeScript 5.7
Backend:
- Python 3.13
- Google ADK >=1.20.0
- FastAPI >=0.115.0
- Pydantic v2
Development Tools:
- pnpm (Node.js packages)
- uv (Python packages)
- just (task automation)
- Python 3.13+
- Node.js 18+
- pnpm, uv, just
```bash
# Install all dependencies
just install

# Or manually:
uv sync
pnpm install
```

Copy the example file:

```bash
cp .env.example .env.local
```

Edit `.env.local`:
For Gemini Direct:

```
GOOGLE_GENERATIVE_AI_API_KEY=your_api_key_here
BACKEND_MODE=gemini
NEXT_PUBLIC_BACKEND_MODE=gemini
```

For ADK SSE/BIDI:

```
GOOGLE_API_KEY=your_api_key_here
BACKEND_MODE=adk-sse
NEXT_PUBLIC_BACKEND_MODE=adk-sse
ADK_BACKEND_URL=http://localhost:8000
NEXT_PUBLIC_ADK_BACKEND_URL=http://localhost:8000
```

Gemini Direct (frontend only):
```bash
pnpm dev
```

ADK SSE/BIDI (backend + frontend):

```bash
# Run both concurrently:
just dev

# Or separately:
just server  # Backend on :8000
pnpm dev     # Frontend on :3000
```

For all available commands:

```bash
just --list
```

Python Backend Tests:
```bash
just test-python
# Expected: ~200 passed (unit + integration + e2e)
```

TypeScript Frontend Tests:

```bash
pnpm test:lib
# Expected: ~565 passed (unit + integration + e2e)
```

Playwright E2E Tests:

```bash
just test-e2e-clean  # Recommended: clean server restart
just test-e2e-ui     # Interactive UI mode
```

Code Quality:

```bash
just format  # Format code
just lint    # Run linters
just check   # Run type checks
```

Complete documentation is available in the `docs/` directory:
- Getting Started Guide - Detailed setup, usage, troubleshooting, AI SDK v6 migration
- Glossary - Key terms, concepts, and patterns
- Architecture Overview - Complete system architecture
- AudioWorklet PCM Streaming
- Tool Approval Flow (Frontend Delegation Pattern)
- Per-Connection State Management
- Multimodal Support Architecture
- Protocol Implementation - ADK ↔ AI SDK v6 protocol
- Event/Part field mapping
- Implementation status
- Custom extensions (`data-pcm`, `data-image`, etc.)
- Result Type Pattern - `Ok(value)` / `Error(value)` error handling
- Library Structure - `lib/` organization and module dependencies
- React Optimization - Memoization and performance patterns
- Vitest Tests - `lib/tests/` structure (unit, integration, e2e)
- Testing Strategy - Overall test architecture (pytest, Vitest, Playwright)
- E2E Testing Guide - Complete E2E testing documentation
- Backend E2E (pytest golden files)
- Frontend E2E (Vitest browser tests)
- Fixtures management
- Chunk Logger debugging
- Coverage Audit - Test coverage verification
- ADR-0001 - Per-Connection State Management
- ADR-0002 - Tool Approval Architecture
- ADR-0003 - SSE vs BIDI Confirmation Protocol
- ADR-0004 - Multi-Tool Response Timing
- ADR-0005 - Frontend Execute Pattern and [DONE] Timing
- ADR-0006 - sendAutomaticallyWhen Decision Logic Order
- ADR-0007 - Approval Value Independence
- ADR-0008 - SSE Mode Pattern A Only
- ADR-0009 - Phase 12 Blocking Mode
- ADR-0010 - BIDI Confirmation Chunk Generation
- Experiments - Research notes, protocol investigations, multimodal experiments
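The Result Type Pattern listed above (`Ok(value)` / `Error(value)`) can be sketched in a few lines of TypeScript. The constructor is named `Err` here only to avoid shadowing the global `Error`, and the `safeParse` helper is illustrative rather than taken from the codebase:

```typescript
// Discriminated union for explicit error handling without throwing.
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

const Ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const Err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// Example: parse JSON without letting exceptions leak to callers.
function safeParse(text: string): Result<unknown, string> {
  try {
    return Ok(JSON.parse(text));
  } catch (e) {
    return Err(String(e));
  }
}
```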
All experiment notes and architectural investigations are documented in experiments/:
- Bidirectional protocol investigations
- Multimodal support (images, audio, video)
- Tool approval flow implementations
- Test coverage investigations
- ADK field mapping completeness
See experiments/README.md for the complete experiment index and results.
MIT License. See LICENSE file for details.
Last Updated: 2025-12-29