jstilb/compound-ai-system

Compound AI System

Tests · Python 3.11+ · License: MIT

Multi-model router with intelligent routing, cost tracking, quality monitoring, and fallback chains. Route every request to the optimal model based on complexity, cost, and latency requirements.

Why I Built This

Production AI systems rarely use a single model. The compound AI pattern -- routing requests to different models based on complexity, cost, and latency -- is how companies actually deploy AI at scale. A simple greeting should not consume the same resources as a complex analysis. This system implements the full routing stack: a classifier that estimates query complexity, a router that selects the optimal model, fallback chains for reliability, cost tracking for budgeting, and quality monitoring to detect degradation.

The key insight: you can serve 80% of requests with a model that costs 10x less, without measurable quality loss.
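The classifier is heuristic, so a small sketch conveys the idea. The cue lists and the length threshold below are hypothetical illustrations, not the rules the repo's classifier actually uses:

```python
# Hypothetical complexity heuristic: keyword cues plus query length.
# The real classifier in src/ may use different signals and thresholds.
EXPERT_CUES = {"prove", "derive", "optimize", "architect"}
COMPLEX_CUES = {"compare", "analyze", "explain why", "trade-off"}

def classify(query: str) -> str:
    """Estimate a query's complexity tier: simple/moderate/complex/expert."""
    words = query.lower().split()
    text = " ".join(words)
    if any(cue in text for cue in EXPERT_CUES):
        return "expert"
    if any(cue in text for cue in COMPLEX_CUES):
        return "complex"
    if len(words) > 20:
        return "moderate"
    return "simple"
```

A heuristic like this costs microseconds per request, which is what makes per-request routing cheap enough to sit in the hot path.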

Architecture

User Query
    |
    v
[Complexity Classifier] --> simple | moderate | complex | expert
    |
    v
[Routing Engine] --> strategy (quality | cost | balanced | latency)
    |
    v
[Fallback Chain] --> primary -> secondary -> tertiary
    |
    v
[Provider] --> mock-haiku | mock-sonnet | mock-opus | mock-gpt4o | ...
    |
    v
[Cost Tracker] + [Quality Monitor] + [Request Tracer]
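The fallback stage in the diagram amounts to trying providers in order until one succeeds. A minimal sketch, with function names and error handling invented for illustration (the project's actual API will differ):

```python
class ProviderError(Exception):
    """Raised when a provider fails to serve a request."""

def route_with_fallback(query, chain):
    """Try each provider in order (primary -> secondary -> tertiary);
    return the first successful response."""
    errors = []
    for provider in chain:
        try:
            return provider(query)
        except ProviderError as exc:
            errors.append(exc)  # keep for the request tracer / audit trail
    raise ProviderError(f"all {len(chain)} providers failed: {errors}")
```

Recording every failed attempt, not just the final outcome, is what lets the request tracer reconstruct the full routing decision afterwards.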

Features

| Feature | Description |
| --- | --- |
| Query classification | Heuristic complexity estimation (simple/moderate/complex/expert) |
| 4 routing strategies | Quality, cost, balanced, latency optimization |
| 5 mock providers | Simulating Haiku, Sonnet, Opus, GPT-4o, GPT-4o-mini |
| Fallback chains | Automatic retry with the next provider on failure |
| Circuit breaker | Skip unhealthy providers (3 failures, 60s recovery) |
| Cost tracking | Per-request cost, daily budgets, provider breakdown |
| Quality monitoring | Coherence, completeness, conciseness scoring |
| Request tracing | Full audit trail for every routing decision |
| FastAPI server | /route, /health, /stats, /traces endpoints |
| CLI interface | route, classify, providers, demo commands |
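The circuit-breaker behavior (skip a provider after 3 failures, retry after 60 seconds) can be sketched in a few lines. The threshold and recovery window match the table above; the class shape itself is a hypothetical sketch, not the repo's implementation:

```python
import time

class CircuitBreaker:
    """Skip an unhealthy provider after repeated failures; allow a
    retry probe once the recovery window has elapsed."""

    def __init__(self, threshold: int = 3, recovery_s: float = 60.0):
        self.threshold = threshold
        self.recovery_s = recovery_s
        self.failures = 0
        self.opened_at: float | None = None  # None means circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: permit one probe after the recovery window.
        return time.monotonic() - self.opened_at >= self.recovery_s

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None
```

One breaker per provider lets the router skip a flapping backend without ever blocking the fallback chain as a whole.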

Quick Start

# Install
pip install -e ".[dev]"

# Run the demo
python -m src.cli demo

# Classify a query
python -m src.cli classify "Compare transformers and RNNs"

# List available providers
python -m src.cli providers

# Start API server
make serve

Routing Strategies

| Strategy | Behavior | Best For |
| --- | --- | --- |
| quality | Route to tier matching complexity | When accuracy matters most |
| cost | Downgrade tiers to minimize cost | High-volume, budget-constrained |
| balanced | Weighted cost + latency optimization | Default production use |
| latency | Route to fastest available model | Real-time applications |
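To make the strategies concrete, here is a sketch of tier-based selection over a model table. The per-model costs, latencies, and tier assignments below are hypothetical placeholders, not the project's actual provider metadata:

```python
# Hypothetical model table: (cost per 1K tokens in $, latency in ms, quality tier).
MODELS = {
    "mock-haiku":  {"cost": 0.25, "latency": 400,  "tier": 1},
    "mock-sonnet": {"cost": 3.0,  "latency": 900,  "tier": 2},
    "mock-opus":   {"cost": 15.0, "latency": 2000, "tier": 3},
}
TIER_FOR = {"simple": 1, "moderate": 2, "complex": 3, "expert": 3}

def pick_model(complexity: str, strategy: str) -> str:
    """Select a model for the given complexity tier and routing strategy."""
    needed = TIER_FOR[complexity]
    if strategy == "quality":
        # Cheapest model that fully meets the required tier.
        ok = [m for m, v in MODELS.items() if v["tier"] >= needed]
        return min(ok, key=lambda m: MODELS[m]["tier"])
    if strategy == "latency":
        return min(MODELS, key=lambda m: MODELS[m]["latency"])
    # cost and balanced both tolerate a one-tier downgrade.
    floor = max(needed - 1, 1)
    ok = [m for m, v in MODELS.items() if v["tier"] >= floor]
    if strategy == "cost":
        return min(ok, key=lambda m: MODELS[m]["cost"])
    # balanced: weighted blend of cost and latency (weights are illustrative).
    return min(ok, key=lambda m: MODELS[m]["cost"] * 0.7
                               + MODELS[m]["latency"] / 1000 * 0.3)
```

Note how `cost` and `balanced` relax the tier floor by one: that single downgrade is where most of the savings in the table below come from.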

Cost Comparison

For a mixed workload (25% simple, 25% moderate, 25% complex, 25% expert):

| Strategy | Est. Cost per 1M Requests |
| --- | --- |
| Quality | ~$7,500 |
| Balanced | ~$4,200 |
| Cost | ~$1,800 |
| Latency | ~$1,500 |
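Each figure is just a weighted average over the tier mix. The per-request costs below are hypothetical, chosen only to show the arithmetic (they happen to land near the Quality row, but are not taken from the repo):

```python
def cost_per_million(mix: dict[str, float], unit_cost: dict[str, float]) -> float:
    """Expected cost of 1M requests, given a tier mix (fractions summing
    to 1) and the per-request cost of the model each tier routes to."""
    per_request = sum(share * unit_cost[tier] for tier, share in mix.items())
    return per_request * 1_000_000

# Equal 25% mix from the table; hypothetical per-request costs in dollars.
mix = {"simple": 0.25, "moderate": 0.25, "complex": 0.25, "expert": 0.25}
quality_costs = {"simple": 0.001, "moderate": 0.004,
                 "complex": 0.010, "expert": 0.015}
```

With these numbers, `cost_per_million(mix, quality_costs)` works out to about $7,500; swapping in a cheaper model for the lower tiers is what pulls the Cost and Latency rows down.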

API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /health | System health and provider status |
| POST | /route | Route a query to the optimal provider |
| GET | /stats | Aggregate system statistics |
| GET | /traces | Recent request traces |

Route a Query

curl -X POST http://localhost:8000/route \
  -H "Content-Type: application/json" \
  -d '{"query": "What is machine learning?", "strategy": "balanced"}'

Design Decisions

See docs/architecture.md for detailed Mermaid diagrams.

Development

pip install -e ".[dev]"
make test          # Run tests with coverage
make lint          # Lint with ruff
make demo          # Run interactive demo
make serve         # Start API server

License

MIT
