One API endpoint. Any backend. Zero configuration.
Nexus is a distributed LLM orchestrator that unifies heterogeneous inference backends behind a single, intelligent API gateway. Local first, cloud when needed.
- 🔍 Auto-Discovery — Finds LLM backends on your network via mDNS
- 🎯 Intelligent Routing — Routes by model capabilities, load, and latency
- 🔄 Transparent Failover — Retries with fallback backends automatically
- 🔌 OpenAI-Compatible — Works with any OpenAI API client
- ⚡ Zero Config — Just run it — works out of the box with Ollama
- 🔒 Privacy Zones — Structural enforcement prevents data from reaching cloud backends
- 💰 Budget Management — Token-aware cost tracking with automatic spend limits
- 📊 Real-time Dashboard — Monitor backends, models, and requests in your browser
- 🧠 Quality Tracking — Profiles backend response quality to inform routing decisions
- 📐 Embeddings API — OpenAI-compatible `/v1/embeddings` with capability-aware routing
- 📋 Request Queuing — Holds requests when backends are busy, with priority support
- 🔧 Model Lifecycle — Load, unload, and migrate models across backends via API
- 🔮 Fleet Intelligence — Pattern analysis with pre-warming recommendations
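All of these features sit behind one OpenAI-compatible HTTP surface. As a concrete illustration, an embeddings request to `/v1/embeddings` can be assembled with nothing but Python's standard library — the host, port, and model name below are illustrative assumptions, not Nexus defaults:

```python
import json
import urllib.request

# OpenAI-style embeddings payload; Nexus routes it to whichever
# backend advertises the requested model.
payload = {
    "model": "nomic-embed-text",  # hypothetical embedding model
    "input": ["Nexus routes requests to the best available backend."],
}

req = urllib.request.Request(
    "http://localhost:8000/v1/embeddings",  # assumed default bind address
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Sending requires a running Nexus instance:
# with urllib.request.urlopen(req) as resp:
#     vectors = json.loads(resp.read())["data"]
```

Because the request shape is plain OpenAI JSON, any existing embeddings client should work unchanged once pointed at Nexus.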
| Backend | Status | Discovery |
|---|---|---|
| Ollama | ✅ Supported | mDNS (auto) |
| LM Studio | ✅ Supported | Static config |
| vLLM | ✅ Supported | Static config |
| llama.cpp | ✅ Supported | Static config |
| exo | ✅ Supported | mDNS (auto) |
| OpenAI | ✅ Supported | Static config |
```bash
# Install from source
cargo install --path .

# Start with auto-discovery (zero config)
nexus serve

# Or with Docker
docker run -d -p 8000:8000 leocamello/nexus
```

Once running, send your first request:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3:70b", "messages": [{"role": "user", "content": "Hello!"}]}'
```

Point any OpenAI-compatible client to `http://localhost:8000/v1` — Claude Code, Continue.dev, the OpenAI SDK, or plain curl.
→ Full setup guide — installation, configuration, CLI reference, and more.
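The same request as the curl example, expressed with Python's standard library — endpoint and payload are taken verbatim from the quick start above; actually sending it requires a running Nexus:

```python
import json
import urllib.request

# Identical payload to the curl quick-start example.
payload = {
    "model": "llama3:70b",
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment to send against a live Nexus instance:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```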
```
┌──────────────────────────────────────────────────┐
│                Nexus Orchestrator                │
│  - Discovers backends via mDNS                   │
│  - Tracks model capabilities & quality           │
│  - Routes to best available backend              │
│  - Queues requests when backends are busy        │
│  - OpenAI-compatible API + Embeddings            │
└──────────────────────────────────────────────────┘
       │           │           │           │
       ▼           ▼           ▼           ▼
  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐
  │ Ollama │  │  vLLM  │  │  exo   │  │ OpenAI │
  │   7B   │  │  70B   │  │  32B   │  │ cloud  │
  └────────┘  └────────┘  └────────┘  └────────┘
```
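The "routes to best available backend" step in the diagram can be sketched in a few lines. This is an illustrative model only — the backend names, scoring weights, and data structures are assumptions, not Nexus's actual implementation:

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    name: str
    models: set[str]
    load: float        # 0.0 (idle) .. 1.0 (saturated)
    latency_ms: float
    healthy: bool = True


def route(backends: list[Backend], model: str) -> list[Backend]:
    """Return candidate backends for `model`, best first.

    The score favors low load and low latency; transparent failover
    then simply walks the sorted list until a request succeeds.
    """
    candidates = [b for b in backends if b.healthy and model in b.models]
    return sorted(candidates, key=lambda b: b.load + b.latency_ms / 1000.0)


# Hypothetical fleet matching the diagram above.
fleet = [
    Backend("ollama-desktop", {"llama3:7b"}, load=0.2, latency_ms=40),
    Backend("vllm-server", {"llama3:70b", "llama3:7b"}, load=0.6, latency_ms=25),
    Backend("openai-cloud", {"gpt-4o"}, load=0.0, latency_ms=300),
]

order = route(fleet, "llama3:7b")
# → [ollama-desktop, vllm-server]: both serve the model, the idle
#   local box scores better; openai-cloud is excluded (wrong model).
```

In the real system the score would also factor in the quality profiles and privacy zones described above; the shape of the decision — filter by capability, rank by cost, fall through on failure — is the same.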
| | Document | What you'll find |
|---|---|---|
| 🚀 | Getting Started | Installation, configuration, CLI, environment variables |
| 📖 | REST API | HTTP endpoints, X-Nexus-* headers, error responses |
| 🔌 | WebSocket API | Real-time dashboard protocol |
| 🏗️ | Architecture | System design, module structure, data flows |
| 🗺️ | Roadmap | Feature index (F01–F23), version history, future plans |
| 🔧 | Troubleshooting | Common errors, debugging tips |
| ❓ | FAQ | What Nexus is (and isn't), common questions |
| 🤝 | Contributing | Dev workflow, coding standards, PR guidelines |
| 📋 | Changelog | Release history |
| 🔒 | Security | Vulnerability reporting |
Apache License 2.0 — see LICENSE for details.