Nexus

One API endpoint. Any backend. Zero configuration.

Nexus is a distributed LLM orchestrator that unifies heterogeneous inference backends behind a single, intelligent API gateway. Local first, cloud when needed.

Features

  • 🔍 Auto-Discovery — Finds LLM backends on your network via mDNS
  • 🎯 Intelligent Routing — Routes by model capabilities, load, and latency
  • 🔄 Transparent Failover — Retries with fallback backends automatically
  • 🔌 OpenAI-Compatible — Works with any OpenAI API client
  • ⚡ Zero Config — Runs out of the box with Ollama; no setup required
  • 🔒 Privacy Zones — Structural enforcement prevents data from reaching cloud backends
  • 💰 Budget Management — Token-aware cost tracking with automatic spend limits
  • 📊 Real-time Dashboard — Monitor backends, models, and requests in your browser
  • 🧠 Quality Tracking — Profiles backend response quality to inform routing decisions
  • 📐 Embeddings API — OpenAI-compatible /v1/embeddings with capability-aware routing
  • 📋 Request Queuing — Holds requests when backends are busy, with priority support
  • 🔧 Model Lifecycle — Load, unload, and migrate models across backends via API
  • 🔮 Fleet Intelligence — Pattern analysis with pre-warming recommendations
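The intelligent-routing idea can be pictured as a scoring function over candidate backends. The sketch below is illustrative only; the `Backend` fields and the weighting are hypothetical stand-ins, not Nexus's actual internals.

```python
# Illustrative sketch of capability- and load-aware routing.
# Field names and the scoring formula are hypothetical.
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    models: set          # model names this backend can serve
    load: float          # 0.0 (idle) .. 1.0 (saturated)
    latency_ms: float    # recent average response latency

def pick_backend(backends, model):
    """Return the least-loaded, lowest-latency backend serving `model`."""
    candidates = [b for b in backends if model in b.models]
    if not candidates:
        return None
    # Lower score is better: weight load heavily, latency as a tiebreaker.
    return min(candidates, key=lambda b: b.load * 1000 + b.latency_ms)

fleet = [
    Backend("ollama-7b", {"llama3:8b"}, load=0.2, latency_ms=120),
    Backend("vllm-70b", {"llama3:70b"}, load=0.5, latency_ms=300),
    Backend("openai", {"gpt-4o", "llama3:70b"}, load=0.1, latency_ms=800),
]

best = pick_backend(fleet, "llama3:70b")
```

In this toy fleet the cloud backend serves the requested model too, but its higher latency loses the tiebreak, so the local vLLM node wins.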

Supported Backends

| Backend   | Status       | Discovery     |
| --------- | ------------ | ------------- |
| Ollama    | ✅ Supported | mDNS (auto)   |
| LM Studio | ✅ Supported | Static config |
| vLLM      | ✅ Supported | Static config |
| llama.cpp | ✅ Supported | Static config |
| exo       | ✅ Supported | mDNS (auto)   |
| OpenAI    | ✅ Supported | Static config |
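Backends without mDNS support are declared statically. The fragment below is only a hypothetical sketch of what such a declaration could look like; the key names are illustrative, so consult the Getting Started guide for the actual configuration format.

```toml
# Hypothetical static-backend config sketch; keys are illustrative only.
[[backends]]
name = "lmstudio-desktop"
kind = "lmstudio"
url = "http://192.168.1.42:1234/v1"

[[backends]]
name = "vllm-gpu-node"
kind = "vllm"
url = "http://10.0.0.7:8000/v1"
```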

Quick Start

```shell
# Install from source
cargo install --path .

# Start with auto-discovery (zero config)
nexus serve

# Or with Docker
docker run -d -p 8000:8000 leocamello/nexus
```

Once running, send your first request:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3:70b", "messages": [{"role": "user", "content": "Hello!"}]}'
```

Point any OpenAI-compatible client to http://localhost:8000/v1 — Claude Code, Continue.dev, OpenAI SDK, or plain curl.
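As a minimal dependency-free example, the same request as the curl snippet can be sent from Python's standard library (assuming Nexus is listening on localhost:8000):

```python
# Minimal client sketch using only the Python standard library.
# Assumes a Nexus instance from the Quick Start is running locally.
import json
import urllib.request

NEXUS_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "llama3:70b",
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    NEXUS_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

try:
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
        print(body["choices"][0]["message"]["content"])
except OSError as err:
    # Nexus is not reachable, e.g. the server is not running yet.
    print(f"request failed: {err}")
```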

Full setup guide — installation, configuration, CLI reference, and more.

Architecture

┌──────────────────────────────────────────────────┐
│              Nexus Orchestrator                   │
│  - Discovers backends via mDNS                   │
│  - Tracks model capabilities & quality           │
│  - Routes to best available backend              │
│  - Queues requests when backends are busy        │
│  - OpenAI-compatible API + Embeddings            │
└──────────────────────────────────────────────────┘
        │           │           │           │
        ▼           ▼           ▼           ▼
   ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐
   │ Ollama │  │  vLLM  │  │  exo   │  │ OpenAI │
   │  7B    │  │  70B   │  │  32B   │  │ cloud  │
   └────────┘  └────────┘  └────────┘  └────────┘
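The transparent-failover behavior in the orchestrator layer can be pictured as a retry loop over candidate backends. The sketch below is simplified and illustrative; `BackendError`, the backend ordering, and `send` are hypothetical stand-ins, not Nexus's real retry policy.

```python
# Illustrative failover loop: try each candidate backend in priority
# order and return the first successful response.
class BackendError(Exception):
    """Raised when a single backend fails to serve a request."""

def call_with_failover(backends, request, send):
    """Try each backend in priority order; raise only if every one fails."""
    errors = []
    for backend in backends:
        try:
            return send(backend, request)
        except BackendError as err:
            errors.append(f"{backend}: {err}")  # record and fall through
    raise BackendError("all backends failed: " + "; ".join(errors))

# Simulated transport: the first backend is down, the second answers.
def fake_send(backend, request):
    if backend == "ollama":
        raise BackendError("connection refused")
    return f"{backend}: ok"

result = call_with_failover(["ollama", "vllm"], {"prompt": "hi"}, fake_send)
# result == "vllm: ok"
```

From the client's point of view the retry is invisible: the request succeeds as long as any backend serving the model is still up.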

Documentation

| Document           | What you'll find                                          |
| ------------------ | --------------------------------------------------------- |
| 🚀 Getting Started | Installation, configuration, CLI, environment variables   |
| 📖 REST API        | HTTP endpoints, X-Nexus-* headers, error responses        |
| 🔌 WebSocket API   | Real-time dashboard protocol                              |
| 🏗️ Architecture    | System design, module structure, data flows               |
| 🗺️ Roadmap         | Feature index (F01–F23), version history, future plans    |
| 🔧 Troubleshooting | Common errors, debugging tips                             |
| ❓ FAQ             | What Nexus is (and isn't), common questions               |
| 🤝 Contributing    | Dev workflow, coding standards, PR guidelines             |
| 📋 Changelog       | Release history                                           |
| 🔒 Security        | Vulnerability reporting                                   |

License

Apache License 2.0 — see LICENSE for details.

Related Projects

  • exo — Distributed AI inference
  • LM Studio — Desktop app for local LLMs
  • Ollama — Easy local LLM serving
  • vLLM — High-throughput LLM serving
  • LiteLLM — Cloud LLM API router
