Two main strengths:
- Lightweight LiteLLM-compatible router: drop in your existing LiteLLM config with minimal code
- Open-WebUI with PostgreSQL: complete setup with database initialization and PM2 management
Self-hosted LLM routing that's easy to run and maintain.
✅ Good Fit:
- Need LiteLLM-compatible routing without the full LiteLLM stack
- Want Open-WebUI with PostgreSQL pre-configured
- Simple setups with minimal maintenance overhead
- On-premises deployments with strict data control requirements
- Quick deployment with existing LiteLLM configs
❌ Consider Alternatives:
- Advanced Features: Use LiteLLM directly (100+ providers, load balancing, caching, observability)
- Zero Setup: Use OpenRouter.ai (400+ models, automatic failover, no hosting)
- Single Provider: Use Open-WebUI's native configuration
Router:
- LiteLLM Config Compatible: Use existing LiteLLM config.yml files directly
- Minimal Code: Lightweight implementation, easy to understand and maintain
- Auto Format Conversion: OpenAI ↔ Claude/Gemini message formats (see the sketch after this list)
- Streaming Support: Unified OpenAI-compatible SSE streaming
- Hot Reload: Update config without restarting services
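To make the format conversion point concrete, here is a minimal sketch of the kind of mapping the router performs for Claude backends. It is illustrative only: the field handling follows the public Anthropic Messages API, but the function itself is an assumption, not the project's actual conversion code in providers/claude.py.

```python
# Illustrative sketch of OpenAI -> Anthropic request mapping (not the real providers/claude.py).
def openai_to_anthropic(payload: dict) -> dict:
    """Map an OpenAI /v1/chat/completions body to an Anthropic /v1/messages body."""
    messages = payload["messages"]

    # Anthropic takes the system prompt as a top-level field, not a message role.
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [m for m in messages if m["role"] != "system"]

    converted = {
        "model": payload["model"],
        "messages": chat,
        # Anthropic requires max_tokens; fall back to a default when the client omits it.
        "max_tokens": payload.get("max_tokens", 1024),
    }
    if system_parts:
        converted["system"] = "\n".join(system_parts)
    if "temperature" in payload:
        converted["temperature"] = payload["temperature"]
    return converted
```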
Infrastructure:
- PostgreSQL Setup: Pre-configured database initialization for Open-WebUI
- PM2 Management: Simple service orchestration
Works with any LiteLLM-supported provider. Common examples:
- OpenAI: gpt-4o, gpt-4.1, o3-mini
- Anthropic: claude-opus-4, claude-sonnet-4
- Google: gemini-2.0-flash, gemini-1.5-pro
- xAI: grok-2, grok-beta
- Ollama: Local models (auto-discovered via scan-models)
- LM Studio: Local models (auto-discovered via scan-models)
- Many more: See LiteLLM providers
# Setup environment
cp .env.example .env
# Edit .env with your database credentials
# Configure router
cp conf/config.example.yml conf/config.yml
# Edit conf/config.yml with your API keys and models
# Initialize database and start services
./manage.sh init -x
./manage.sh start
Requirements: Python 3.11+, PostgreSQL, PM2
1. Environment Variables (.env):
# Database & Ports
DATABASE_URL=postgresql://user:pass@localhost:5432/openwebui_db
LLM_ROUTER_PORT=8086
OPENWEBUI_PORT=8087
2. LiteLLM Config (conf/config.yml):
model_list:
  - model_name: gpt-4.1
    litellm_params:
      model: gpt-4.1
      api_key: your_openai_key
  - model_name: claude-sonnet-4
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      api_key: your_claude_key
That's it! Standard LiteLLM format - the router handles the rest automatically.
See LITELLM_COMPATIBILITY.md for advanced features and migration notes.
Automatically discover and configure local models from Ollama and LM Studio:
# Scan available models from Ollama and LM Studio
./manage.sh scan-models
# Scan and automatically update conf/config.yml
./manage.sh scan-models ollama -u # Ollama only
./manage.sh scan-models lmstudio -u # LM Studio only
./manage.sh scan-models all -u       # Both services
Prerequisites:
- Ollama: Must be running (./manage.sh start ollama or ollama serve)
- LM Studio: Must have the server started in the LM Studio app
Environment Variables:
OLLAMA_HOST=http://localhost:11434 # Default Ollama endpoint
LMSTUDIO_HOST=http://localhost:1234 # Default LM Studio endpoint
The scanner will:
- Query the respective service APIs for available models (see the sketch after this list)
- Generate LiteLLM-compatible configurations
- Update conf/config.yml with the discovered models
- Create a backup of your config before updating
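Under the hood, discovery boils down to two HTTP calls against the services' documented endpoints. The following is a minimal sketch of that idea, not the actual scan-models implementation; the exact litellm_params the real scanner writes may differ.

```python
# Illustrative model-discovery sketch, not the real scan-models code.
import os
import requests

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
LMSTUDIO_HOST = os.getenv("LMSTUDIO_HOST", "http://localhost:1234")

def discover_local_models() -> list[dict]:
    entries = []

    # Ollama lists installed models at GET /api/tags.
    for m in requests.get(f"{OLLAMA_HOST}/api/tags", timeout=5).json().get("models", []):
        entries.append({
            "model_name": m["name"],
            "litellm_params": {"model": f"ollama/{m['name']}", "api_base": OLLAMA_HOST},
        })

    # LM Studio's local server is OpenAI-compatible and lists models at GET /v1/models.
    for m in requests.get(f"{LMSTUDIO_HOST}/v1/models", timeout=5).json().get("data", []):
        entries.append({
            "model_name": m["id"],
            "litellm_params": {"model": f"openai/{m['id']}", "api_base": f"{LMSTUDIO_HOST}/v1"},
        })

    return entries  # entries like these get merged into model_list in conf/config.yml
```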
- Configure Open-WebUI connection:
  - Base URL: http://localhost:8086/v1
  - Use any configured API key
- Claude models work transparently with automatic format conversion
- All models appear as OpenAI-compatible in the interface
- POST /v1/chat/completions - OpenAI-compatible chat (includes Claude with conversion)
- GET /v1/models - List available models
- POST /admin/reload-backends - Reload conf/config.yml without restart
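As an example, the same endpoints can be exercised from Python. This is a hedged sketch: the port, API key, and model name come from the example configuration above, and the SSE parsing assumes standard OpenAI-style "data: " chunks.

```python
# Example client calls against the router's OpenAI-compatible API (values are placeholders).
import json
import requests

BASE = "http://localhost:8086"
HEADERS = {"Authorization": "Bearer your-api-key"}

# List the models defined in conf/config.yml
print(requests.get(f"{BASE}/v1/models", headers=HEADERS).json())

# Streaming chat completion: OpenAI-style SSE lines prefixed with "data: ", ending with [DONE]
resp = requests.post(
    f"{BASE}/v1/chat/completions",
    headers=HEADERS,
    json={
        "model": "claude-sonnet-4",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    stream=True,
)
for line in resp.iter_lines():
    if line.startswith(b"data: ") and line != b"data: [DONE]":
        chunk = json.loads(line[len(b"data: "):])
        print(chunk["choices"][0]["delta"].get("content", ""), end="")

# Pick up edits to conf/config.yml without restarting the service
requests.post(f"{BASE}/admin/reload-backends", headers=HEADERS)
```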
# Database
./manage.sh init -x # Initialize database
# Services
./manage.sh start # Start all services (PM2)
./manage.sh start llm-router # Start router only
./manage.sh start ollama # Start Ollama server
./manage.sh stop # Stop all services
./manage.sh status # Check status
# Model Discovery
./manage.sh scan-models # Scan all available models (Ollama + LM Studio)
./manage.sh scan-models ollama # Scan Ollama models only
./manage.sh scan-models lmstudio # Scan LM Studio models only
./manage.sh scan-models ollama -u # Scan and update conf/config.yml
# PM2 Management
pm2 logs # View logs
pm2 restart all # Restart services
# Install and run tests
pip install pytest pytest-asyncio
pytest tests/test_llm_router.py -v
# Test the router
curl -X POST http://localhost:8086/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{"model": "claude-sonnet-4", "messages": [{"role": "user", "content": "Hello!"}]}'Minimal codebase - easy to understand and maintain:
src/open_llm_router/
├── llm_router.py # Main FastAPI app (~300 lines)
├── providers/ # Provider implementations
│ ├── claude.py # Anthropic format conversion
│ ├── gemini.py # Google Gemini integration
│ └── openai.py # OpenAI/compatible providers
└── utils/
├── model_router.py # LiteLLM config → backend routing
└── logger.py # Request logging
Design Philosophy:
- LiteLLM config compatibility without the full stack
- Automatic format conversion (OpenAI ↔ provider-specific)
- Simple streaming aggregation
- Hot-reloadable configuration
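In practice, the routing side of that philosophy amounts to a lookup from the configured model_name to its litellm_params, re-read on reload. Below is a simplified sketch of the idea; the function and field names are illustrative assumptions, not the real model_router.py interface.

```python
# Simplified routing sketch; not the actual model_router.py API.
import yaml

def load_model_map(path: str = "conf/config.yml") -> dict:
    """Build {model_name: litellm_params} from a standard LiteLLM config."""
    with open(path) as f:
        config = yaml.safe_load(f)
    return {entry["model_name"]: entry["litellm_params"] for entry in config["model_list"]}

def route(model_name: str, model_map: dict) -> dict:
    """Return backend parameters for a requested model, failing clearly on unknown names."""
    try:
        return model_map[model_name]
    except KeyError:
        raise ValueError(f"Unknown model '{model_name}'; check conf/config.yml or reload backends")

# Hot reload is just re-running load_model_map() when /admin/reload-backends is called.
```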
This is a lightweight alternative, not a full LiteLLM replacement:
- No Advanced Features: Missing caching, load balancing, fallbacks, observability
- Basic Routing: Single provider per model, no intelligent selection
- Self-Hosted: You manage infrastructure and updates
- Limited Scale: Best for small teams, not enterprise deployments
For production needs, consider LiteLLM or OpenRouter.ai.
This project is licensed under the MIT License - see the LICENSE file for details.