> [!IMPORTANT]
> Rook is under development. Interfaces may shift, including occasional breaking changes as the API settles.
Rook is a lightweight runtime that serves local AI models through a single HTTP/gRPC API. The focus is reliable local inference for the core capabilities most systems need: LLMs, STT, TTS, embeddings, vision, and NLU. Rather than supporting every possible backend, the goal is to keep the surface area small and consistent while making the foundations solid.
It’s intended for offline assistants, on-device chat systems, embedded agents, CLI tools, and services that benefit from a unified abstraction instead of juggling separate runtimes.
CPU:

```sh
docker run -d -p 8080:8080 -p 50051:50051 --name rook ghcr.io/ju4n97/rook:latest-cpu
```

GPU (requires the NVIDIA Container Toolkit):

```sh
docker run -d -p 8080:8080 -p 50051:50051 --gpus all --name rook ghcr.io/ju4n97/rook:latest-cuda
```

Verify the server is up:

```sh
curl http://localhost:8080/health
```

To use your own models or customize configuration, mount local directories:
```sh
docker run -d \
  -p 8080:8080 \
  -p 50051:50051 \
  -v $(pwd)/rook.yaml:/home/rook/config/rook.yaml \
  -v $(pwd)/models:/home/rook/models \
  --name rook \
  ghcr.io/ju4n97/rook:latest
```

Rook uses a `rook.yaml` file to define the models to download and expose. Models are downloaded and cached automatically from the specified source on the first run.
```yaml
# rook.yaml
version: "1"

# Models that rook will fetch and make available locally
models:
  qwen2.5-1.5b-instruct-q4_k_m:
    type: llm
    backend: llama.cpp
    source:
      huggingface:
        repo: Qwen/Qwen2.5-1.5B-Instruct-GGUF
        include: ["qwen2.5-1.5b-instruct-q4_k_m.gguf"]

  whisper-small:
    type: stt
    backend: whisper.cpp
    source:
      huggingface:
        repo: ggerganov/whisper.cpp
        include: ["ggml-small.bin"]
    tags: [multilingual, streaming]

  piper-es-ar-daniela:
    type: tts
    backend: piper
    source:
      huggingface:
        repo: rhasspy/piper-voices
        include: ["es/es_AR/daniela/high/*"]
    tags: [spanish, argentina, high-quality]
```

Available environment variables:

| Variable | Description |
|---|---|
| `ROOK_SERVER_HTTP_HOST` | HTTP server host |
| `ROOK_SERVER_GRPC_HOST` | gRPC server host |
| `ROOK_SERVER_HTTP_PORT` | HTTP server port |
| `ROOK_SERVER_GRPC_PORT` | gRPC server port |
| `ROOK_MODELS_PATH` | Path to the models directory |
| `ROOK_CONFIG_PATH` | Path to the config file (`rook.yaml`) |
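The usual precedence for settings like these is environment variable first, config-file default second. A minimal sketch of that lookup in Python — the helper and the default values here are illustrative assumptions, not Rook's actual resolution code:

```python
import os

# Illustrative defaults only -- Rook's real defaults may differ.
DEFAULTS = {
    "ROOK_SERVER_HTTP_PORT": "8080",
    "ROOK_SERVER_GRPC_PORT": "50051",
}

def resolve(name: str) -> str:
    """Return the environment override if set, else the assumed default."""
    return os.environ.get(name, DEFAULTS[name])

os.environ["ROOK_SERVER_HTTP_PORT"] = "9090"  # override just the HTTP port
print(resolve("ROOK_SERVER_HTTP_PORT"))  # 9090 (from the environment)
print(resolve("ROOK_SERVER_GRPC_PORT"))  # 50051 (assumed default)
```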
Working demos can be found in the `examples` directory.

cURL examples:

- `http-completion`: Basic chat completion via cURL
- `http-completion-stream`: Streaming chat completion via server-sent events (SSE) and cURL
- `http-transcription`: Speech transcription via cURL
- `http-transcription-verbose`: Speech transcription with verbose output via cURL
`rook-go` examples:

- `go-sdk-completion`: Basic chat completion via `rook-go`
- `go-sdk-completion-stream`: Streaming chat completion via `rook-go`
- `go-sdk-transcription`: Speech transcription via `rook-go`
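The HTTP examples assume the server is already accepting requests. One common pattern is to poll the `/health` endpoint (the one shown in the quick start) until it answers before firing real requests. A sketch of that in Python — the throwaway `http.server` stub below stands in for a running Rook instance, and the assumption that `/health` returns 200 when ready is mine:

```python
import http.server
import threading
import time
import urllib.error
import urllib.request

class StubHealth(http.server.BaseHTTPRequestHandler):
    """Stand-in for a Rook container: answers 200 on /health, 404 elsewhere."""

    def do_GET(self):
        self.send_response(200 if self.path == "/health" else 404)
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

def wait_for_health(url: str, timeout: float = 30.0) -> bool:
    """Poll the health endpoint until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry
        time.sleep(0.2)
    return False

server = http.server.HTTPServer(("127.0.0.1", 0), StubHealth)  # random free port
threading.Thread(target=server.serve_forever, daemon=True).start()
print(wait_for_health(f"http://127.0.0.1:{server.server_port}/health"))  # True
server.shutdown()
```

Against a real container you would point `wait_for_health` at `http://localhost:8080/health` instead of the stub.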
| Backend | Source | License | Runtime | Status |
|---|---|---|---|---|
| llama.cpp | `backends/llamacpp` | MIT | CPU, CUDA | 🟡 Experimental |

| Backend | Source | License | Runtime | Status |
|---|---|---|---|---|
| whisper.cpp | `backends/whispercpp` | MIT | CPU, CUDA | 🟡 Experimental |

| Backend | Source | License | Runtime | Status |
|---|---|---|---|---|
| Piper | `backends/piper` | MIT | CPU | 🟡 Experimental |
Status legend:
- 🟢 Supported: tested, stable, and recommended for production.
- 🟡 Experimental: functional but subject to changes, bugs, or limitations.
- 🟠 Development: active integration with features still under construction.
- 🔴 Planned: intended for future implementation (PRs welcome).
> [!NOTE]
> Additional backends (embeddings, vision, NLU, VAD, etc.) will be added as the API matures. The focus is on expanding support gradually, keeping the interface consistent and predictable.
```sh
git clone https://github.com/ju4n97/rook
cd rook
task install

# Install backends (this may take several minutes the first time)
task install-backends        # CPU
# task install-backends-cuda # CUDA
```

Run `task help` to list all available tasks; `Taskfile.yaml` is your guide.