
♖ Rook


Important

Rook is under development. Interfaces may shift, including occasional breaking changes as the API settles.

Rook is a lightweight runtime that serves local AI models through a single HTTP/gRPC API. The focus is reliable local inference for the core capabilities most systems need: LLMs, STT, TTS, embeddings, vision, and NLU. Rather than chasing every possible backend, Rook keeps the surface area small and consistent so the foundations stay solid.

It’s intended for offline assistants, on-device chat systems, embedded agents, CLI tools, and services that benefit from a unified abstraction instead of juggling separate runtimes.

Quick start

CPU

docker run -d -p 8080:8080 -p 50051:50051 --name rook ghcr.io/ju4n97/rook:latest-cpu
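
On the first start, Rook downloads any models declared in its configuration (see Configuration below), so the container may take a moment to become ready. Progress can be followed with standard Docker logging:

docker logs -f rook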

NVIDIA GPU

Requires NVIDIA Container Toolkit.

docker run -d -p 8080:8080 -p 50051:50051 --gpus all --name rook ghcr.io/ju4n97/rook:latest-cuda
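
To confirm the container actually sees the GPU, you can run nvidia-smi inside it. This assumes the NVIDIA Container Toolkit injects the usual driver utilities into the container, which is typical but worth verifying:

docker exec rook nvidia-smi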

Verify installation

curl http://localhost:8080/health
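
The gRPC port can be probed the same way. A sketch using grpcurl, assuming Rook registers the standard grpc.health.v1 health service (check the API reference if it does not):

grpcurl -plaintext localhost:50051 grpc.health.v1.Health/Check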

Custom models & configuration

To use your own models or customize configuration, mount local directories:

docker run -d \
  -p 8080:8080 \
  -p 50051:50051 \
  -v $(pwd)/rook.yaml:/home/rook/config/rook.yaml \
  -v $(pwd)/models:/home/rook/models \
  --name rook \
  ghcr.io/ju4n97/rook:latest

Configuration

Rook uses a rook.yaml file to define the models it should download and expose. Models are fetched from the specified source and cached automatically on the first run.

# rook.yaml
version: "1"

# Models that rook will fetch and make available locally
models:
    qwen2.5-1.5b-instruct-q4_k_m:
        type: llm
        backend: llama.cpp
        source:
            huggingface:
                repo: Qwen/Qwen2.5-1.5B-Instruct-GGUF
                include: ["qwen2.5-1.5b-instruct-q4_k_m.gguf"]

    whisper-small:
        type: stt
        backend: whisper.cpp
        source:
            huggingface:
                repo: ggerganov/whisper.cpp
                include: ["ggml-small.bin"]
        tags: [multilingual, streaming]

    piper-es-ar-daniela:
        type: tts
        backend: piper
        source:
            huggingface:
                repo: rhasspy/piper-voices
                include: ["es/es_AR/daniela/high/*"]
        tags: [spanish, argentina, high-quality]
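
With the configuration above in place, each model should be addressable over HTTP by its configured id. The snippet below is a sketch only: the endpoint path and request shape are placeholders (an OpenAI-style chat route is assumed here, which Rook may or may not use), so consult the examples directory for the actual routes. The model id comes from the rook.yaml above.

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-1.5b-instruct-q4_k_m",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'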

Environment variables

Variable               Description
ROOK_SERVER_HTTP_HOST  HTTP server host
ROOK_SERVER_GRPC_HOST  gRPC server host
ROOK_SERVER_HTTP_PORT  HTTP server port
ROOK_SERVER_GRPC_PORT  gRPC server port
ROOK_MODELS_PATH       Path to the models directory
ROOK_CONFIG_PATH       Path to the config file (rook.yaml)
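
These variables can be passed at container start to override the defaults. For example, moving the HTTP server to another port (the -e flag is standard Docker; this assumes the server binds the configured port inside the container, so the -p mapping must match):

docker run -d \
  -p 9090:9090 \
  -p 50051:50051 \
  -e ROOK_SERVER_HTTP_PORT=9090 \
  --name rook \
  ghcr.io/ju4n97/rook:latest-cpu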

Examples

Working demos can be found in the examples directory.

  • HTTP (raw API)
  • rook-go examples

Supported backends

LLM

Backend    Source             License  Runtime    Status
llama.cpp  backends/llamacpp  MIT      CPU, CUDA  🟡 Experimental

STT

Backend      Source               License  Runtime    Status
whisper.cpp  backends/whispercpp  MIT      CPU, CUDA  🟡 Experimental

TTS

Backend  Source          License  Runtime  Status
Piper    backends/piper  MIT      CPU      🟡 Experimental

Status legend:

  • 🟢 Supported: tested, stable, and recommended for production.
  • 🟡 Experimental: functional but subject to changes, bugs, or limitations.
  • 🟠 Development: active integration with features still under construction.
  • 🔴 Planned: intended for future implementation (PRs welcome).

Note

Additional backends (embeddings, vision, NLU, VAD, etc.) will be added as the API matures. The focus is on expanding support gradually, keeping the interface consistent and predictable.

Local development

Requirements

A recent Go toolchain and Task. Building the native backends (llama.cpp, whisper.cpp, Piper) will typically also require a C/C++ compiler.

git clone https://github.com/ju4n97/rook
cd rook

task install
# Install backends (this may take several minutes the first time)
task install-backends          # CPU
# task install-backends-cuda   # CUDA
task help

Taskfile.yaml is your guide.

License

MIT
