> [!IMPORTANT]
> Rook is under development. Interfaces may shift, including occasional breaking changes as the API settles.
Rook is a lightweight runtime that serves local AI models through a single HTTP/gRPC API. The focus is reliable local inference for the core capabilities most systems need: LLMs, STT, TTS, embeddings, vision, and NLU. Rather than supporting every possible backend, the goal is to keep the surface area small and consistent while making the foundations solid.
It’s intended for offline assistants, on-device chat systems, embedded agents, CLI tools, and services that benefit from a unified abstraction instead of juggling separate runtimes.
CPU:

```sh
docker run -d -p 8080:8080 -p 50051:50051 --name rook ghcr.io/ju4n97/rook:latest-cpu
```

GPU (requires the NVIDIA Container Toolkit):

```sh
docker run -d -p 8080:8080 -p 50051:50051 --gpus all --name rook ghcr.io/ju4n97/rook:latest-cuda
```

Verify the server is up:

```sh
curl http://localhost:8080/health
```

To use your own models or customize configuration, mount local directories:
```sh
docker run -d \
  -p 8080:8080 \
  -p 50051:50051 \
  -v $(pwd)/rook.yaml:/home/rook/config/rook.yaml \
  -v $(pwd)/models:/home/rook/models \
  --name rook \
  ghcr.io/ju4n97/rook:latest
```

Rook uses a `rook.yaml` file to define the models to download and expose. Models are downloaded and cached automatically from the specified source on the first run.
```yaml
# rook.yaml
version: "1"

# Models that rook will fetch and make available locally
models:
  qwen2.5-1.5b-instruct-q4_k_m:
    type: llm
    backend: llama.cpp
    source:
      huggingface:
        repo: Qwen/Qwen2.5-1.5B-Instruct-GGUF
        include: ["qwen2.5-1.5b-instruct-q4_k_m.gguf"]

  whisper-small:
    type: stt
    backend: whisper.cpp
    source:
      huggingface:
        repo: ggerganov/whisper.cpp
        include: ["ggml-small.bin"]
    tags: [multilingual, streaming]

  piper-es-ar-daniela:
    type: tts
    backend: piper
    source:
      huggingface:
        repo: rhasspy/piper-voices
        include: ["es/es_AR/daniela/high/*"]
    tags: [spanish, argentina, high-quality]
```

Available environment variables:

| Variable | Description |
|---|---|
| `ROOK_SERVER_HTTP_HOST` | HTTP server host |
| `ROOK_SERVER_GRPC_HOST` | gRPC server host |
| `ROOK_SERVER_HTTP_PORT` | HTTP server port |
| `ROOK_SERVER_GRPC_PORT` | gRPC server port |
| `ROOK_MODELS_PATH` | Path to the models directory |
| `ROOK_CONFIG_PATH` | Path to the config file (`rook.yaml`) |
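The usual precedence for settings like these is environment variable first, config-file default second. A minimal sketch of that lookup in Python — the helper and the default values here are illustrative assumptions, not Rook's actual resolution code:

```python
import os

# Illustrative defaults only -- Rook's real defaults may differ.
DEFAULTS = {
    "ROOK_SERVER_HTTP_PORT": "8080",
    "ROOK_SERVER_GRPC_PORT": "50051",
}

def resolve(name: str) -> str:
    """Return the environment override if set, else the assumed default."""
    return os.environ.get(name, DEFAULTS[name])

os.environ["ROOK_SERVER_HTTP_PORT"] = "9090"  # override just the HTTP port
print(resolve("ROOK_SERVER_HTTP_PORT"))  # 9090 (from the environment)
print(resolve("ROOK_SERVER_GRPC_PORT"))  # 50051 (assumed default)
```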
Working demos can be found in the `examples` directory.

cURL examples:

- `http-completion`: Basic chat completion via cURL
- `http-completion-stream`: Streaming chat completion via server-sent events (SSE) and cURL
- `http-transcription`: Speech transcription via cURL
- `http-transcription-verbose`: Speech transcription with verbose output via cURL
`rook-go` examples:

- `go-sdk-completion`: Basic chat completion via `rook-go`
- `go-sdk-completion-stream`: Streaming chat completion via `rook-go`
- `go-sdk-transcription`: Speech transcription via `rook-go`
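The HTTP examples assume the server is already accepting requests. One common pattern is to poll the `/health` endpoint (the one shown in the quick start) until it answers before firing real requests. A sketch of that in Python — the throwaway `http.server` stub below stands in for a running Rook instance, and the assumption that `/health` returns 200 when ready is mine:

```python
import http.server
import threading
import time
import urllib.error
import urllib.request

class StubHealth(http.server.BaseHTTPRequestHandler):
    """Stand-in for a Rook container: answers 200 on /health, 404 elsewhere."""

    def do_GET(self):
        self.send_response(200 if self.path == "/health" else 404)
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

def wait_for_health(url: str, timeout: float = 30.0) -> bool:
    """Poll the health endpoint until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry
        time.sleep(0.2)
    return False

server = http.server.HTTPServer(("127.0.0.1", 0), StubHealth)  # random free port
threading.Thread(target=server.serve_forever, daemon=True).start()
print(wait_for_health(f"http://127.0.0.1:{server.server_port}/health"))  # True
server.shutdown()
```

Against a real container you would point `wait_for_health` at `http://localhost:8080/health` instead of the stub.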
| Backend | Source | License | Runtime | Status |
|---|---|---|---|---|
| llama.cpp | `backends/llamacpp` | MIT | CPU, CUDA | 🟡 Experimental |

| Backend | Source | License | Runtime | Status |
|---|---|---|---|---|
| whisper.cpp | `backends/whispercpp` | MIT | CPU, CUDA | 🟡 Experimental |

| Backend | Source | License | Runtime | Status |
|---|---|---|---|---|
| Piper | `backends/piper` | MIT | CPU | 🟡 Experimental |
Status legend:
- 🟢 Supported: tested, stable, and recommended for production.
- 🟡 Experimental: functional but subject to changes, bugs, or limitations.
- 🟠 Development: active integration with features still under construction.
- 🔴 Planned: intended for future implementation (PRs welcome).
> [!NOTE]
> Additional backends (embeddings, vision, NLU, VAD, etc.) will be added as the API matures. The focus is on expanding support gradually, keeping the interface consistent and predictable.
```sh
git clone https://github.com/ju4n97/rook
cd rook
task install

# Install backends (this may take several minutes the first time)
task install-backends        # CPU
# task install-backends-cuda # CUDA
```

Run `task help` to list all available tasks; `Taskfile.yaml` is your guide.