Conversation
Use ghcr.io/openclaw/openclaw:latest as the default base image instead of requiring a local build. Add image_version column (migration v5) so each bot records which image its container was created from, displayed in the dashboard bot card. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
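A minimal sketch of what a v5 migration adding this column could look like, assuming a SQLite-style `bots` table and a simple migration interface (both the table name and the `Migration` shape here are illustrative, not the actual `src/db/migrations.ts` code):

```ts
// Hypothetical shape of migration v5; the real migration in src/db/migrations.ts may differ.
interface Migration {
  version: number;
  up: (db: { exec: (sql: string) => void }) => void;
}

export const migrationV5: Migration = {
  version: 5,
  up: (db) => {
    // Nullable, so bots created before image tracking remain valid rows.
    db.exec(`ALTER TABLE bots ADD COLUMN image_version TEXT`);
  },
};
```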
…ner not found" During createContainer(), a 404 means the image doesn't exist, not the container. Include the raw Docker error message so the real problem (e.g. "no such image: botmaker-env:latest") is visible in logs/UI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ollama as a new vendor in keyring-proxy with HTTP transport support (configurable protocol/port), dashboard provider config with dynamic model fetching from superproxy, and backend /api/ollama/models endpoint for runtime model discovery with graceful fallback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
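Based on the provider flags this PR introduces (`dynamicModels`, `baseUrlEditable`, `noAuth`), a dashboard entry for Ollama could plausibly look like the sketch below; the exact `ProviderConfig` fields and import path are assumptions rather than the actual `dashboard/src/config/providers/ollama.ts` contents:

```ts
// Sketch of a provider entry; field names beyond dynamicModels/baseUrlEditable/noAuth are assumed.
import type { ProviderConfig } from './types';

export const ollama: ProviderConfig = {
  id: 'ollama',
  name: 'Ollama',
  // Models are discovered at runtime via the discovery endpoint rather than hard-coded.
  dynamicModels: true,
  // Local servers usually need a custom base URL (e.g. http://127.0.0.1:11434/v1).
  baseUrlEditable: true,
  // Ollama itself requires no API key.
  noAuth: true,
};
```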
…ction
The superproxy requires auth for /v1/models. Also fixed URL construction
that was stripping the /v1 prefix (new URL('/models', base) → string concat).
Added API key input field to the dynamic provider config in the wizard.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
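To illustrate the URL bug this commit describes: a leading-slash path passed to `new URL()` resolves against the origin and discards any existing path prefix, while plain string concatenation keeps it (the host and port below are placeholders):

```ts
const base = 'http://host.docker.internal:8080/v1';

// A leading-slash path is resolved against the origin, so the /v1 prefix is lost:
const wrong = new URL('/models', base).toString();
// -> "http://host.docker.internal:8080/models"

// String concatenation (what the fix switched to) preserves the prefix:
const right = base.replace(/\/+$/, '') + '/models';
// -> "http://host.docker.internal:8080/v1/models"
```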
…discovery The botmaker container needs to reach the superproxy on the host network for dynamic model listing. Without extra_hosts, host.docker.internal doesn't resolve inside the container. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rwarding Keyring-proxy was returning 503 on all Ollama requests because it couldn't resolve host.docker.internal to reach the superproxy on the host network. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
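A sketch of how an `extraHosts` value can be threaded into a dockerode `createContainer` call so that `host.docker.internal` resolves inside the container; the function name and the actual DockerService wiring are assumptions:

```ts
// Illustrative only: maps host.docker.internal to the host gateway inside the container.
import Docker from 'dockerode';

const docker = new Docker();

async function createBotContainer(image: string, extraHosts: string[] = []) {
  return docker.createContainer({
    Image: image,
    HostConfig: {
      // "host-gateway" resolves to the host's gateway IP on recent Docker versions.
      ExtraHosts: ['host.docker.internal:host-gateway', ...extraHosts],
    },
  });
}
```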
The 120s timeout was causing a cascade: keyring-proxy kills the connection, ollama aborts generation, the bot retries, repeat forever. A 32B Q6_K model with 22 tool definitions legitimately needs several minutes for first token. Match the superproxy's 600s timeout. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
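As a rough sketch of the intent (not the actual keyring-proxy code), the upstream request timeout can be raised to 600 s with a single abort signal; the helper name and transport details here are illustrative:

```ts
// Illustrative only: a 600 000 ms upstream timeout to match the superproxy,
// instead of the previous 120 000 ms that aborted long first-token waits.
const UPSTREAM_TIMEOUT_MS = 600_000;

async function forwardToUpstream(url: string, body: unknown): Promise<Response> {
  return fetch(url, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(body),
    // Aborts the whole request if it exceeds the timeout window.
    signal: AbortSignal.timeout(UPSTREAM_TIMEOUT_MS),
  });
}
```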
- Add openclaw CLI symlink to PATH in Dockerfile.botenv
- Use openclaw.mjs entry point instead of dist/index.js for the gateway
- Stop overwriting OpenClaw's AGENTS.md and BOOTSTRAP.md with inferior stubs; let OpenClaw's ensureAgentWorkspace() create them from its own rich templates (memory management, heartbeats, safety guidance, etc.)
- Simplify IDENTITY.md and SOUL.md to inject only persona-specific content
- Include all pending Ollama support changes (proxy, dashboard, docs)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This PR expands BotMaker’s local LLM support (notably Ollama) by adding dynamic model discovery, proxy/vendor enhancements, and updated container/workspace behavior. It also tracks which OpenClaw image a bot was created with and shifts defaults to GHCR-hosted images.
Changes:
- Add Ollama as a provider with dynamic model discovery via a new /api/models/discover endpoint and wizard UI support.
- Track image_version for bots (DB migration + API/types + dashboard display).
- Update Docker/container setup (GHCR defaults, extra hosts support, OpenClaw gateway command/workspace template adjustments).
Reviewed changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 10 comments.
Summary per file:
| File | Description |
|---|---|
| src/types/container.ts | Adds extraHosts to container config type. |
| src/types/bot.ts | Adds image_version to bot type. |
| src/services/docker-errors.ts | Expands NOT_FOUND messaging for images/containers. |
| src/services/DockerService.ts | Updates container command/env assembly; adds optional ExtraHosts. |
| src/server.ts | Adds provider baseUrl in request shape; stores image_version; adds /api/models/discover. |
| src/db/migrations.ts | Adds migration v5 for image_version. |
| src/db/migrations.test.ts | Updates migration expectations and adds v5 test. |
| src/config.ts | Changes default OpenClaw image to GHCR. |
| src/config.test.ts | Updates default OpenClaw image test. |
| src/bots/templates.ts | Exports provider→api mapping; adds ollama; reduces persona file generation. |
| src/bots/templates.test.ts | Updates workspace expectations and persona content assertions. |
| src/bots/store.ts | Adds image_version support in create/update. |
| src/bots/store.test.ts | Adds coverage for updating image_version. |
| proxy/src/types.ts | Adds vendor config options; allows runtime initialization for Ollama vendor. |
| proxy/src/services/upstream.ts | Adds http support + non-streaming→SSE conversion + longer timeout. |
| proxy/src/routes/proxy.ts | Supports no-auth vendors and passes forceNonStreaming through. |
| proxy/src/index.ts | Initializes Ollama vendor from OLLAMA_UPSTREAM. |
| notes.md | Adds a new notes file (non-product content). |
| docker-compose.yml | Switches base image default to GHCR; adds extra_hosts; documents optional OLLAMA_UPSTREAM. |
| dashboard/src/wizard/pages/Page4Config.tsx | Adds dynamic model fetching UI for providers with dynamicModels. |
| dashboard/src/wizard/pages/Page4Config.css | Adds loading/refresh button styles. |
| dashboard/src/wizard/pages/Page3Toggles.tsx | Adds ollama to popular providers list. |
| dashboard/src/wizard/context/wizardUtils.ts | Adds baseUrl to provider configs and request payload. |
| dashboard/src/wizard/context/WizardContext.tsx | Extends action type to include provider baseUrl. |
| dashboard/src/types.ts | Adds image_version; adds provider baseUrl input type. |
| dashboard/src/hooks/useBots.test.ts | Updates bot fixtures for image_version. |
| dashboard/src/dashboard/BotCard.tsx | Displays image_version on bot cards. |
| dashboard/src/config/providers/types.ts | Adds provider flags: dynamicModels, baseUrlEditable, noAuth. |
| dashboard/src/config/providers/ollama.ts | Introduces Ollama provider config (dynamic/no-auth). |
| dashboard/src/config/providers/index.ts | Registers Ollama in provider list. |
| dashboard/src/api.ts | Adds fetchDynamicModels helper (and deprecates old name). |
| README.md | Updates docs for GHCR image defaults and Ollama setup. |
| Dockerfile.botenv | Updates default base image to GHCR and adds openclaw symlink. |
The new `notes.md` added in this PR:

```
Your RTX 5090 with 32GB VRAM is perfect for running OpenClaw locally — it's one of the strongest single-GPU setups available right now for agentic workloads. OpenClaw (ex-Clawdbot/Moltbot) is a self-hosted autonomous AI agent that connects to messaging apps (WhatsApp, Telegram, Discord, etc.) and can execute real tasks (email, calendar, web browsing, shell commands, etc.). It works with any OpenAI-compatible backend, so you can run it 100% locally via Ollama, vLLM, LM Studio, TabbyAPI, etc.

Quick Local Setup Summary
- Install OpenClaw (one-liner works great): curl -fsSL https://openclaw.ai/install.sh | bash
- Run a local LLM server (Ollama is simplest; vLLM/exllamav2 is fastest on Nvidia).
- Point OpenClaw at it in ~/.openclaw/openclaw.json (baseUrl: http://127.0.0.1:11434/v1 or your vLLM port).
- Or just do ollama launch openclaw — it auto-configures everything.

Best Model for Your 32GB VRAM + OpenClaw
OpenClaw is context-heavy (often 64k–128k+ tokens) and relies heavily on strong tool-calling, reasoning, and JSON compliance. Small models fall apart fast. From recent community discussions (Reddit /r/LocalLLaMA, Ollama blog, YouTube setups, GitHub gists, etc.):
- Top recommendation for 32GB single GPU: Qwen2.5-72B-Instruct (Q4_K_M or Q3_K_L). Weights + overhead fit in ~32–38GB with vLLM + flash attention + moderate context (32–64k). Excellent agentic performance — beats most 70B models on tool use and long-context tasks. Many people run it successfully on 24–32GB cards by limiting batch size or using Q3. If it OOMs, drop to Q3_K_M or cap context at 32k.
- Safest high-quality option (zero hassle): Qwen2.5-32B-Instruct (Q6_K or Q8_0). Uses ~22–28GB → plenty of headroom for 128k context and fast inference. Still punches way above its weight on OpenClaw tasks. This is what most people with 24–40GB cards settle on for reliable 24/7 use.
- Ollama-official recommendations (from their OpenClaw post): qwen3-coder, glm-4.7 / glm-4.7-flash, gpt-oss:20b/120b (the 120b is too big for a single 32GB card).
- Specialized OpenClaw-optimized model (low VRAM, great tool calls): voytas26/openclaw-qwen3vl-8b-opt — runs on 8–12GB but still very capable if you want something lighter.

What Works Well on 32GB (Community Feedback)
- Qwen2.5-72B Q4 → borderline but doable on a 5090 (high bandwidth helps).
- Qwen2.5-32B Q5/Q6 → rock-solid, fast, great reasoning.
- GLM-4.7-flash → strong alternative, very good at structured output.
- Avoid pure 7–13B unless you just want quick testing — they degrade badly with OpenClaw's context size.

Inference Engine Tips for Max Performance
- vLLM → best speed + memory efficiency for 70B+ models.
- exllamav2 / TabbyAPI → excellent quantization options and speed on the 5090.
- Ollama → easiest, but slightly slower than the above.

Your 5090 will absolutely crush inference compared to older 40-series cards. Start with Qwen2.5-72B Q4 via vLLM. If it fits and runs smoothly → that's the current "best" local brain for OpenClaw on 32GB hardware. If you hit OOM, fall back to the 32B variant. Let me know what inference backend you want to use and I can give you the exact pull/run/config commands!
```
This new notes.md file appears to be ad-hoc personal content (hardware/model recommendations) and isn’t referenced by the application or docs. It will add noise to the repo and may confuse users; consider removing it from the PR (or relocating to appropriate documentation if it’s meant to be part of the project).
Suggested replacement content for `notes.md`:

````markdown
# OpenClaw local deployment notes

This document provides high-level notes for running OpenClaw against a local LLM backend on a single-GPU system with approximately 32 GB of VRAM.

## Quick local setup

1. Install OpenClaw:

   ```bash
   curl -fsSL https://openclaw.ai/install.sh | bash
   ```

2. Run a local LLM server, for example:
   - Ollama for a simple, batteries-included setup.
   - vLLM or exllamav2 for maximum throughput and efficiency on NVIDIA GPUs.

3. Configure OpenClaw to use the local server by setting `baseUrl` in `~/.openclaw/openclaw.json`, for example:

   ```json
   { "baseUrl": "http://127.0.0.1:11434/v1" }
   ```

   Adjust the URL and port to match your chosen inference backend.

## Model selection guidelines (≈32 GB VRAM)

OpenClaw is often used with large context windows (e.g., 64k–128k tokens) and relies heavily on robust tool-calling and structured output. When selecting a model:

- Prefer instruct-tuned models with good tool-calling and JSON compliance.
- For ~32 GB VRAM, 30–35B parameter models with moderate quantization typically offer a good balance of quality and resource usage.
- Heavily quantized 70B+ models may work but can be sensitive to batch size, context length, and backend configuration.

## Inference backend notes

- vLLM: good default for performance and memory efficiency with larger models.
- exllamav2 / TabbyAPI: useful when experimenting with different quantization strategies and squeezing maximum performance from a single GPU.
- Ollama: convenient for quick local evaluation and simple deployments.

For detailed, versioned setup instructions and officially supported configurations, refer to the main project documentation.
````
```ts
/** Rewrite localhost URLs to host.docker.internal for use inside Docker containers. */
function toDockerHostUrl(url: string): string {
  return url.replace(/\blocalhost\b|127\.0\.0\.1/g, 'host.docker.internal');
}
```
toDockerHostUrl() rewrites any occurrence of 127.0.0.1 without a word boundary, so values like http://127.0.0.10:... would be corrupted (host.docker.internal0). Consider parsing with new URL() and only rewriting the hostname when it is exactly localhost or 127.0.0.1 (with proper boundaries).
Suggested change:

```ts
function toDockerHostUrl(url: string): string {
  try {
    const parsed = new URL(url);
    if (parsed.hostname === 'localhost' || parsed.hostname === '127.0.0.1') {
      parsed.hostname = 'host.docker.internal';
    }
    return parsed.toString();
  } catch {
    // Fallback for non-absolute or invalid URLs: apply safer, boundary-aware replacements.
    return url
      .replace(/\blocalhost\b/g, 'host.docker.internal')
      .replace(/\b127\.0\.0\.1\b/g, 'host.docker.internal');
  }
}
```
```ts
/** Rewrite localhost URLs to host.docker.internal for use inside Docker containers. */
function toDockerHostUrl(url: string): string {
```
This localhost→host.docker.internal rewrite is unconditional, but the README documents running BotMaker “Production (without Docker)”. In that mode, rewriting localhost will typically break access to services on the same host. Gate this rewrite on an explicit config/env flag (or detect Docker), and leave localhost unchanged when running on the host.
Suggested change:

```ts
/** Detect if the current process is running inside a Docker container. */
function isRunningInDocker(): boolean {
  // Heuristic: /.dockerenv is typically present inside Docker containers.
  // We intentionally keep this lightweight and synchronous.
  try {
    return existsSync('/.dockerenv');
  } catch {
    return false;
  }
}

/** Rewrite localhost URLs to host.docker.internal for use inside Docker containers. */
function toDockerHostUrl(url: string): string {
  const rewriteEnabled =
    process.env.BOTMAKER_REWRITE_LOCALHOST_TO_DOCKER === '1' || isRunningInDocker();
  if (!rewriteEnabled) {
    return url;
  }
  // ... existing localhost rewrite logic follows ...
```
```ts
server.get<{ Querystring: { baseUrl?: string; apiKey?: string } }>('/api/models/discover', async (request, reply) => {
  const baseUrl = request.query.baseUrl;
  if (!baseUrl) {
    reply.code(400);
    return { error: 'Missing baseUrl query parameter' };
  }

  try {
    // Translate localhost → host.docker.internal for fetches from inside Docker
    const fetchBase = toDockerHostUrl(baseUrl);
    // Append /models to the base URL, preserving path (e.g. /v1 → /v1/models)
    const url = fetchBase.replace(/\/+$/, '') + '/models';
```
/api/models/discover fetches an arbitrary user-supplied baseUrl, which is an SSRF vector (can hit internal services, cloud metadata IPs, etc.), especially since it also rewrites localhost to reach the Docker host. Add strict validation/allowlisting (scheme http/https only, restrict host/IP ranges, and/or only allow known local endpoints needed for model discovery).
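One possible shape for the validation this comment asks for, shown as a sketch; the protocol and host allowlists are assumptions that would need to match the deployment:

```ts
// Illustrative validation for user-supplied discovery URLs; adjust the allowlists as needed.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);
const ALLOWED_HOSTS = new Set(['localhost', '127.0.0.1', 'host.docker.internal']);

function isAllowedDiscoveryUrl(raw: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return false;
  }
  if (!ALLOWED_PROTOCOLS.has(parsed.protocol)) return false;
  // Explicitly reject the cloud metadata address even if the allowlist changes later.
  if (parsed.hostname === '169.254.169.254') return false;
  return ALLOWED_HOSTS.has(parsed.hostname);
}
```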
```ts
server.get<{ Querystring: { baseUrl?: string; apiKey?: string } }>('/api/models/discover', async (request, reply) => {
  const baseUrl = request.query.baseUrl;
  if (!baseUrl) {
    reply.code(400);
    return { error: 'Missing baseUrl query parameter' };
  }

  try {
    // Translate localhost → host.docker.internal for fetches from inside Docker
    const fetchBase = toDockerHostUrl(baseUrl);
    // Append /models to the base URL, preserving path (e.g. /v1 → /v1/models)
    const url = fetchBase.replace(/\/+$/, '') + '/models';
    const controller = new AbortController();
    const timeout = setTimeout(() => { controller.abort(); }, 5000);

    const headers: Record<string, string> = {};
    if (request.query.apiKey) {
      headers.Authorization = `Bearer ${request.query.apiKey}`;
    }
```
apiKey is accepted as a query parameter and forwarded as a Bearer token. Query strings are commonly logged (reverse proxies, access logs, browser history), which can leak credentials. Prefer passing the key in a header or POST body, and avoid including secrets in the URL.
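A hedged sketch of the header-based alternative on the server side; the `X-Api-Key` header name is an assumption and would have to match whatever the dashboard sends:

```ts
// Sketch: pull the provider key from a request header instead of the query string.
function buildUpstreamHeaders(
  reqHeaders: Record<string, string | string[] | undefined>,
): Record<string, string> {
  const raw = reqHeaders['x-api-key'];
  const apiKey = Array.isArray(raw) ? raw[0] : raw;
  const headers: Record<string, string> = {};
  if (apiKey) {
    headers.Authorization = `Bearer ${apiKey}`;
  }
  return headers;
}
```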
```ts
if (dockerErr.statusCode === 404) {
  const rawMsg = dockerErr.message ?? '';
  const detail = rawMsg ? `: ${rawMsg}` : '';
  return new ContainerError(
    'NOT_FOUND',
    `Container for bot ${botId} not found`,
    `Docker resource not found for bot ${botId}${detail}`,
    botId,
```
The 404 handler now appends the raw Docker error message into the ContainerError message. Because the API returns err.message to clients (e.g., Container error: ...), this can leak internal details about the host/daemon/images. Consider logging the raw Docker message server-side and returning a sanitized, stable client message instead.
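A possible split between server-side logging and the client-facing message, sketched with an assumed `logger` and the `ContainerError` constructor shape shown above:

```ts
// Sketch: keep the raw Docker detail in server logs, return a stable client message.
// `logger` and the exact ContainerError signature are assumptions for illustration.
function toNotFoundError(
  dockerErr: { statusCode?: number; message?: string },
  botId: string,
  logger: { warn: (obj: unknown, msg: string) => void },
): ContainerError {
  logger.warn(
    { botId, dockerMessage: dockerErr.message },
    'Docker returned 404 during container creation (missing image or container)',
  );
  return new ContainerError(
    'NOT_FOUND',
    `Container for bot ${botId} not found`,
    `Docker resource not found for bot ${botId}. Check that the configured image exists.`,
    botId,
  );
}
```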
```ts
// Convert chat.completion → chat.completion.chunk SSE format
const sseChunk = {
  id: completion.id,
  object: 'chat.completion.chunk',
  created: completion.created,
  model: completion.model,
  system_fingerprint: completion.system_fingerprint,
  choices: (completion.choices ?? []).map((c) => ({
    index: c.index,
    delta: c.message,
    finish_reason: c.finish_reason,
  })),
  usage: completion.usage,
```
When converting a non-streaming OpenAI-style chat completion into SSE, the emitted chunk uses choices[].delta = choices[].message. In the OpenAI streaming format, delta is a partial update (e.g., { role, content, tool_calls }) and differs from the full message object. This mismatch can break clients expecting standard chat.completion.chunk semantics; map the completion shape to a valid chunk format (and typically set finish_reason only on the final chunk).
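A sketch of a conversion that follows the streaming spec more closely: the full message is emitted as a `delta` on a first chunk with `finish_reason: null`, and the finish reason (plus usage) goes on a separate final chunk. Types are deliberately loose; this is illustrative, not the proxy's actual code:

```ts
type Message = { role?: string; content?: string | null; tool_calls?: unknown[] };
type Completion = {
  id: string;
  created: number;
  model: string;
  choices?: { index: number; message?: Message; finish_reason?: string | null }[];
  usage?: unknown;
};

// Build two spec-shaped chunks from a single non-streaming completion.
function completionToChunks(completion: Completion) {
  const base = {
    id: completion.id,
    object: 'chat.completion.chunk',
    created: completion.created,
    model: completion.model,
  };
  const choices = completion.choices ?? [];
  const contentChunk = {
    ...base,
    choices: choices.map((c) => ({
      index: c.index,
      // Delta carries only the incremental fields, not the whole message object.
      delta: {
        role: c.message?.role,
        content: c.message?.content,
        tool_calls: c.message?.tool_calls,
      },
      finish_reason: null,
    })),
  };
  const finalChunk = {
    ...base,
    choices: choices.map((c) => ({
      index: c.index,
      delta: {},
      finish_reason: c.finish_reason ?? 'stop',
    })),
    usage: completion.usage,
  };
  return [contentChunk, finalChunk];
}
```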
```ts
let url = `${API_BASE}/models/discover?baseUrl=${encodeURIComponent(baseUrl)}`;
if (apiKey) {
  url += `&apiKey=${encodeURIComponent(apiKey)}`;
}
const response = await fetch(url, { headers: getAuthHeaders() });
```
fetchDynamicModels() sends apiKey in the query string. Even though the request is authenticated, URLs are frequently captured in logs/telemetry/history, which can leak provider credentials. Prefer sending the key in a header or POST body and keep secrets out of the URL.
Suggested change:

```ts
const url = `${API_BASE}/models/discover?baseUrl=${encodeURIComponent(baseUrl)}`;
const headers = getAuthHeaders();
if (apiKey) {
  (headers as Record<string, string>)['X-Api-Key'] = apiKey;
}
const response = await fetch(url, { headers });
```
```ts
const controller = new AbortController();
const timeout = setTimeout(() => { controller.abort(); }, 5000);

const headers: Record<string, string> = {};
if (request.query.apiKey) {
  headers.Authorization = `Bearer ${request.query.apiKey}`;
}

const response = await fetch(url, { signal: controller.signal, headers });
clearTimeout(timeout);

if (!response.ok) {
  return { models: [] };
}

const data = await response.json() as { data?: { id: string }[] };
const models = (data.data ?? []).map((m: { id: string }) => m.id);
return { models };
} catch {
  // Connection refused, timeout, etc. — graceful fallback
  return { models: [] };
}
```
The AbortController timeout is only cleared on the success path (clearTimeout(timeout) after fetch). If fetch() throws (including abort), the timer remains pending until it fires, creating per-request timer leaks. Move clearTimeout(timeout) into a finally block.
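A minimal restructuring along the lines this comment suggests, wrapped as a standalone function for illustration (`url` and `headers` become parameters here; the real handler keeps them in scope):

```ts
// Restructured fetch with the timer cleanup in `finally`, so it runs on success,
// error, and abort alike.
async function discoverModels(url: string, headers: Record<string, string>) {
  const controller = new AbortController();
  const timeout = setTimeout(() => { controller.abort(); }, 5000);
  try {
    const response = await fetch(url, { signal: controller.signal, headers });
    if (!response.ok) {
      return { models: [] };
    }
    const data = await response.json() as { data?: { id: string }[] };
    return { models: (data.data ?? []).map((m) => m.id) };
  } catch {
    // Connection refused, timeout, etc.: graceful fallback to an empty list.
    return { models: [] };
  } finally {
    clearTimeout(timeout);
  }
}
```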
The baseUrl field was never read from the request — workspace generation always constructs the proxy URL from the provider ID. Remove it to avoid misleading API clients. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
No description provided.