Merged
15 changes: 14 additions & 1 deletion docs/hooks.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ Hooks fire synchronously during the agent loop and can:
| `BeforeLLMCall` | Before each LLM API call | `Messages`, `TaskID`, `CorrelationID` |
| `AfterLLMCall` | After each LLM API call | `Messages`, `Response`, `TaskID`, `CorrelationID` |
| `BeforeToolExec` | Before each tool execution | `ToolName`, `ToolInput`, `TaskID`, `CorrelationID` |
| `AfterToolExec` | After each tool execution | `ToolName`, `ToolInput`, `ToolOutput`, `Error`, `TaskID`, `CorrelationID` |
| `AfterToolExec` | After each tool execution | `ToolName`, `ToolInput`, `ToolOutput` (mutable), `Error`, `TaskID`, `CorrelationID` |
| `OnError` | When an LLM call fails | `Error`, `TaskID`, `CorrelationID` |
| `OnProgress` | During tool execution | `Phase`, `ToolName`, `StatusMessage` |

Expand Down Expand Up @@ -73,6 +73,19 @@ hooks.Register(engine.BeforeToolExec, func(ctx context.Context, hctx *engine.Hoo
})
```

## Output Redaction

`AfterToolExec` hooks can modify `hctx.ToolOutput` to redact sensitive content before it enters the LLM context. The agent loop reads back `ToolOutput` from the `HookContext` after all hooks fire.

The runner registers a guardrail hook that scans tool output for secrets and PII patterns. See [Tool Output Scanning](security/guardrails.md#tool-output-scanning) for details.

```go
hooks.Register(engine.AfterToolExec, func(ctx context.Context, hctx *engine.HookContext) error {
	// `secret` is a placeholder for the sensitive string to scrub from the output.
	hctx.ToolOutput = strings.ReplaceAll(hctx.ToolOutput, secret, "[REDACTED]")
	return nil
})
```

## Audit Logging

The runner registers `AfterLLMCall` hooks that emit structured audit events for each LLM interaction. Audit fields include:
Expand Down
14 changes: 14 additions & 0 deletions docs/memory.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,18 @@ memory:
- Sessions are saved as JSON files with atomic writes (temp file + fsync + rename)
- Automatic cleanup of sessions older than 7 days at startup
- Session recovery on subsequent requests (disk snapshot supersedes task history)
- **Session max age** (default 30 minutes): stale sessions are discarded on recovery to prevent poisoned error context from blocking tool retries. When an LLM accumulates repeated tool failures in a session, it may stop retrying altogether. The max age ensures these poisoned sessions expire, giving the agent a fresh start.

Configure via `forge.yaml` or environment variable:

```yaml
memory:
  session_max_age: "30m" # default; use "1h", "15m", etc.
```

```bash
export FORGE_SESSION_MAX_AGE=1h
```

## Context Window Management

Expand Down Expand Up @@ -89,6 +101,7 @@ Full memory configuration in `forge.yaml`:
memory:
  persistence: true
  sessions_dir: ".forge/sessions"
  session_max_age: "30m" # discard sessions idle longer than this
  char_budget: 200000
  trigger_ratio: 0.6
  long_term: false
Expand All @@ -105,6 +118,7 @@ Environment variables:
| Variable | Description |
|----------|-------------|
| `FORGE_MEMORY_PERSISTENCE` | Set `false` to disable session persistence |
| `FORGE_SESSION_MAX_AGE` | Session idle timeout, e.g. `30m`, `1h` (default: `30m`) |
| `FORGE_MEMORY_LONG_TERM` | Set `true` to enable long-term memory |
| `FORGE_EMBEDDING_PROVIDER` | Override embedding provider |

Expand Down
5 changes: 4 additions & 1 deletion docs/runtime.md
Original file line number Diff line number Diff line change
Expand Up @@ -148,7 +148,8 @@ forge run --host 0.0.0.0 --shutdown-timeout 30s
| `--model` | — | Override model name |
| `--provider` | — | Override LLM provider |
| `--env` | `.env` | Path to env file |
| `--enforce-guardrails` | `false` | Enforce guardrail violations as errors |
| `--enforce-guardrails` | `true` | Enforce guardrail violations as errors |
| `--no-guardrails` | `false` | Disable all guardrail enforcement |

### `forge serve` — Background Daemon

Expand Down Expand Up @@ -202,6 +203,8 @@ For details on session persistence, context window management, compaction, and l

The engine fires hooks at key points in the loop. See [Hooks](hooks.md) for details.

The runner registers four hook groups: logging, audit, progress, and guardrail hooks. The guardrail `AfterToolExec` hook scans tool output for secrets and PII, redacting or blocking before results enter the LLM context. See [Tool Output Scanning](security/guardrails.md#tool-output-scanning).

## Streaming

The current implementation (v1) runs the full tool-calling loop non-streaming. `ExecuteStream` calls `Execute` internally and emits the final response as a single message on a channel. True word-by-word streaming during tool loops is planned for v2.
Expand Down
66 changes: 62 additions & 4 deletions docs/security/guardrails.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ The guardrail engine checks inbound and outbound messages against configurable p
| `content_filter` | Inbound + Outbound | Blocks messages containing configured blocked words |
| `no_pii` | Outbound | Detects email addresses, phone numbers, and SSNs via regex |
| `jailbreak_protection` | Inbound | Detects common jailbreak phrases ("ignore previous instructions", etc.) |
| `no_secrets` | Outbound | Detects API keys, tokens, and private keys (OpenAI, Anthropic, AWS, GitHub, Slack, Telegram, etc.) |

## Modes

Expand All @@ -37,6 +38,9 @@ Custom guardrail rules can be added to the policy scaffold:
},
"jailbreak_protection": {
"mode": "warn"
},
"no_secrets": {
"mode": "enforce"
}
}
}
Expand All @@ -45,13 +49,67 @@ Custom guardrail rules can be added to the policy scaffold:
## Runtime

```bash
# Run with guardrails enforced
forge run --enforce-guardrails

# Default: warn mode (log only)
# Default: guardrails enforced (all built-in guardrails active)
forge run

# Explicitly disable guardrail enforcement
forge run --no-guardrails
```

All four built-in guardrails (`content_filter`, `no_pii`, `jailbreak_protection`, `no_secrets`) are active by default, even without running `forge build`. Use `--no-guardrails` to opt out.

## Tool Output Scanning

The guardrail engine scans tool output via an `AfterToolExec` hook, catching secrets and PII before they enter the LLM context or outbound messages.

| Guardrail | What it detects in tool output |
|-----------|-------------------------------|
| `no_secrets` | API keys, tokens, private keys (same patterns as outbound message scanning) |
| `no_pii` | Email addresses, phone numbers, SSNs |

**Behavior by mode:**

| Mode | Behavior |
|------|----------|
| `enforce` | Returns a generic error (`"tool output blocked by content policy"`), blocking the result from entering the LLM context. The error message intentionally omits which guardrail matched to avoid leaking security internals to the LLM or channel. |
| `warn` | Replaces matched patterns with `[REDACTED]`, logs a warning, and allows the redacted output through |

The hook writes the redacted text back to `HookContext.ToolOutput`, which the agent loop reads after all hooks fire. This is backwards-compatible — existing hooks that don't modify `ToolOutput` leave it unchanged.
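
The two modes can be sketched as a single scanning function. This is a minimal illustration under stated assumptions: the function name `scanToolOutput` is hypothetical, and the two regular expressions are simplified examples, not the pattern set the guardrail engine actually ships.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// Illustrative patterns only (assumption); the real pattern set is broader.
var sensitivePatterns = []*regexp.Regexp{
	regexp.MustCompile(`sk-[A-Za-z0-9]{20,}`),                            // API-key-like token
	regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`), // email address
}

// errBlocked is deliberately generic: it does not say which pattern matched.
var errBlocked = errors.New("tool output blocked by content policy")

// scanToolOutput returns the output to pass on to the LLM context.
// In enforce mode any match blocks the result; in warn mode matches
// are replaced with [REDACTED] and the rest of the output passes through.
func scanToolOutput(output string, enforce bool) (string, error) {
	matched := false
	redacted := output
	for _, re := range sensitivePatterns {
		if re.MatchString(redacted) {
			matched = true
			redacted = re.ReplaceAllString(redacted, "[REDACTED]")
		}
	}
	if matched && enforce {
		return "", errBlocked
	}
	return redacted, nil
}

func main() {
	out, err := scanToolOutput("contact alice@example.com", false)
	fmt.Println("warn mode:", out, err)

	out, err = scanToolOutput("contact alice@example.com", true)
	fmt.Println("enforce mode:", out, err)
}
```

A real hook would also log a warning in `warn` mode; logging is omitted here to keep the sketch short.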

## Path Containment

The `cli_execute` tool confines filesystem path arguments to the agent's working directory. This prevents social-engineering attacks where an LLM is tricked into listing or reading files outside the project.

### Shell Interpreter Denylist

Shell interpreters (`bash`, `sh`, `zsh`, `dash`, `ksh`, `csh`, `tcsh`, `fish`) are **unconditionally blocked**, even if they appear in `allowed_binaries`. Shells defeat the no-shell `exec.Command` security model by reintroducing argument interpretation and bypassing all path validation (e.g., `bash -c "ls ~/Library/Keychains"`).

### HOME Override

When `workDir` is configured, `$HOME` in the subprocess environment is overridden to `workDir`. This prevents `~` expansion inside subprocesses from reaching the real home directory.

### Path Argument Validation

**Rules:**
- Arguments that look like paths (`/`, `~/`, `./`, `../`) are resolved and checked
- If a resolved path is inside `$HOME` but outside `workDir` → **blocked**
- System paths outside `$HOME` (e.g., `/tmp`, `/etc`) → allowed
- Non-path arguments (e.g., `get`, `pods`, `--namespace=default`) → allowed
- Flag arguments (e.g., `--kubeconfig=~/.kube/config`) → not detected as paths, allowed

Additionally, `cmd.Dir` is set to `workDir` so relative paths in subprocess execution resolve within the agent directory.

**Examples:**

| Command | Result |
|---------|--------|
| `kubectl get pods` | Allowed — no path args |
| `bash -c "ls ~/"` | Blocked — `bash` is a denied shell interpreter |
| `ls ~/Library/Keychains/` | Blocked — inside `$HOME`, outside workDir |
| `cat ../../.ssh/id_rsa` | Blocked — resolves inside `$HOME`, outside workDir |
| `jq '.' /tmp/data.json` | Allowed — system path outside `$HOME` |
| `ls ./data/` | Allowed — within workDir |
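
The path rules above can be approximated in a few lines of Go. This is a sketch, not the tool's actual code: `looksLikePath`, `within`, and `pathArgBlocked` are hypothetical names, the home and working directories are made-up Unix paths, and the sketch resolves paths lexically without following symlinks.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// looksLikePath mirrors the documented prefixes: /, ~/, ./ and ../.
func looksLikePath(arg string) bool {
	for _, p := range []string{"/", "~/", "./", "../"} {
		if strings.HasPrefix(arg, p) {
			return true
		}
	}
	return false
}

// within reports whether target is base itself or lies beneath it.
func within(base, target string) bool {
	rel, err := filepath.Rel(base, target)
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// pathArgBlocked applies the rule "inside $HOME but outside workDir is blocked".
func pathArgBlocked(arg, home, workDir string) bool {
	if !looksLikePath(arg) {
		return false // non-path and flag arguments are not checked
	}
	var resolved string
	switch {
	case strings.HasPrefix(arg, "~/"):
		resolved = filepath.Join(home, arg[2:])
	case filepath.IsAbs(arg):
		resolved = filepath.Clean(arg)
	default:
		resolved = filepath.Join(workDir, arg) // relative paths resolve from workDir
	}
	return within(home, resolved) && !within(workDir, resolved)
}

func main() {
	home, workDir := "/home/alice", "/home/alice/projects/agent"
	for _, arg := range []string{
		"pods", "~/Library/Keychains/", "../../.ssh/id_rsa",
		"/tmp/data.json", "./data/", "--kubeconfig=~/.kube/config",
	} {
		fmt.Printf("%-30s blocked=%v\n", arg, pathArgBlocked(arg, home, workDir))
	}
}
```

Note that the flag form `--kubeconfig=~/.kube/config` passes this check because it does not start with a path prefix, which matches the rule listed above.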

## Audit Events

Guardrail evaluations are logged as structured audit events:
Expand Down
32 changes: 32 additions & 0 deletions docs/skills.md
Original file line number Diff line number Diff line change
Expand Up @@ -168,6 +168,7 @@ forge skills list --tags kubernetes,incident-response
| `tavily-search` | 🔍 | research | Search the web using Tavily AI search API | `tavily-search.sh` |
| `tavily-research` | 🔬 | research | Deep multi-source research via Tavily API | `tavily-research.sh`, `tavily-research-poll.sh` |
| `k8s-incident-triage` | ☸️ | sre | Read-only Kubernetes incident triage using kubectl | — (binary-backed) |
| `k8s-cost-visibility` | 💰 | sre | Estimate K8s infrastructure costs (compute, storage, LoadBalancer) with cost attribution reports | `k8s-cost-visibility.sh` |
| `k8s-pod-rightsizer` | ⚖️ | sre | Analyze workload metrics and produce CPU/memory rightsizing recommendations | — (binary-backed) |
| `code-review` | 🔎 | developer | AI-powered code review for diffs and files | `code-review-diff.sh`, `code-review-file.sh` |
| `code-review-standards` | 📏 | developer | Initialize and manage code review standards | — (template-based) |
Expand Down Expand Up @@ -265,6 +266,37 @@ This skill operates in three modes:

Requires: `bash`, `kubectl`, `jq`, `curl`. Optional: `KUBECONFIG`, `K8S_API_DOMAIN`, `PROMETHEUS_URL`, `PROMETHEUS_TOKEN`, `POLICY_FILE`, `DEFAULT_NAMESPACE`.

### Kubernetes Cost Visibility Skill

The `k8s-cost-visibility` skill estimates Kubernetes infrastructure costs by querying cluster node, pod, PVC/PV, and LoadBalancer data via `kubectl`, applying cloud pricing models, and producing cost attribution reports:

```bash
forge skills add k8s-cost-visibility
```

This registers a single tool:

| Tool | Purpose | Behavior |
|------|---------|----------|
| `k8s_cost_visibility` | Estimate cluster costs and produce attribution reports | Queries nodes, pods, PVCs, PVs, and services; applies pricing; returns cost breakdown |

**Cost dimensions tracked:**

| Dimension | Source | Default Rate |
|-----------|--------|-------------|
| Compute (CPU + memory) | Node instance types, pod resource requests | Auto-detected from cloud CLI or $0.031611/vCPU-hr |
| Storage (PVC/PV) | PVC capacities, storage classes | $0.10/GiB/month |
| LoadBalancer | Services with `type: LoadBalancer` | $18.25/month each |
| Waste | Unbound Persistent Volumes | Flagged with estimated monthly waste |

**Grouping modes:** `namespace` (includes storage + LB columns), `workload`, `node`, `label:<key>`, `annotation:<key>`.

**Pricing modes:** `auto` (detect cloud CLI), `aws`, `gcp`, `azure`, `static` (built-in rates), `custom:<file.json>` (user-provided rates).

**Safety:** This skill is strictly read-only. It only uses `kubectl get` commands (nodes, pods, pvc, pv, svc) — never `apply`, `delete`, `patch`, `exec`, or `scale`.

Requires: `kubectl`, `jq`, `awk`, `bc`. Optional: `KUBECONFIG`, `K8S_API_DOMAIN`, `DEFAULT_NAMESPACE`, `AWS_REGION`, `AZURE_SUBSCRIPTION_ID`, `GCP_PROJECT`.

### Codegen React Skill

The `codegen-react` skill scaffolds and iterates on **Vite + React** applications with Tailwind CSS:
Expand Down
19 changes: 11 additions & 8 deletions docs/tools.md
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ Provider selection: `WEB_SEARCH_PROVIDER` env var, or auto-detect from available

## CLI Execute

The `cli_execute` tool provides security-hardened command execution with 7 security layers:
The `cli_execute` tool provides security-hardened command execution with 10 security layers:

```yaml
tools:
Expand All @@ -73,13 +73,16 @@ tools:

| # | Layer | Detail |
|---|-------|--------|
| 1 | **Binary allowlist** | Only pre-approved binaries can execute |
| 2 | **Binary resolution** | Binaries are resolved to absolute paths via `exec.LookPath` at startup |
| 3 | **Argument validation** | Rejects arguments containing `$(`, backticks, or newlines |
| 4 | **Timeout** | Configurable per-command timeout (default: 120s) |
| 5 | **No shell** | Uses `exec.CommandContext` directly — no shell expansion |
| 6 | **Environment isolation** | Only `PATH`, `HOME`, `LANG`, explicit passthrough vars, proxy vars, and `OPENAI_ORG_ID` (when set) |
| 7 | **Output limits** | Configurable max output size (default: 1MB) to prevent memory exhaustion |
| 1 | **Shell denylist** | Shell interpreters (`bash`, `sh`, `zsh`, `dash`, `ksh`, `csh`, `tcsh`, `fish`) are unconditionally blocked — they defeat the no-shell design |
| 2 | **Binary allowlist** | Only pre-approved binaries can execute |
| 3 | **Binary resolution** | Binaries are resolved to absolute paths via `exec.LookPath` at startup |
| 4 | **Argument validation** | Rejects arguments containing `$(`, backticks, or newlines |
| 5 | **Path confinement** | Path arguments inside `$HOME` but outside `workDir` are blocked (see [Path Containment](security/guardrails.md#path-containment)) |
| 6 | **Timeout** | Configurable per-command timeout (default: 120s) |
| 7 | **No shell** | Uses `exec.CommandContext` directly — no shell expansion |
| 8 | **Working directory** | `cmd.Dir` set to `workDir` so relative paths resolve within the agent directory |
| 9 | **Environment isolation** | Only `PATH`, `HOME`, `LANG`, explicit passthrough vars, proxy vars, and `OPENAI_ORG_ID` (when set). `HOME` is overridden to `workDir` to prevent `~` expansion from reaching the real home directory |
| 10 | **Output limits** | Configurable max output size (default: 1MB) to prevent memory exhaustion |
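
To make the ordering of the first layers concrete, here is a minimal sketch of layers 1, 2, and 4 (shell denylist, binary allowlist, argument validation). The function name `validateCommand` and the shape of the allowlist are assumptions made for this example; the remaining layers are not modeled.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// deniedShells lists the interpreters that are blocked unconditionally.
var deniedShells = map[string]bool{
	"bash": true, "sh": true, "zsh": true, "dash": true,
	"ksh": true, "csh": true, "tcsh": true, "fish": true,
}

// validateCommand checks the denylist before the allowlist, so a shell is
// rejected even when it appears in allowed. Hypothetical helper.
func validateCommand(binary string, args []string, allowed map[string]bool) error {
	name := filepath.Base(binary)
	if deniedShells[name] {
		return fmt.Errorf("binary %q is a denied shell interpreter", name)
	}
	if !allowed[name] {
		return fmt.Errorf("binary %q is not in the allowlist", name)
	}
	for _, a := range args {
		// Reject command substitution, backticks, and newlines.
		if strings.Contains(a, "$(") || strings.ContainsAny(a, "`\n") {
			return fmt.Errorf("argument %q contains a forbidden sequence", a)
		}
	}
	return nil
}

func main() {
	allowed := map[string]bool{"kubectl": true, "bash": true}
	fmt.Println(validateCommand("kubectl", []string{"get", "pods"}, allowed))
	fmt.Println(validateCommand("bash", []string{"-c", "ls ~/"}, allowed))
	fmt.Println(validateCommand("kubectl", []string{"$(whoami)"}, allowed))
}
```

Checking the denylist first is what makes it unconditional: an operator who adds `bash` to `allowed_binaries` still cannot run it.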

## File Create

Expand Down
3 changes: 3 additions & 0 deletions forge-cli/build/policy_stage.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,9 @@ func (s *PolicyStage) Execute(ctx context.Context, bc *pipeline.BuildContext) er
Type: "content_filter",
Config: map[string]any{"enabled": true},
},
{Type: "no_pii"},
{Type: "jailbreak_protection"},
{Type: "no_secrets"},
},
}
}
Expand Down
11 changes: 9 additions & 2 deletions forge-cli/cmd/run.go
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ var (
runShutdownTimeout time.Duration
runMockTools bool
runEnforceGuardrails bool
runNoGuardrails bool
runModel string
runProvider string
runEnvFile string
Expand All @@ -42,7 +43,8 @@ func init() {
runCmd.Flags().StringVar(&runHost, "host", "", "bind address (e.g. 0.0.0.0 for containers)")
runCmd.Flags().DurationVar(&runShutdownTimeout, "shutdown-timeout", 0, "graceful shutdown timeout (e.g. 30s)")
runCmd.Flags().BoolVar(&runMockTools, "mock-tools", false, "use mock runtime instead of subprocess")
runCmd.Flags().BoolVar(&runEnforceGuardrails, "enforce-guardrails", false, "enforce guardrail violations as errors")
runCmd.Flags().BoolVar(&runEnforceGuardrails, "enforce-guardrails", true, "enforce guardrail violations as errors")
runCmd.Flags().BoolVar(&runNoGuardrails, "no-guardrails", false, "disable all guardrail enforcement")
runCmd.Flags().StringVar(&runModel, "model", "", "override model name (sets MODEL_NAME env var)")
runCmd.Flags().StringVar(&runProvider, "provider", "", "LLM provider (openai, anthropic, ollama)")
runCmd.Flags().StringVar(&runEnvFile, "env", ".env", "path to .env file")
Expand All @@ -59,14 +61,19 @@ func runRun(cmd *cobra.Command, args []string) error {

activeChannels := parseChannels(runWithChannels)

	enforceGuardrails := runEnforceGuardrails
	if runNoGuardrails {
		enforceGuardrails = false
	}

runner, err := runtime.NewRunner(runtime.RunnerConfig{
Config: cfg,
WorkDir: workDir,
Port: runPort,
Host: runHost,
ShutdownTimeout: runShutdownTimeout,
MockTools: runMockTools,
EnforceGuardrails: runEnforceGuardrails,
EnforceGuardrails: enforceGuardrails,
ModelOverride: runModel,
ProviderOverride: runProvider,
EnvFilePath: resolveEnvPath(workDir, runEnvFile),
Expand Down
7 changes: 5 additions & 2 deletions forge-cli/cmd/run_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,8 +19,11 @@ func TestRunCmd_FlagDefaults(t *testing.T) {
if runMockTools {
t.Error("mock-tools should default to false")
}
if runEnforceGuardrails {
t.Error("enforce-guardrails should default to false")
if !runEnforceGuardrails {
t.Error("enforce-guardrails should default to true")
}
if runNoGuardrails {
t.Error("no-guardrails should default to false")
}
if runModel != "" {
t.Errorf("model should default to empty, got %q", runModel)
Expand Down
8 changes: 6 additions & 2 deletions forge-cli/cmd/serve.go
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ var (
serveHost string
serveShutdownTimeout time.Duration
serveEnforceGuardrails bool
serveNoGuardrails bool
serveModel string
serveProvider string
serveEnvFile string
Expand Down Expand Up @@ -87,7 +88,8 @@ func registerServeFlags(cmd *cobra.Command) {
cmd.Flags().IntVarP(&servePort, "port", "p", 8080, "HTTP server port")
cmd.Flags().StringVar(&serveHost, "host", "127.0.0.1", "bind address (use 0.0.0.0 for containers)")
cmd.Flags().DurationVar(&serveShutdownTimeout, "shutdown-timeout", 30*time.Second, "graceful shutdown timeout")
cmd.Flags().BoolVar(&serveEnforceGuardrails, "enforce-guardrails", false, "enforce guardrail violations as errors")
cmd.Flags().BoolVar(&serveEnforceGuardrails, "enforce-guardrails", true, "enforce guardrail violations as errors")
cmd.Flags().BoolVar(&serveNoGuardrails, "no-guardrails", false, "disable all guardrail enforcement")
cmd.Flags().StringVar(&serveModel, "model", "", "override model name (sets MODEL_NAME env var)")
cmd.Flags().StringVar(&serveProvider, "provider", "", "LLM provider (openai, anthropic, ollama)")
cmd.Flags().StringVar(&serveEnvFile, "env", ".env", "path to .env file")
Expand Down Expand Up @@ -166,7 +168,9 @@ func serveStartRun(cmd *cobra.Command, args []string) error {
"--host", serveHost,
"--shutdown-timeout", serveShutdownTimeout.String(),
}
if serveEnforceGuardrails {
if serveNoGuardrails {
runArgs = append(runArgs, "--no-guardrails")
} else if serveEnforceGuardrails {
runArgs = append(runArgs, "--enforce-guardrails")
}
if serveModel != "" {
Expand Down
16 changes: 16 additions & 0 deletions forge-cli/runtime/guardrails_loader.go
Original file line number Diff line number Diff line change
Expand Up @@ -26,3 +26,19 @@ func LoadPolicyScaffold(workDir string) (*agentspec.PolicyScaffold, error) {
}
return &ps, nil
}

// DefaultPolicyScaffold returns a scaffold with all built-in guardrails enabled.
// Used when no policy-scaffold.json exists (e.g. running without forge build).
func DefaultPolicyScaffold() *agentspec.PolicyScaffold {
	return &agentspec.PolicyScaffold{
		Guardrails: []agentspec.Guardrail{
			{
				Type:   "content_filter",
				Config: map[string]any{"enabled": true},
			},
			{Type: "no_pii"},
			{Type: "jailbreak_protection"},
			{Type: "no_secrets"},
		},
	}
}