feat: v2.4 Phase 30 — Claude Code CLI Harness by RichardHightower · Pull Request #18 · SpillwaveSolutions/agent-memory

RichardHightower · 2026-02-25T18:44:34Z

Summary

- Build bats-core E2E test framework for headless CLI testing across 5 AI coding CLIs
- 30 bats tests: smoke (8), hooks (10), pipeline (5), negative (7), all passing
- Shared test helpers (common.bash, cli_wrappers.bash) with workspace isolation, daemon lifecycle, random port selection
- 10 Claude Code event fixture JSON files covering all hook event types
- CI workflow (e2e-cli.yml) with 5-CLI x 2-OS matrix (claude-code, gemini, opencode, copilot, codex)
- Added MEMORY_DAEMON_ADDR env var support to memory-ingest for test port isolation
- Fixed IPv6/IPv4 mismatch (daemon binds 0.0.0.0, client used [::1]; client now uses 127.0.0.1)
- Fixed clap short flag conflict (-l for --log-level vs --limit)

Key fixes in final commits

- hooks.bats silent ingest failure: memory-ingest reads stdin line-by-line, so multi-line JSON from jq pretty-print was never parsed as a complete event. Because ingest is fail-open, it still returned continue:true and the failure went unnoticed. Fixed with jq -c (compact output).
- Layer 2 assertion mismatch: query output format is [timestamp] agent_type: content, so session_id is never shown. Replaced the session_id assertions with count/content assertions.
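
To illustrate the first fix, here is a minimal sketch of why pretty-printed JSON trips a line-oriented reader. `count_records` is a hypothetical stand-in for memory-ingest's stdin loop, not its actual code, and the `message` field is invented for the example.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a line-oriented reader: it treats every
# non-empty stdin line as one record and reports how many it saw.
count_records() {
  local n=0 line
  while IFS= read -r line; do
    if [ -n "$line" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Pretty-printed JSON (the shape plain `jq .` emits) spans several lines.
pretty='{
  "session_id": "abc",
  "message": "hello"
}'

# Compact JSON (the shape `jq -c .` emits) is a single line.
compact='{"session_id":"abc","message":"hello"}'

printf '%s\n' "$pretty"  | count_records   # 4 fragments, none valid JSON alone
printf '%s\n' "$compact" | count_records   # 1 complete record
```

With the pretty-printed input the reader sees four fragments and no parseable event, which matches the silent failure described above.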

Test plan

- bats tests/cli/claude-code/: 30/30 pass (1 expected skip for nested Claude)
- cargo clippy --workspace: zero warnings
- cargo test --workspace: all pass
- CI matrix run on push

🤖 Generated with Claude Code

- 9 valid fixtures: SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, AssistantResponse, SubagentStart, SubagentStop, Stop, SessionEnd
- 1 malformed fixture for negative testing
- All fixtures match CchEvent struct in memory-ingest

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- 5-CLI matrix (claude-code, gemini, opencode, copilot, codex) x 2-OS
- Bats-core installation with bats-support and bats-assert libraries
- JUnit XML report generation via --report-formatter junit
- Failure artifact upload with 7-day retention
- Skip annotation when CLI test directory not found

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Workspace isolation with setup_workspace/teardown_workspace
- Daemon lifecycle: build_daemon_if_needed, start_daemon, stop_daemon
- Health checking with grpcurl, nc, and /dev/tcp fallbacks
- Random port selection for test isolation
- grpc_query and ingest_event helpers
- Auto-detect PROJECT_ROOT via workspace Cargo.toml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
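
The port selection and health-check fallbacks listed above could look roughly like the sketch below. `pick_random_port` is named elsewhere in this PR, but its body here is an assumption; `port_is_open` and `wait_for_port` are hypothetical names. The grpcurl path is omitted, and the sketch probes for a free port rather than asking the OS to assign one.

```bash
#!/usr/bin/env bash
# Return 0 if something accepts TCP connections on 127.0.0.1:<port>.
port_is_open() {
  local port="$1"
  if command -v nc >/dev/null 2>&1; then
    nc -z 127.0.0.1 "$port" >/dev/null 2>&1
  else
    # bash-only fallback: opening /dev/tcp/<host>/<port> attempts a connect
    (exec 3<>"/dev/tcp/127.0.0.1/$port") >/dev/null 2>&1
  fi
}

# Pick a port in the dynamic range (49152-65535) with no listener right now.
pick_random_port() {
  local port tries=0
  while [ "$tries" -lt 20 ]; do
    port=$((49152 + RANDOM % 16384))
    if ! port_is_open "$port"; then
      echo "$port"
      return 0
    fi
    tries=$((tries + 1))
  done
  return 1
}

# Poll until the port accepts connections or the attempts run out.
wait_for_port() {
  local port="$1" attempts="${2:-50}"
  while [ "$attempts" -gt 0 ]; do
    if port_is_open "$port"; then
      return 0
    fi
    sleep 0.1
    attempts=$((attempts - 1))
  done
  return 1
}
```

A probe-then-bind approach like this leaves a small window in which another process can claim the port, so treat it as an illustration of the fallback order rather than a race-free allocator.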

- CLI detection: require_cli (skip) and has_cli (boolean)
- Claude Code wrappers: run_claude, run_claude_with_hooks with timeout
- Hook pipeline testing: run_hook_stdin, run_hook_stdin_dry
- Cross-platform timeout detection (timeout/gtimeout)
- .gitignore excludes .runs/ workspace artifacts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
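
A rough sketch of the detection helpers described above. `has_cli` and `require_cli` are named in this commit, but the bodies are assumptions; `detect_timeout_cmd` and `run_with_timeout` are hypothetical names used only for illustration.

```bash
#!/usr/bin/env bash
# Boolean check: is the named CLI on PATH?
has_cli() {
  command -v "$1" >/dev/null 2>&1
}

# Print the available timeout binary: GNU `timeout`, or `gtimeout` (the name
# Homebrew's coreutils uses on macOS). Prints nothing if neither exists.
detect_timeout_cmd() {
  if has_cli timeout; then
    echo timeout
  elif has_cli gtimeout; then
    echo gtimeout
  fi
}

# Run a command under a time limit when possible, unguarded otherwise.
run_with_timeout() {
  local secs="$1"
  shift
  local tcmd
  tcmd="$(detect_timeout_cmd)"
  if [ -n "$tcmd" ]; then
    "$tcmd" "$secs" "$@"
  else
    "$@"
  fi
}

# Inside a bats test, `skip` marks the test as skipped instead of failed.
require_cli() {
  has_cli "$1" || skip "$1 not installed"
}
```

Keeping the boolean and the skipping variant separate lets a test either branch on availability or bail out early, which matches the skip/boolean split in the list above.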

…ingest

- 6 always-run tests: daemon binary, ingest binary, health check, valid/malformed/empty JSON ingest
- 2 claude-dependent tests: binary detection and headless JSON output (skip if absent)
- Fix IPv6->IPv4 mismatch: daemon binds 0.0.0.0 but helpers used [::1], switched to 127.0.0.1
- Fix health check: daemon lacks grpc.health service, prefer nc TCP check over grpcurl
- Add nested-session skip for CLAUDECODE env var

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…fication

- 9 individual event type tests: SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, AssistantResponse, SubagentStart, SubagentStop, Stop, SessionEnd
- 1 multi-event sequence coherence test (4 events, same session)
- Two-layer proof: Layer 1 (ingest exit 0 + continue:true), Layer 2 (gRPC query)
- Unique session IDs per test via PID to avoid cross-test interference
- Build-resilience: fallback to existing binary when cargo build fails
- Fixture JSON rewritten with jq (sed fallback) for session_id isolation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
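
The session_id rewrite with a sed fallback might be sketched as follows. `rewrite_session_id` is named in a later commit of this PR, but this body, the `hook_event_name` field, and the `hooks-test-` prefix are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Rewrite session_id in a fixture read from stdin; always emit a single line.
rewrite_session_id() {
  local new_id="$1"
  if command -v jq >/dev/null 2>&1; then
    # -c forces compact, single-line output
    jq -c --arg sid "$new_id" '.session_id = $sid'
  else
    # Fallback: assumes a simple id with no characters special to sed.
    # tr joins any pretty-printed lines; echo restores the final newline.
    sed -E 's/"session_id"[[:space:]]*:[[:space:]]*"[^"]*"/"session_id":"'"$new_id"'"/' \
      | tr -d '\n'
    echo
  fi
}

# $$ expands to the shell's PID, giving each test process its own id.
session_id="hooks-test-$$"
fixture='{"session_id":"placeholder","hook_event_name":"Stop"}'
printf '%s\n' "$fixture" | rewrite_session_id "$session_id"
```

Both branches emit one line per event, which is what a line-oriented consumer needs.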

…tion

- 5 tests covering complete session lifecycle, TOC browse, cwd metadata, real Claude Code hook fire, and concurrent session isolation
- Fix: DEFAULT_ENDPOINT changed from [::1] to 127.0.0.1 to match daemon bind addr
- Fix: Remove short flag from global --log-level to avoid clap conflict with --limit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- 7 tests: daemon down, malformed JSON, empty stdin, unknown event type, timeout enforcement, wrong port, and large payload handling
- All tests verify fail-open behavior (exit 0, continue:true)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
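
The fail-open check these tests share could be expressed as a small assertion helper. This is a sketch under assumptions: `assert_fail_open` and `fake_hook` are hypothetical names, and the exact JSON shape of the hook response is assumed to contain a `"continue": true` pair.

```bash
#!/usr/bin/env bash
# Succeeds only if the hook exited 0 and its output contains continue:true.
assert_fail_open() {
  local rc="$1" output="$2"
  [ "$rc" -eq 0 ] || return 1
  # Strip whitespace so `"continue": true` and `"continue":true` both match.
  case "$(printf '%s' "$output" | tr -d '[:space:]')" in
    *'"continue":true'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical hook: it cannot use its input, yet still lets the CLI proceed.
fake_hook() {
  cat >/dev/null            # consume the event from stdin
  echo '{"continue":true}'
}

output="$(echo 'not json' | fake_hook)"
rc=$?
assert_fail_open "$rc" "$output" && echo "fail-open ok"
```

Checking both the exit code and the response body is what separates fail-open from a crash that merely happens to exit 0.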

- Read MEMORY_DAEMON_ADDR from environment before gRPC connect
- Use MemoryClient::connect(addr) when env var is set
- Fall back to connect_default() (port 50051) when unset
- Preserves fail-open behavior throughout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
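
The change itself lives in the Rust client. As a shell analogue of the lookup order, here is a sketch in which `resolve_daemon_addr` is a hypothetical name and the `http://` scheme in the default is an assumption (the PR only states 127.0.0.1 and port 50051).

```bash
#!/usr/bin/env bash
# Prefer MEMORY_DAEMON_ADDR when set; otherwise fall back to the default
# endpoint. Note: ${VAR:-default} also falls back when the variable is set
# but empty, which may differ from the Rust behaviour.
resolve_daemon_addr() {
  echo "${MEMORY_DAEMON_ADDR:-http://127.0.0.1:50051}"
}

unset MEMORY_DAEMON_ADDR
resolve_daemon_addr                       # default endpoint

MEMORY_DAEMON_ADDR="http://127.0.0.1:61234"
export MEMORY_DAEMON_ADDR
resolve_daemon_addr                       # per-test override
```

A test would export the variable with its own daemon's port before invoking memory-ingest, so parallel runs never share port 50051.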

- Remove PIPELINE_PORT=50051 variable
- Call start_daemon with no args (uses random port via pick_random_port)
- Remove outdated NOTE about hardcoded DEFAULT_ENDPOINT
- Update header comment to reflect OS-assigned random port isolation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… in hooks.bats

- All 10 hook capture tests now use hard Layer 2 assertions
- Tests fail properly when session_id not found in gRPC query results
- Content assertions added for tests 2-5 (project structure, Read tool)
- Removed if-guard around result check -- empty result now fails as expected

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…tions

Root cause: memory-ingest reads stdin line-by-line, so multi-line JSON from jq pretty-print silently failed (fail-open returns continue:true). Also, query output format never includes session_id.

Changes:
- Use jq -c (compact) in rewrite_session_id to ensure single-line JSON
- Replace session_id assertions with event count and content assertions
- Use wide time window (from=0) to avoid timestamp precision issues
- All 30/30 bats tests now pass (hooks 10, smoke 8, pipeline 5, negative 7)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

RichardHightower and others added 30 commits February 22, 2026 19:22

docs: start milestone v2.4 Headless CLI Testing

8f565b2

docs: v2.4 research — headless CLI testing stack, features, architect…

6d137b9

…ure, pitfalls

docs: define milestone v2.4 requirements

c3f9f82

docs: create milestone v2.4 roadmap (5 phases)

60ba2b3

docs: enable research workflow for v2.4

913f2ad

docs(30): capture phase context

2111d9c

docs(30): create phase plan for Claude Code CLI Harness

9b5c6f0

docs(30-02): complete fixtures and CI workflow plan

8722d39

- SUMMARY.md with self-check passed
- STATE.md updated: plan 02 complete, progress 50%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs(30-01): complete shared test helper library plan

16d856f

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs(30-03): complete smoke tests and hook capture tests plan

432ecc5

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs(30-04): complete pipeline and negative tests plan

235802a

- SUMMARY.md with 2 task commits, 3 deviation fixes documented
- STATE.md updated: phase 30 plan 4/4 complete, progress 68%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs(30): create gap closure plans 30-05 and 30-06

314595c

docs(30): update ROADMAP with gap closure plans and fix common.bash path

1b2a0b0

docs(30-05): complete MEMORY_DAEMON_ADDR gap closure plan

dbccaaa

- SUMMARY.md documents env var support and random port isolation
- STATE.md updated with plan position, decisions, metrics

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs(30-06): update ROADMAP plan checkboxes and progress table for Ph…

e22217f

…ase 30

- Mark plans 30-05 and 30-06 as complete in plan list
- Update Phase 30 progress from 0/4 to 6/6 in progress table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs(30-06): complete hooks.bats hard assertions + ROADMAP fix plan

ce4da17

docs(phase-30): complete phase execution and verification

31d36e6

docs: capture todo - Fix macOS 26 C++ compile issue with .cargo/confi…

7c6a4e0

…g.toml

fix: add .cargo/config.toml workaround for macOS 26 C++ header issue

d74fc70

RichardHightower temporarily deployed to e2e-cli February 25, 2026 18:44 — with GitHub Actions Inactive

RichardHightower merged commit da55dfd into main Feb 26, 2026
19 checks passed

RichardHightower merged 31 commits into main from feature/v2.4-headless-cli-testing