feat: v2.4 Phase 30 — Claude Code CLI Harness#18

Merged
RichardHightower merged 31 commits into main from feature/v2.4-headless-cli-testing
Feb 26, 2026

@RichardHightower

Summary

  • Build bats-core E2E test framework for headless CLI testing across 5 AI coding CLIs
  • 30 bats tests: smoke (8), hooks (10), pipeline (5), negative (7) — all passing
  • Shared test helpers (common.bash, cli_wrappers.bash) with workspace isolation, daemon lifecycle, random port selection
  • 10 Claude Code event fixture JSON files: 9 valid fixtures, one per hook event type, plus 1 malformed fixture for negative testing
  • CI workflow (e2e-cli.yml) with 5-CLI x 2-OS matrix (claude-code, gemini, opencode, copilot, codex)
  • Added MEMORY_DAEMON_ADDR env var support to memory-ingest for test port isolation
  • Fixed IPv6/IPv4 mismatch (daemon binds 0.0.0.0, client used [::1])
  • Fixed clap short flag conflict (-l for --log-level vs --limit)

Key fixes in final commits

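  • hooks.bats silent ingest failure: memory-ingest reads stdin line by line, so multi-line JSON from jq's pretty-printed output failed to parse, and fail-open behavior hid the error by still returning continue:true. Fixed by using jq -c (compact, single-line output)
  • Layer 2 assertion mismatch: query output has the form [timestamp] agent_type: content and never includes session_id, so assertions that looked for session_id could not match. Replaced them with event count and content assertions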
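
The first fix can be illustrated with a minimal bash sketch. It assumes a reader that treats each stdin line as one complete JSON document, which is how the summary describes memory-ingest. `count_records` is a hypothetical stand-in rather than the real ingest code, and the field names in the sample events are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a line-oriented reader: every non-empty stdin line
# is counted as one candidate JSON document.
count_records() {
  local n=0 line
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] && n=$((n + 1))
  done
  echo "$n"
}

# Pretty-printed JSON (jq's default output) spans several lines, so a
# line-oriented reader sees fragments instead of one event.
# Field names here are illustrative assumptions.
pretty=$'{\n  "hook_event_name": "SessionStart",\n  "session_id": "demo"\n}'

# Compact JSON (what `jq -c` emits) is a single line, so the reader sees
# exactly one event.
compact='{"hook_event_name":"SessionStart","session_id":"demo"}'

pretty_lines=$(printf '%s\n' "$pretty" | count_records)
compact_lines=$(printf '%s\n' "$compact" | count_records)
echo "pretty: ${pretty_lines} line(s), compact: ${compact_lines} line(s)"
```

Because none of the four pretty-printed lines is valid JSON on its own, a fail-open reader would reject each one and still report success, which matches the silent failure described above.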

Test plan

  • bats tests/cli/claude-code/ — 30/30 pass (1 expected skip for nested Claude)
  • cargo clippy --workspace — zero warnings
  • cargo test --workspace — all pass
  • CI matrix run on push

🤖 Generated with Claude Code

RichardHightower and others added 30 commits February 22, 2026 19:22
- 9 valid fixtures: SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, AssistantResponse, SubagentStart, SubagentStop, Stop, SessionEnd
- 1 malformed fixture for negative testing
- All fixtures match CchEvent struct in memory-ingest

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- 5-CLI matrix (claude-code, gemini, opencode, copilot, codex) x 2-OS
- Bats-core installation with bats-support and bats-assert libraries
- JUnit XML report generation via --report-formatter junit
- Failure artifact upload with 7-day retention
- Skip annotation when CLI test directory not found

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Workspace isolation with setup_workspace/teardown_workspace
- Daemon lifecycle: build_daemon_if_needed, start_daemon, stop_daemon
- Health checking with grpcurl, nc, and /dev/tcp fallbacks
- Random port selection for test isolation
- grpc_query and ingest_event helpers
- Auto-detect PROJECT_ROOT via workspace Cargo.toml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
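
The port selection and health-check fallbacks listed in the commit above could look roughly like the following bash sketch. `pick_random_port` is named elsewhere in this PR, but its body here is an assumption, as are `port_is_open` and `wait_for_port`. The grpcurl path is omitted because a later commit notes that the daemon lacks a grpc.health service.

```shell
#!/usr/bin/env bash
# Sketch only: bodies are assumptions, not the code in common.bash.

# Pick a port in the dynamic/private range (49152-65535).
# $RANDOM yields 0..32767, so the modulo keeps the result inside the range.
pick_random_port() {
  echo $(( 49152 + RANDOM % 16384 ))
}

# Succeed if something accepts TCP connections on 127.0.0.1:<port>.
# Prefers nc when installed, otherwise falls back to bash's /dev/tcp.
port_is_open() {
  local port=$1
  if command -v nc >/dev/null 2>&1; then
    nc -z 127.0.0.1 "$port" >/dev/null 2>&1
  else
    (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null
  fi
}

# Poll until the port opens or the attempt budget (default 50) runs out.
wait_for_port() {
  local port=$1 attempts=${2:-50} i
  for (( i = 0; i < attempts; i++ )); do
    port_is_open "$port" && return 0
    sleep 0.1
  done
  return 1
}
```

A random pick can still collide with a port already in use, so a real helper would likely retry or let the OS assign the port. This sketch does neither.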
- CLI detection: require_cli (skip) and has_cli (boolean)
- Claude Code wrappers: run_claude, run_claude_with_hooks with timeout
- Hook pipeline testing: run_hook_stdin, run_hook_stdin_dry
- Cross-platform timeout detection (timeout/gtimeout)
- .gitignore excludes .runs/ workspace artifacts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
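
The CLI detection and cross-platform timeout handling from the commit above can be sketched in bash as follows. `has_cli` is named in the commit message, but the body is an assumption. `detect_timeout_cmd` and `run_with_timeout` are hypothetical names. `require_cli` is left out because it depends on the bats `skip` command.

```shell
#!/usr/bin/env bash
# Sketch only: bodies are assumptions, not the code in cli_wrappers.bash.

# Boolean check: succeeds when the named CLI is on PATH.
has_cli() {
  command -v "$1" >/dev/null 2>&1
}

# Print the available timeout command: GNU `timeout` on Linux, or `gtimeout`
# where GNU coreutils is installed with a g prefix (common on macOS).
# Prints nothing and returns 1 if neither exists.
detect_timeout_cmd() {
  local candidate
  for candidate in timeout gtimeout; do
    if has_cli "$candidate"; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Run a command under a time limit when possible, otherwise run it unbounded.
run_with_timeout() {
  local seconds=$1
  shift
  local timeout_cmd
  if timeout_cmd=$(detect_timeout_cmd); then
    "$timeout_cmd" "$seconds" "$@"
  else
    "$@"
  fi
}
```

Falling back to an unbounded run keeps tests usable on machines without either command, at the cost of losing the time limit there.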
- SUMMARY.md with self-check passed
- STATE.md updated: plan 02 complete, progress 50%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ingest

- 6 always-run tests: daemon binary, ingest binary, health check, valid/malformed/empty JSON ingest
- 2 claude-dependent tests: binary detection and headless JSON output (skip if absent)
- Fix IPv6->IPv4 mismatch: daemon binds 0.0.0.0 but helpers used [::1], switched to 127.0.0.1
- Fix health check: daemon lacks grpc.health service, prefer nc TCP check over grpcurl
- Add nested-session skip for CLAUDECODE env var

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…fication

- 9 individual event type tests: SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, AssistantResponse, SubagentStart, SubagentStop, Stop, SessionEnd
- 1 multi-event sequence coherence test (4 events, same session)
- Two-layer proof: Layer 1 (ingest exit 0 + continue:true), Layer 2 (gRPC query)
- Unique session IDs per test via PID to avoid cross-test interference
- Build-resilience: fallback to existing binary when cargo build fails
- Fixture JSON rewritten with jq (sed fallback) for session_id isolation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
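
The session ID isolation described in the commit above might be implemented along these lines. `rewrite_session_id` is named later in this PR, but this body is an assumption. It expects a top-level string field called `session_id` and an ID without characters that are special to sed. The `-c` flag reflects the compact-output fix made in the final commit.

```shell
#!/usr/bin/env bash
# Sketch only: the body is an assumption, not the code in hooks.bats.
# Usage: rewrite_session_id <fixture.json> <new-session-id>
rewrite_session_id() {
  local fixture=$1 new_id=$2
  if command -v jq >/dev/null 2>&1; then
    # -c keeps the rewritten event on a single line.
    jq -c --arg sid "$new_id" '.session_id = $sid' "$fixture"
  else
    # Fallback: textual substitution, then strip newlines so the result
    # stays single-line even for a pretty-printed fixture.
    sed -E "s/(\"session_id\"[[:space:]]*:[[:space:]]*)\"[^\"]*\"/\1\"${new_id}\"/" \
      "$fixture" | tr -d '\n'
    echo
  fi
}

# Derive a per-test ID from the shell PID so separate test processes do not
# share a session.
session_id="test-session-$$"
```

The sed branch is a textual approximation and would mis-handle fixtures whose `session_id` value contains escaped quotes, which is one reason to prefer jq when it is available.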
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion

- 5 tests covering complete session lifecycle, TOC browse, cwd metadata,
  real Claude Code hook fire, and concurrent session isolation
- Fix: DEFAULT_ENDPOINT changed from [::1] to 127.0.0.1 to match daemon bind addr
- Fix: Remove short flag from global --log-level to avoid clap conflict with --limit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- 7 tests: daemon down, malformed JSON, empty stdin, unknown event type,
  timeout enforcement, wrong port, and large payload handling
- All tests verify fail-open behavior (exit 0, continue:true)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
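
The fail-open check shared by the negative tests above can be expressed as one small assertion helper. `assert_fail_open` is a hypothetical name, and the exact JSON shape of the response is an assumption, since the PR only states "exit 0, continue:true".

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper, not part of the PR.
# Usage: assert_fail_open <exit-status> <captured-stdout>
assert_fail_open() {
  local status=$1 output=$2
  if [ "$status" -ne 0 ]; then
    echo "expected exit 0, got ${status}" >&2
    return 1
  fi
  # Tolerate optional whitespace around the colon.
  if ! printf '%s' "$output" | grep -Eq '"continue"[[:space:]]*:[[:space:]]*true'; then
    echo "expected continue:true in: ${output}" >&2
    return 1
  fi
  return 0
}
```

In a test, the captured status and output of a memory-ingest run against bad input (daemon down, malformed JSON, empty stdin) would be passed to this helper.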
- SUMMARY.md with 2 task commits, 3 deviation fixes documented
- STATE.md updated: phase 30 plan 4/4 complete, progress 68%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Read MEMORY_DAEMON_ADDR from environment before gRPC connect
- Use MemoryClient::connect(addr) when env var is set
- Fall back to connect_default() (port 50051) when unset
- Preserves fail-open behavior throughout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
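
The environment-variable fallback described in the commit above is implemented in Rust inside memory-ingest. The same decision is mirrored below in bash to show how a test can point the client at an isolated port. `resolve_daemon_addr` is a hypothetical name, and the `http://` scheme is an assumption. The host 127.0.0.1 and port 50051 come from the commit messages.

```shell
#!/usr/bin/env bash
# Sketch only: shell mirror of the described fallback, not the Rust code.
# Note that ${VAR:-default} also treats an empty value as unset, which may
# differ from the real implementation.
resolve_daemon_addr() {
  echo "${MEMORY_DAEMON_ADDR:-http://127.0.0.1:50051}"
}
```

A test would export MEMORY_DAEMON_ADDR with the port its own daemon is listening on before invoking memory-ingest. With the variable unset, the default endpoint is used.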
- Remove PIPELINE_PORT=50051 variable
- Call start_daemon with no args (uses random port via pick_random_port)
- Remove outdated NOTE about hardcoded DEFAULT_ENDPOINT
- Update header comment to reflect OS-assigned random port isolation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SUMMARY.md documents env var support and random port isolation
- STATE.md updated with plan position, decisions, metrics

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… in hooks.bats

- All 10 hook capture tests now use hard Layer 2 assertions
- Tests fail properly when session_id not found in gRPC query results
- Content assertions added for tests 2-5 (project structure, Read tool)
- Removed if-guard around result check -- empty result now fails as expected

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ase 30

- Mark plans 30-05 and 30-06 as complete in plan list
- Update Phase 30 progress from 0/4 to 6/6 in progress table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tions

Root cause: memory-ingest reads stdin line-by-line, so multi-line JSON
from jq pretty-print silently failed (fail-open returns continue:true).
Also, query output format never includes session_id.

Changes:
- Use jq -c (compact) in rewrite_session_id to ensure single-line JSON
- Replace session_id assertions with event count and content assertions
- Use wide time window (from=0) to avoid timestamp precision issues
- All 30/30 bats tests now pass (hooks 10, smoke 8, pipeline 5, negative 7)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>