feat: v2.4.0 Headless CLI Testing milestone #21

Merged
RichardHightower merged 24 commits into main from feature/phase-34-codex-cli-tests
Mar 6, 2026

@RichardHightower
Contributor

Summary

  • Codex CLI adapter with 5 skills (no hooks by design) and sandbox workaround docs
  • 26 Codex bats tests: 8 smoke, 6 skipped hook tests, 5 pipeline E2E, 7 negative
  • Cross-CLI matrix report script (scripts/cli-matrix-report.sh) parsing JUnit XML from all 5 CLIs
  • CI matrix-report job in e2e-cli.yml with GitHub step summary output
  • Milestone completion — archives, PROJECT.md evolution, version bump to 2.4.0

Milestone Stats

  • 5 phases (30-34), 15 plans, 144 bats tests across 5 CLIs
  • Adapters: Claude Code, Gemini, OpenCode, Copilot, Codex
  • Test categories: smoke, hooks, pipeline, negative per CLI

Test plan

  • bats tests/cli/codex/ — all 26 Codex tests pass (or skip gracefully)
  • bats tests/cli/claude-code/ — existing 30 tests unaffected
  • scripts/cli-matrix-report.sh /tmp/empty — produces header with no data (graceful)
  • CI e2e-cli matrix runs all 5 CLIs
  • cargo check --workspace passes at v2.4.0

🤖 Generated with Claude Code

RichardHightower and others added 23 commits March 5, 2026 14:21
…apper

- 6 compact single-line Copilot-native fixtures (ms timestamps, no hook_event_name/session_id)
- run_copilot wrapper in cli_wrappers.bash with timeout guard
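The timeout guard in the wrapper can be sketched as follows. This is a hypothetical Python illustration (the actual run_copilot lives in cli_wrappers.bash and its exact mechanism is not shown here): `run_with_timeout` is an invented helper, and the exit codes 124 and 127 are borrowed from GNU `timeout` and shell conventions rather than taken from the repository.

```python
import subprocess
import sys

# Hypothetical exit codes, mirroring GNU `timeout` (124) and the shell's
# "command not found" (127); not taken from cli_wrappers.bash.
TIMEOUT_EXIT = 124
NOT_FOUND_EXIT = 127


def run_with_timeout(cmd, timeout_s, stdin_text=""):
    """Run cmd and return (exit_code, stdout); never blocks past timeout_s."""
    try:
        proc = subprocess.run(
            cmd,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return TIMEOUT_EXIT, ""
    except FileNotFoundError:
        # CLI binary not installed: callers can treat this as "skip".
        return NOT_FOUND_EXIT, ""


if __name__ == "__main__":
    # Demo: a child that sleeps longer than its 0.5 s budget is killed.
    code, _ = run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5)
    print(code)  # 124
```

A Codex call might look like `run_with_timeout(["codex", "exec", "--full-auto", "--json", prompt], 120)`, assuming the prompt is passed as the trailing positional argument and that 120 seconds is an acceptable budget; both are illustrative choices.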

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- smoke.bats: 8 tests (binary checks, daemon health, ingest, copilot CLI skip)
- hooks.bats: 10 tests (all 5 event types, session synthesis, Bug #991, cleanup)
- Change jq -n to jq -nc in memory-capture.sh (jq's default multi-line output broke memory-ingest's line-based read_line)
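Why the `-c` flag matters: a reader that consumes one line per event only sees the opening `{` of pretty-printed JSON. The Python sketch below models that failure; memory-ingest's real `read_line` is not reproduced here, `first_line_parses` is a hypothetical helper, and the event fields are illustrative.

```python
import io
import json


def first_line_parses(payload: str) -> bool:
    """Model a line-oriented reader: parse only the first line as JSON."""
    line = io.StringIO(payload).readline()
    try:
        json.loads(line)
        return True
    except json.JSONDecodeError:
        return False


# Hypothetical event; the field names are illustrative only.
event = {"event": "SessionStart", "agent": "copilot"}
compact = json.dumps(event) + "\n"           # one line, like `jq -nc`
pretty = json.dumps(event, indent=2) + "\n"  # spans lines, like `jq -n`
```

Here `first_line_parses(compact)` is true, while `first_line_parses(pretty)` is false because the first line is just `{`.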

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SUMMARY.md documenting 18 tests across 2 bats files
- STATE.md updated with position and decisions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- 5 tests covering full session lifecycle, TOC browse, cwd metadata, agent field preservation, concurrent session isolation
- Uses direct CchEvent format with agent=copilot for deterministic testing
- Mirrors gemini/pipeline.bats pattern with 5-event Copilot session helper

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dling tests

- 7 tests covering memory-ingest and memory-capture.sh fail-open behavior
- memory-ingest tests: daemon down, malformed JSON, empty stdin, unknown event type
- memory-capture.sh tests: daemon down, malformed input, empty stdin (assert exit 0, no stdout)
- All 30 Copilot tests pass across 4 test files (smoke, hooks, pipeline, negative)
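The fail-open contract these tests assert (exit 0 and nothing on stdout, whatever the input or daemon state) can be sketched as below. `capture` and its `send` callback are hypothetical stand-ins for memory-capture.sh and its hand-off to memory-ingest; the stderr diagnostic is an assumed design choice, not something the tests above require.

```python
import json
import sys
from typing import Callable


def capture(stdin_text: str, send: Callable[[dict], None]) -> int:
    """Hypothetical fail-open hook: forward the event if possible, always return 0.

    Nothing is ever written to stdout, so the host CLI's own output stays clean.
    """
    if not stdin_text.strip():
        return 0  # empty stdin: nothing to capture
    try:
        event = json.loads(stdin_text)
        if isinstance(event, dict):
            send(event)  # e.g. hand off to the daemon; may raise if it is down
    except Exception as exc:  # malformed JSON, connection refused, ...
        # Assumed design choice: diagnostics go to stderr, never stdout.
        print(f"memory-capture: event skipped ({type(exc).__name__})", file=sys.stderr)
    return 0
```

Catching broadly is the point of fail-open: a memory hook should never be the reason a user's CLI session errors out.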

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SUMMARY.md documents 2 tasks, 2 files, 30 total Copilot tests
- STATE.md updated with position, decisions, metrics

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Python3-based JUnit XML parser for all 5 CLIs
- Produces a markdown table of pass/fail/skip counts for each CLI × scenario
- Supports both local and CI artifact directory structures
- Handles missing/empty XML gracefully
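A rough Python sketch of the parsing step follows, assuming a layout of `<root>/<cli>/*.xml` and taking the scenario name from each `<testsuite>`'s `name` attribute with any `.bats` suffix stripped. The function names `tally` and `markdown_table` are hypothetical; scripts/cli-matrix-report.sh may organize this differently, including how it handles the CI artifact layout.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path


def tally(root_dir):
    """Count results as {(cli, scenario): [passed, failed, skipped]}.

    Assumed (hypothetical) layout: <root_dir>/<cli>/*.xml, one dir per CLI.
    """
    root = Path(root_dir)
    if not root.is_dir():
        return {}  # missing directory: report nothing rather than fail
    counts = defaultdict(lambda: [0, 0, 0])
    for xml_file in sorted(root.glob("*/*.xml")):
        cli = xml_file.parent.name
        try:
            tree = ET.parse(xml_file)
        except ET.ParseError:
            continue  # empty or truncated XML: skip the file
        for suite in tree.iter("testsuite"):
            scenario = suite.get("name", xml_file.stem).removesuffix(".bats")
            for case in suite.iter("testcase"):
                bucket = counts[(cli, scenario)]
                if case.find("skipped") is not None:
                    bucket[2] += 1
                elif case.find("failure") is not None or case.find("error") is not None:
                    bucket[1] += 1
                else:
                    bucket[0] += 1
    return dict(counts)


def markdown_table(counts):
    """Render tally() output as a markdown table, one row per CLI and scenario."""
    rows = ["| CLI | Scenario | Pass | Fail | Skip |", "|---|---|---|---|---|"]
    for (cli, scenario), (passed, failed, skipped) in sorted(counts.items()):
        rows.append(f"| {cli} | {scenario} | {passed} | {failed} | {skipped} |")
    return "\n".join(rows)
```

With no XML input, `markdown_table` returns only the header and separator rows, which matches the graceful empty-directory behavior listed in the test plan.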

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New job runs after all CLI matrix entries complete
- Downloads JUnit artifacts and generates cross-CLI summary
- Report output goes to GitHub Actions step summary
- Uses if: always() to run even when some CLIs fail

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SUMMARY.md with execution results
- STATE.md updated to phase 34 complete (100%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add adapters/codex-cli/ with .codex/skills/ for memory-query, retrieval-policy, topic-graph, bm25-search, vector-search
- Each skill has YAML frontmatter (name + description) and references/command-reference.md
- Add SANDBOX-WORKAROUND.md documenting macOS Seatbelt and Linux Landlock issues
- Add README.md explaining no-hooks limitation (Discussion #2150)
- No hooks directory -- Codex CLI does not support lifecycle hooks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ests

- Create 6 CchEvent fixtures in tests/cli/fixtures/codex/ with agent:"codex"
- Add run_codex() wrapper to cli_wrappers.bash using codex exec --full-auto --json
- Create smoke.bats with 8 tests (6 always-run + 2 codex-binary-dependent)
- Create hooks.bats with 6 tests, all skipped, annotating the no-hooks limitation
- Test 6 verifies adapter skills exist with valid YAML frontmatter
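A frontmatter check like test 6 could look roughly like this. It assumes each skill is a directory under `.codex/skills/` containing a `SKILL.md` file (the file name is an assumption), and it uses a deliberately simplified line parser rather than a full YAML library: it only checks that each required key has a non-empty value on its own line and does not interpret multi-line YAML values. `frontmatter` and `invalid_skills` are hypothetical names.

```python
from pathlib import Path

# Keys the adapter's skills are described as carrying in their frontmatter.
REQUIRED_KEYS = ("name", "description")


def frontmatter(text):
    """Return top-level 'key: value' pairs from a leading '---' block, or None.

    Simplified parser (not full YAML): indented lines are ignored.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep and not line.startswith((" ", "\t")):
            fields[key.strip()] = value.strip()
    return None  # opening delimiter without a closing one


def invalid_skills(skills_dir):
    """Names of skill dirs whose SKILL.md (assumed file name) lacks required keys."""
    bad = []
    for skill in sorted(Path(skills_dir).iterdir()):
        if not skill.is_dir():
            continue
        md = skill / "SKILL.md"
        fields = frontmatter(md.read_text()) if md.is_file() else None
        if fields is None or not all(fields.get(k) for k in REQUIRED_KEYS):
            bad.append(skill.name)
    return bad
```

A test would then assert that `invalid_skills(".codex/skills")` returns an empty list.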

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SUMMARY.md with 2 tasks, 22 files created, all verifications passed
- STATE.md updated: plan 1/3 complete, decisions, metrics

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Session lifecycle, TOC browse, cwd metadata, agent field, concurrent isolation
- Direct CchEvent format with agent=codex (no hooks)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…hook tests

- memory-ingest fail-open: daemon-down, malformed, empty, unknown event
- Hook tests skipped with GitHub Discussion #2150 annotation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SUMMARY.md with 2 task commits documented
- STATE.md updated: plan 2/3, progress, decisions, session

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Archive milestone artifacts, evolve PROJECT.md, collapse ROADMAP.md.

5 phases (30-34), 15 plans, 144 bats tests across 5 CLIs.
Key: bats-core E2E harness, Codex adapter, cross-CLI matrix report.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bump workspace version from 2.3.0 to 2.4.0 for Headless CLI Testing milestone.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update macos-x86_64 runner from macos-13 (deprecated) to macos-15
- Add Cross.toml for aarch64 cross-compilation with OpenSSL
- Make release job run with if: always() to handle partial build failures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>