
Add AI helper commands for task review #1

Merged
wende merged 8 commits into main from task-ai_helpers on Dec 28, 2025

Conversation

@wende (Owner) commented Dec 28, 2025

Summary

  • Add codebook ai command group with help and review subcommands
  • Support 5 AI agents: claude, codex, gemini, opencode, kimi
  • Customizable review prompt via codebook.yml (ai.review_prompt)

Changes

  • config.py: Added AIConfig dataclass with customizable review prompt
  • cli.py: Added ai command group, ai help, and ai review commands
  • test_cli.py: Added 17 tests for AI helper commands
  • CONFIGURATION.md: Documented AI configuration options
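
For orientation, a minimal sketch of the config shape added in config.py, based on the AIConfig/DEFAULT_REVIEW_PROMPT names and the [TASK_FILE] placeholder discussed in the review below; the default prompt text here is illustrative, not the PR's actual default:

```python
from dataclasses import dataclass

# Illustrative default -- only the DEFAULT_REVIEW_PROMPT name and the
# [TASK_FILE] placeholder are confirmed by this PR's review discussion.
DEFAULT_REVIEW_PROMPT = "Review the task described in [TASK_FILE] and report any issues."


@dataclass
class AIConfig:
    """AI helper settings loaded from codebook.yml (ai.review_prompt)."""

    review_prompt: str = DEFAULT_REVIEW_PROMPT
```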

Test plan

  • All 233 tests pass
  • Black formatting passes
  • Ruff linting passes
  • Manual testing with actual AI agents

Usage

# Show help
codebook ai help

# Review a task with Claude
codebook ai review claude ./codebook/tasks/202512281502-TITLE.md

# Pass additional arguments to agent
codebook ai review gemini ./task.md -- --model gemini-pro


@sourcery-ai (bot) left a comment


Hey - I've found 1 security issue, 5 other issues, and left some high level feedback:

Security issues:

  • Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)

General comments:

  • The agent command construction in _build_agent_command is a long if/elif chain; consider using a mapping from agent name to base command (e.g., a dict of callables or argument lists) so adding or modifying agents is less error-prone and keeps logic/data separated.
  • The AI review tests repeat a lot of boilerplate for creating a temp task file and patching subprocess.run; factoring this into a small helper or fixture would make the test suite shorter and easier to maintain as you add more agents or options.
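
A minimal sketch of the mapping approach suggested in the first point above, assuming each agent's executable matches its name and that the prompt is passed as the final argument; the real per-agent flags and the function's exact signature are not shown in this PR:

```python
from typing import Optional

# Hypothetical base commands keyed by agent name; the agent names come from
# the PR, but the executables and argument order are assumptions.
AGENT_BASE_COMMANDS: dict[str, list[str]] = {
    "claude": ["claude"],
    "codex": ["codex"],
    "gemini": ["gemini"],
    "opencode": ["opencode"],
    "kimi": ["kimi"],
}


def _build_agent_command(agent: str, prompt: str, extra_args: list[str]) -> Optional[list[str]]:
    base = AGENT_BASE_COMMANDS.get(agent)
    if base is None:
        return None  # unsupported agent; unreachable via click.Choice (see Comment 3)
    return [*base, *extra_args, prompt]
```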
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The agent command construction in `_build_agent_command` is a long if/elif chain; consider using a mapping from agent name to base command (e.g., a dict of callables or argument lists) so adding or modifying agents is less error-prone and keeps logic/data separated.
- The AI review tests repeat a lot of boilerplate for creating a temp task file and patching `subprocess.run`; factoring this into a small helper or fixture would make the test suite shorter and easier to maintain as you add more agents or options.

## Individual Comments

### Comment 1
<location> `tests/test_cli.py:1464-1473` </location>
<code_context>
+    def test_ai_review_prompt_contains_task_path(self, runner: CliRunner):
</code_context>

<issue_to_address>
**suggestion (testing):** Add tests to verify that the review prompt comes from configuration and respects custom `ai.review_prompt` values.

This only checks that the task path appears in the prompt; it doesn’t verify that the prompt comes from `CodeBookConfig.ai.review_prompt` or that custom values override the default. Please add a test that mocks `CodeBookConfig.load()` to return a config with a non-default `ai.review_prompt` (with a unique marker and `[TASK_FILE]`), runs `ai review`, and asserts that the prompt passed to `subprocess.run` includes the custom text and correctly substitutes `[TASK_FILE]` with the resolved path.

Suggested implementation:

```python
from unittest.mock import MagicMock, patch
from types import SimpleNamespace

```

```python
    def test_ai_review_prompt_contains_task_path(self, runner: CliRunner):
        """Should include task path in prompt."""
        with runner.isolated_filesystem() as tmpdir:
            task_file = Path(tmpdir) / "task.md"
            task_file.write_text("Task content")

            with patch("codebook.cli.subprocess.run") as mock_run:
                mock_run.return_value = MagicMock(returncode=0)

                result = runner.invoke(
                    main,
                    ["ai", "review", "gemini", str(task_file)],
                    catch_exceptions=False,
                )

                assert result.exit_code == 0
                mock_run.assert_called_once()

                # Prompt should contain the task path (after [TASK_FILE] substitution)
                prompt = mock_run.call_args.kwargs.get("input") or ""
                assert str(task_file) in prompt

    def test_ai_review_uses_custom_review_prompt_from_config(self, runner: CliRunner):
        """Should use ai.review_prompt from config and substitute [TASK_FILE] with resolved path."""
        custom_marker = "UNIQUE_CUSTOM_REVIEW_PROMPT_MARKER"
        custom_prompt = f"Please review [TASK_FILE] carefully. {custom_marker}"

        with runner.isolated_filesystem() as tmpdir:
            task_file = Path(tmpdir) / "task.md"
            task_file.write_text("Task content")

            # Mock config so that ai.review_prompt is our custom value
            fake_config = SimpleNamespace(
                ai=SimpleNamespace(
                    review_prompt=custom_prompt,
                )
            )

            with (
                patch("codebook.cli.CodeBookConfig.load", return_value=fake_config),
                patch("codebook.cli.subprocess.run") as mock_run,
            ):
                mock_run.return_value = MagicMock(returncode=0)

                result = runner.invoke(
                    main,
                    ["ai", "review", "gemini", str(task_file)],
                    catch_exceptions=False,
                )

                assert result.exit_code == 0
                mock_run.assert_called_once()

                prompt = mock_run.call_args.kwargs.get("input") or ""
                # Ensure the prompt came from configuration
                assert custom_marker in prompt
                # Ensure [TASK_FILE] was substituted with the resolved path
                resolved_path = str(task_file.resolve())
                assert resolved_path in prompt
                # And the placeholder itself should not remain
                assert "[TASK_FILE]" not in prompt

```

If `tests/test_cli.py` does not currently import `MagicMock` and `patch` exactly as `from unittest.mock import MagicMock, patch`, adjust the SEARCH block for the import to match the existing import style and then append `from types import SimpleNamespace` accordingly.

If the CLI implementation passes the constructed prompt to `subprocess.run` differently (e.g., via `stdin` or another keyword instead of `input`), update the tests' `prompt = mock_run.call_args.kwargs.get("input")` line to match the actual keyword used (for example, `mock_run.call_args.kwargs["stdin"]`).

If `CodeBookConfig.load` lives under a different import path than `codebook.cli.CodeBookConfig.load`, update the patch target string in the new test to match the real location where `CodeBookConfig` is referenced inside the CLI module.
</issue_to_address>

### Comment 2
<location> `tests/test_cli.py:1503-1512` </location>
<code_context>
+
+                assert "Command:" in result.output
+
+    def test_ai_review_propagates_exit_code(self, runner: CliRunner):
+        """Should propagate agent exit code."""
+        with runner.isolated_filesystem() as tmpdir:
+            task_file = Path(tmpdir) / "task.md"
+            task_file.write_text("Task content")
+
+            with patch("codebook.cli.subprocess.run") as mock_run:
+                mock_run.return_value = MagicMock(returncode=42)
+
+                result = runner.invoke(
+                    main,
+                    ["ai", "review", "claude", str(task_file)],
+                )
+
+                assert result.exit_code == 42
</code_context>

<issue_to_address>
**suggestion (testing):** Cover error-path behavior when `subprocess.run` raises a generic exception, not just `FileNotFoundError`.

Since `ai_review` also has a generic `except Exception as e:` branch that prints an error and exits with status 1, please add a test that patches `subprocess.run` to raise a non-`FileNotFoundError` (e.g. `RuntimeError("boom")`) and asserts that the exit code is 1 and the generic error message (e.g. `"Error running agent:"`) appears in `result.output`.
</issue_to_address>
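
A minimal sketch of the requested test, following the pattern of the existing tests in this class (same `runner` fixture and imports); the error string "Error running agent:" is taken from the comment above and may differ in the actual CLI:

```python
    def test_ai_review_handles_generic_exception(self, runner: CliRunner):
        """Should exit 1 with a generic error when the agent call fails unexpectedly."""
        with runner.isolated_filesystem() as tmpdir:
            task_file = Path(tmpdir) / "task.md"
            task_file.write_text("Task content")

            with patch("codebook.cli.subprocess.run", side_effect=RuntimeError("boom")):
                result = runner.invoke(main, ["ai", "review", "claude", str(task_file)])

            assert result.exit_code == 1
            assert "Error running agent:" in result.output
```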

### Comment 3
<location> `tests/test_cli.py:1301-1310` </location>
<code_context>
+        assert result.exit_code != 0
+        assert "Missing argument" in result.output or "AGENT" in result.output
+
+    def test_ai_review_invalid_agent(self, runner: CliRunner):
+        """Should reject invalid agent."""
+        with runner.isolated_filesystem() as tmpdir:
+            task_file = Path(tmpdir) / "task.md"
+            task_file.write_text("Task content")
+
+            result = runner.invoke(main, ["ai", "review", "invalid_agent", str(task_file)])
+
+            assert result.exit_code != 0
+            assert "Invalid value" in result.output or "invalid_agent" in result.output
+
+    def test_ai_review_requires_path(self, runner: CliRunner):
</code_context>

<issue_to_address>
**question (testing):** Consider directly exercising `_build_agent_command` returning `None` (unsupported agent) if that branch is intended to be used.

Because `agent` is constrained by `click.Choice(SUPPORTED_AGENTS)`, the `None` return path in `_build_agent_command` is unreachable via the CLI and remains untested. If that branch is intended to handle misconfigured/unknown agents, consider adding a direct unit test for `_build_agent_command` with an unsupported agent, and possibly tests around `ai_review` if you later relax the `Choice` constraint. Otherwise, if it’s truly dead code, it may be worth removing for clarity.
</issue_to_address>
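
If the branch is kept, a direct unit test could look like the sketch below; it assumes `_build_agent_command` is importable from `codebook.cli` and takes `(agent, prompt, extra_args)`, neither of which is confirmed by the diff shown here:

```python
from codebook.cli import _build_agent_command  # assumed import path


def test_build_agent_command_returns_none_for_unsupported_agent():
    """The unsupported-agent branch should return None (assumed signature)."""
    assert _build_agent_command("not_a_real_agent", "prompt", []) is None
```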

### Comment 4
<location> `tests/test_cli.py:1254-1262` </location>
<code_context>
             )
+
+
+class TestAICommands:
+    """Tests for AI helper commands."""
+
+    @pytest.fixture
+    def runner(self) -> CliRunner:
+        """Create a CLI test runner."""
+        return CliRunner()
+
+    def test_ai_help_command(self, runner: CliRunner):
+        """Should show help for AI helpers."""
+        result = runner.invoke(main, ["ai", "help"])
</code_context>

<issue_to_address>
**suggestion (testing):** Add tests to validate configuration serialization/deserialization for AI helpers.

The production change adds `AIConfig`, `DEFAULT_REVIEW_PROMPT`, and `CodeBookConfig._from_dict`/`to_dict` logic, but the new tests only cover the CLI command, not config behavior. Please add tests (here or in a dedicated config test module) that:

* Load a config dict with a custom `ai.review_prompt` and assert `CodeBookConfig._from_dict` sets `cfg.ai.review_prompt` correctly.
* Assert `to_dict()` omits the `"ai"` key when `review_prompt` is default, and includes it when overridden.

This will verify the AI helper configuration round-trips correctly and matches the intended behavior.

Suggested implementation:

```python
class TestAICommands:
    """Tests for AI helper commands."""

    @pytest.fixture
    def runner(self) -> CliRunner:
        """Create a CLI test runner."""
        return CliRunner()

    def test_ai_help_command(self, runner: CliRunner):
        """Should show help for AI helpers."""
        result = runner.invoke(main, ["ai", "help"])

        assert result.exit_code == 0
        assert "CodeBook AI Helpers" in result.output
        assert "Available commands:" in result.output
        assert "Supported agents:" in result.output
        assert "claude" in result.output
        assert "codex" in result.output
        assert "gemini" in result.output
        assert "opencode" in result.output
        assert "kimi" in result.output


class TestAIConfig:
    """Tests for AI helper configuration (serialization/deserialization)."""

    def test_from_dict_sets_custom_review_prompt(self):
        """_from_dict should apply a custom ai.review_prompt from config dict."""
        config_dict = {
            "ai": {
                "review_prompt": "Custom review prompt for PR reviews.",
            }
        }

        cfg = CodeBookConfig._from_dict(config_dict)

        assert cfg.ai.review_prompt == "Custom review prompt for PR reviews."

    def test_to_dict_omits_ai_when_review_prompt_is_default(self):
        """
        to_dict() should omit the 'ai' key when the review_prompt is the default.

        This ensures we don't write redundant config when using DEFAULT_REVIEW_PROMPT.
        """
        cfg = CodeBookConfig._from_dict({})

        # Sanity-check default wiring
        assert cfg.ai.review_prompt == DEFAULT_REVIEW_PROMPT

        data = cfg.to_dict()

        assert "ai" not in data

    def test_to_dict_includes_ai_when_review_prompt_overridden(self):
        """
        to_dict() should include 'ai.review_prompt' when it differs from default.
        """
        config_dict = {
            "ai": {
                "review_prompt": "Overridden review prompt.",
            }
        }
        cfg = CodeBookConfig._from_dict(config_dict)

        # Ensure we really have a non-default prompt
        assert cfg.ai.review_prompt != DEFAULT_REVIEW_PROMPT

        data = cfg.to_dict()

        assert "ai" in data
        assert data["ai"]["review_prompt"] == "Overridden review_prompt."

```

To make these tests compile and pass, you will also need to:

1. Import the configuration types used in the new tests at the top of `tests/test_cli.py` (or wherever imports are grouped), for example:

   ```python
   from codebook.config import CodeBookConfig, DEFAULT_REVIEW_PROMPT
   ```

   Adjust the module path (`codebook.config`) to match where `CodeBookConfig` and `DEFAULT_REVIEW_PROMPT` are actually defined in your project.

2. Ensure that `CodeBookConfig._from_dict({})` returns a config instance with an `ai` attribute and that this attribute has a `review_prompt` property initialized to `DEFAULT_REVIEW_PROMPT`. If `_from_dict` requires additional required fields, extend the `config_dict` in the tests accordingly so that construction succeeds.

3. Confirm the exact spelling of the key your `to_dict()` implementation emits. The tests above assume:

   ```python
   data = cfg.to_dict()
   # data is a dict shaped like {"ai": {"review_prompt": "..."}}
   ```

   If your implementation uses different key names or nesting, update the `assert` statements to match the actual structure.
</issue_to_address>

### Comment 5
<location> `codebook/AI_HELPERS.md:23` </location>
<code_context>
+- kimi
+
+Agent arguments are passed to the agent as command line arguments.
+Review starts an agent with a specific prompt, that can be customized in the [codebook.yml](./CONFIGURATION.md) config file.
+Default prompt is:
+```
</code_context>

<issue_to_address>
**issue (typo):** Tighten the grammar by removing the comma before 'that'.

This is a restrictive "that" clause, so it shouldn't have a preceding comma. Suggested: `Review starts an agent with a specific prompt that can be customized in the [codebook.yml](./CONFIGURATION.md) config file.`

```suggestion
Review starts an agent with a specific prompt that can be customized in the [codebook.yml](./CONFIGURATION.md) config file.
```
</issue_to_address>

### Comment 6
<location> `src/codebook/cli.py:1764` </location>
<code_context>
        result = subprocess.run(agent_cmd)
</code_context>

<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

*Source: opengrep*
</issue_to_address>
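
For context, a minimal sketch of why the list-form call is typically safe here (and of the exit-code propagation the PR description mentions); the agent name is from the PR, the prompt text is illustrative:

```python
import subprocess
import sys

# A list of separate arguments, not a single shell string: subprocess.run
# defaults to shell=False, so the prompt/path are never parsed by a shell.
agent_cmd = ["claude", "Review the task in ./task.md"]

try:
    result = subprocess.run(agent_cmd)
    sys.exit(result.returncode)  # propagate the agent's exit code
except FileNotFoundError:
    sys.exit(1)  # agent binary not installed
```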



Add `codebook ai` command group with:
- `codebook ai help` - Show available agents and usage
- `codebook ai review [agent] [path]` - Review tasks with AI agents

Supported agents: claude, codex, gemini, opencode, kimi

Features:
- Customizable review prompt via codebook.yml (ai.review_prompt)
- Pass additional arguments to agents with -- separator
- Propagates agent exit codes

Includes 17 new tests and documentation updates.
Fetches unaddressed PR comments using GitHub CLI. Shows regular comments,
review summaries, and line-level code review comments. Supports specifying
PR number with PR=<number> or auto-detects from current branch.
- Add security comment for subprocess.run explaining list format safety
- Refactor _build_agent_command to use dict mapping instead of if/elif chain
- Fix typo: remove comma before 'that' in AI_HELPERS.md
- Add test for generic exception handling in ai_review
- Add TestBuildAgentCommand class with unit tests including unsupported agent
- Add TestAIConfig class for config serialization/deserialization tests
- Add ai_review_env fixture to reduce test boilerplate

addressed
@github-actions bot commented Dec 28, 2025

📚 CodeBook Coverage

Overall: 78.0% (📈 +78.0% from main)

Changed Files

| File | Main | PR | Change |
| --- | --- | --- | --- |
| .gitattributes | 0.0% | 100.0% | 📈 +100.0% |
| .gitignore | 0.0% | 100.0% | 📈 +100.0% |
| .mcp.json | 0.0% | 100.0% | 📈 +100.0% |
| codebook.yml | 0.0% | 100.0% | 📈 +100.0% |
| codebook/AI_HELPERS.md | 0.0% | 100.0% | 📈 +100.0% |
| examples/example.md | 0.0% | 100.0% | 📈 +100.0% |
| src/codebook/main.py | 0.0% | 100.0% | 📈 +100.0% |
| tests/__init__.py | 0.0% | 100.0% | 📈 +100.0% |
| tests/test_client.py | 0.0% | 99.7% | 📈 +99.7% |
| tests/test_watcher.py | 0.0% | 99.3% | 📈 +99.3% |
| src/codebook/config.py | 0.0% | 98.2% | 📈 +98.2% |
| codebook/edge-cases/CACHE_EXPIRATION.md | 0.0% | 97.5% | 📈 +97.5% |
| codebook/edge-cases/BATCH_RESOLUTION_FALLBACK.md | 0.0% | 97.4% | 📈 +97.4% |
| codebook/edge-cases/GIT_ROOT_RESOLUTION.md | 0.0% | 97.4% | 📈 +97.4% |
| codebook/CONFIGURATION.md | 0.0% | 96.8% | 📈 +96.8% |
| codebook/edge-cases/ANSI_ESCAPE_STRIPPING.md | 0.0% | 96.6% | 📈 +96.6% |
| codebook/edge-cases/BACKLINK_DEDUPLICATION.md | 0.0% | 96.6% | 📈 +96.6% |
| src/codebook/differ.py | 0.0% | 96.6% | 📈 +96.6% |
| codebook/FRONTMATTER.md | 0.0% | 96.4% | 📈 +96.4% |
| codebook/edge-cases/TASK_VERSION_DIFF_SKIPPING.md | 0.0% | 96.1% | 📈 +96.1% |

...and 29 more files with changes

- Add --json flag to task coverage command for machine-readable output
- Update CI workflow to compare coverage between PR and main branch
- Show per-file coverage changes in a markdown table
- Sort changes by biggest diff first, limit to top 20 files
@wende merged commit c019070 into main on Dec 28, 2025
3 of 4 checks passed
