Merged
137 changes: 137 additions & 0 deletions .claude/skills/validate-docs/SKILL.md
@@ -0,0 +1,137 @@
---
name: validate-docs
description: Validate README features are documented in specs and covered by e2e tests. Use when checking documentation coverage or before merging docs changes.
---

# Validate Documentation

Validate that ALL features claimed in README are documented in specs and tested.

## Philosophy

The README is the product's promise to users. Every feature advertised must be:

1. Specified in `docs/specs/` (source of truth for behavior)
2. Tested in `frontend/tests/` (proof it works)

## Workflow

### Step 1: Extract ALL features from README

Read README.md and identify every feature claim. Look in:

- **How It Works** section - core functionality
- **UI** section - user-facing features
- **Agent Workflow** section - automation features
- Any diagrams, bullet points, or descriptions that promise functionality

For each feature, note:

- What it claims to do
- Whether it has a linked spec (some won't)
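Extraction is ultimately a judgment call, but a rough first pass can be scripted. A minimal sketch (the sample README content below is made up for illustration):

```bash
# Sketch: list "## " section headings as candidate feature areas to review.
# A real run would point at the repo's README.md instead of this sample.
readme=$(mktemp)
cat > "$readme" <<'EOF'
## How It Works
- **Sessions**: background AI work
## UI
Layout notes here.
EOF
grep -E '^## ' "$readme"
```

This only surfaces headings; bullet points and diagram labels still need a manual read.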

### Step 2: Map features to specs

Check if each README feature has a corresponding spec:

| README Feature | Expected Spec |
| ------------------------------------------ | ------------------------------------------------ |
| Main thread conversation | `docs/specs/chat.md` |
| Sessions/background work | `docs/specs/sessions.md` |
| Mobile/desktop layout | `docs/specs/layout.md` |
| Agent workflow (spawn → work → PR → close) | `docs/specs/agent-workflow.md` |
| Notifications | `docs/specs/sessions.md` (notifications section) |

**Flag any README feature without a spec as ERROR.**
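The mapping check itself is mechanical; a self-contained sketch (spec paths and feature names are illustrative, and a throwaway directory stands in for `docs/specs/`):

```bash
# Sketch: flag any README feature whose expected spec file is missing.
specs=$(mktemp -d)
touch "$specs/chat.md" "$specs/sessions.md"   # pretend only these specs exist

check() {
  # $1 = README feature name, $2 = expected spec path
  if [ -f "$2" ]; then
    echo "OK: $1"
  else
    echo "ERROR: $1 has no spec"
  fi
}

check "Main thread conversation" "$specs/chat.md"
check "Agent workflow" "$specs/agent-workflow.md"
```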

### Step 3: Verify specs have testable assertions

For each spec file:

- Check it contains specific, testable statements
- Assertions should mention exact UI text, behaviors, or states
- Vague specs like "works well" are not testable
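For contrast, a testable assertion names observable behavior; the wording below is illustrative, not taken from the actual specs:

```markdown
<!-- Testable: exact UI text tied to a concrete state -->
- The session list shows "No sessions yet" when no sessions exist

<!-- Not testable: nothing observable to check -->
- Sessions work well on mobile
```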

### Step 4: Map spec assertions to tests

For each bullet point/assertion in a spec, search for a test that verifies it:

```bash
# Example: search for test covering "No sessions yet" message
grep -r "No sessions yet" frontend/tests/
```

Track coverage for each spec assertion.
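The per-assertion search loops naturally. A self-contained sketch (the phrases and file contents are stand-ins, and a throwaway directory replaces `frontend/tests/`):

```bash
# Sketch: mark each spec phrase COVERED or MISSING via a grep over the tests.
tests=$(mktemp -d)
echo 'await expect(page.getByText("No sessions yet")).toBeVisible()' \
  > "$tests/sessions.spec.ts"

for phrase in "No sessions yet" "Messages persist across reloads"; do
  if grep -rq "$phrase" "$tests"; then
    echo "COVERED: $phrase"
  else
    echo "MISSING: $phrase"
  fi
done
```

Literal grep is a conservative proxy: a test can cover an assertion without quoting it verbatim, so MISSING results deserve a manual look before being reported as errors.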

### Step 5: Output report

```markdown
## Documentation Validation Report

### README Features → Specs

- [x] Main thread conversation → docs/specs/chat.md
- [x] Sessions → docs/specs/sessions.md
- [x] Layout → docs/specs/layout.md
- [ ] **Agent Workflow → NO SPEC** ← ERROR

### Spec Assertions → Tests

#### docs/specs/chat.md (7/9 = 78%)

- [x] "Input field with placeholder" → e2e/user-journey.spec.ts
- [ ] "Messages persist across reloads" → NO TEST

#### docs/specs/sessions.md (23/23 = 100%)

- [x] "No sessions yet" → sessions/01-session-list-empty.spec.ts
...

### Summary

| Category | Coverage |
| -------------------------- | ----------- |
| README features with specs | 3/4 (75%) |
| Spec assertions with tests | 38/40 (95%) |

### Errors

1. README "Agent Workflow" section has no spec
2. chat.md "Messages persist across reloads" has no test
```

### Step 6: Exit status

- **ERROR** if any README feature lacks a spec
- **ERROR** if any spec assertion lacks test coverage
- **SUCCESS** only if fully covered
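Wired together, the exit logic might look like this (the counts are placeholders for the numbers gathered in Steps 2 and 4):

```bash
# Sketch: non-zero status when any coverage gap was found.
features_without_specs=1    # from Step 2
assertions_without_tests=2  # from Step 4

if [ "$features_without_specs" -gt 0 ] || [ "$assertions_without_tests" -gt 0 ]; then
  status=1
  echo "ERROR: documentation coverage gaps found"
else
  status=0
  echo "SUCCESS: fully covered"
fi
echo "exit status: $status"
```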

## Common Gaps to Watch For

1. **Advertised but unspecified** - README promises feature, no spec exists
2. **Specified but untested** - Spec describes behavior, no test verifies it
3. **Workflow features** - Multi-step flows (like agent lifecycle) often lack e2e coverage

## Tests to Flag for Removal

Not all tests are good tests. Flag these as problems:

1. **Tests without specs** - If a test exists but no spec describes the behavior, either:
- The spec is missing (add it), or
- The test is testing implementation details (remove it)

2. **Too technical / low-level** - Tests should verify user-facing behavior, not internals:
- ❌ "store updates when API returns data"
- ❌ "component re-renders on state change"
- ✅ "user sees updated session status after refresh"

3. **Testing implementation, not behavior** - Tests coupled to code structure:
- ❌ "calls fetchSessions with correct params"
- ❌ "dispatches ACTION_TYPE to store"
- ✅ "sessions list shows new session after spawning"

4. **Duplicate coverage** - Multiple tests verifying the same user behavior

5. **Tests for removed features** - Spec was removed but test remains

The test suite should read like a user manual, not a code audit.
10 changes: 6 additions & 4 deletions CLAUDE.md
@@ -46,9 +46,10 @@ make test-reset # Clear DB + namespaces between runs

**Playwright agents for test maintenance:**

- Specs in `docs/specs/` are the source of truth for app behavior
- Use `playwright-test-planner` to generate test plans from specs
- Use `playwright-test-generator` to create tests from plans
- Use `playwright-test-healer` to auto-fix failing tests

### Test Architecture (Flakiness Prevention)

@@ -115,6 +116,7 @@ Tests are organized into projects by execution mode:
- **Svelte 5 runes**: `$state`, `$derived`, `$effect`, `$props`
- **API calls**: Use `$lib/api.ts`, never hardcode URLs
- **DBOS workflows**: Bump `WORKFLOW_VERSION` in `dbos_config.py` when changing workflow logic
- **External docs**: Use context7 MCP to fetch up-to-date documentation for any library (DBOS, Svelte, Playwright, etc.)
- **HTML**: Be explicit, don't rely on browser defaults (`type="button"`, `rel="noopener"`, etc.)
- **Responsive layouts**: Use `isMobile` store to conditionally render, not CSS hide (avoids duplicate DOM elements)
- **K8s scripts**: Always use explicit `--context kind-${KIND_CLUSTER_NAME:-mainloop-test}` in kubectl commands to avoid targeting wrong cluster
@@ -129,10 +131,10 @@ make setup-claude-creds # Extract Claude credentials from Keychain
## Documentation Philosophy

```text
docs/specs/*.md → playwright-test-planner → tests/*.spec.ts
```

Specs are the source of truth. They describe user-visible behavior in human-readable form, detailed enough for `playwright-test-planner` to generate tests. If tests fail: spec is wrong, code is wrong, or feature is in active development.

## Important

48 changes: 19 additions & 29 deletions README.md
@@ -36,7 +36,7 @@ You (phone/laptop)
- **Main thread**: One continuous conversation — sessions spawn inline and surface results back
- **Sessions**: Background AI work with their own conversations; appear as colored threads in your timeline
- **Notifications**: Slack-style thread replies notify you when sessions need attention or complete
- **Persistence**: Conversations and sessions survive restarts via compaction + [DBOS](https://docs.dbos.dev/)

## Quick Start

@@ -100,48 +100,38 @@ mainloop/

## Agent Workflow

Agents are sessions spawned for development tasks. Each agent gets its own K8s namespace for isolated iteration.

```text
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    Spawn    │────►│    Work     │────►│     PR      │────►│    Close    │
│   (main)    │     │  (k8s ns)   │     │  (GitHub)   │     │  (summary)  │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
       ▲                                                            │
       └────────────────────────────────────────────────────────────┘
                          check in / spawn more
```

### Phases

1. **Spawn** - Main thread creates agent for a task
2. **Work** - Agent iterates in its own K8s namespace (build, test, debug)
3. **PR** - Agent creates and merges GitHub PR when ready
4. **Close** - Agent posts summary back to main thread

You stay in the main thread, checking in on agents and spawning new ones as needed.

## Documentation

**Specs** (source of truth for app behavior):

- [Chat](docs/specs/chat.md) - Main thread conversation
- [Sessions](docs/specs/sessions.md) - Background work and status
- [Layout](docs/specs/layout.md) - Mobile and desktop views

**Guides**:

- [Architecture](docs/architecture.md) - System design and data flow
- [Development](docs/development.md) - Local setup and commands
- [DBOS Workflows](docs/DBOS.md) - Durable task orchestration
- [Contributing](CONTRIBUTING.md) - How to contribute

## License

30 changes: 30 additions & 0 deletions ROADMAP.md
@@ -0,0 +1,30 @@
# Roadmap

Future ideas and features under consideration.

## CI Loop Automation

Agent automatically iterates on CI failures:

- Poll GitHub Actions after each push
- On failure: analyze logs, fix, commit
- Continue until green checkmark

## GitHub Issue Planning

Agent creates/updates issues before coding:

- Problem analysis and proposed approach
- Implementation plan as "thinking out loud" space
- Links PR back to issue for context

## Project Template

Standardized repo setup for mainloop agents:

| Component | Purpose |
| -------------- | ------------------------------------ |
| GitHub Actions | CI pipeline (lint, type-check, test) |
| K8s/Helm | Preview environments per PR |
| CNPG operator | Dynamic test databases |
| trunk.yaml | Unified linter config |