feat(skills): add /update-docs and /recover context maintenance skills

StackMemory Bot (CLI) · StackMemory Bot (CLI) · commit b7a0913ef7b6 · 2026-03-13T18:57:19.000-04:00
- /update-docs: weekly audit of CLAUDE.md, MEMORY.md, agent_docs/ against
  git history and codebase drift
- /recover: session post-mortem that analyzes traces, finds drift points,
  and converts them into durable guidance updates
- Add auto-activate triggers in global CLAUDE.md
- Add Context Maintenance section to project CLAUDE.md with usage guidelines
diff --git a/.claude/skills/recover.md b/.claude/skills/recover.md
@@ -0,0 +1,95 @@
+# Recover
+
+Diagnose why a session went off the rails by analyzing traces, then update context docs to prevent recurrence.
+
+## Usage
+
+```
+/recover                  # Analyze current session drift
+/recover <session-id>     # Analyze a specific conductor session
+/recover last             # Analyze the most recent conductor run
+```
+
+## Instructions
+
+You are a session post-mortem analyst. The user feels the current session (or a past one) went off-target. Your job is to find where context drifted from intent and convert that into durable guidance updates.
+
+### Step 1: Identify the drift
+
+#### For current session drift:
+1. Ask the user: "What were you trying to accomplish, and where did it go wrong?"
+2. Review the conversation so far — identify the turn where behavior diverged from intent
+3. Classify the drift:
+   - **Wrong scope**: Agent did more/less than asked
+   - **Wrong approach**: Agent chose a suboptimal strategy
+   - **Wrong assumptions**: Agent assumed something false about the codebase
+   - **Missing context**: Agent didn't know something it should have
+   - **Conflicting guidance**: Two instructions in CLAUDE.md/MEMORY.md contradicted
+
+#### For conductor session analysis:
+1. Load trace data:
+   ```bash
+   stackmemory conductor traces <issue-id>
+   ```
+2. If a session-id is given, replay it:
+   ```bash
+   stackmemory conductor replay <session-id>
+   ```
+3. Identify the failure turn — look for:
+   - Phase transitions that shouldn't have happened (e.g., jumping to "implementing" without "reading")
+   - Tool calls that indicate confusion (repeated file reads, failed edits)
+   - Token spikes (agent thrashing)
+   - Error patterns in the last N turns
+
+### Step 2: Root cause analysis
+
+Map each drift to a root cause category:
+
+| Category | Signal | Fix |
+|---|---|---|
+| **Missing CLAUDE.md guidance** | Agent didn't follow a project convention | Add the convention to CLAUDE.md |
+| **Stale memory** | Agent acted on outdated info from MEMORY.md | Update or remove the memory entry |
+| **Missing memory** | Agent lacked context it couldn't derive from code | Add a new memory entry |
+| **Ambiguous instruction** | Agent interpreted guidance differently than intended | Rewrite the instruction to be unambiguous |
+| **Missing skill** | Agent should have had a specialized workflow | Create a new skill |
+| **Wrong task delegation** | Task was AUTOMATE-tier but needed CAREFUL treatment | Update delegation model |
+| **Prompt template gap** | Conductor agent prompt didn't include needed context | Update prompt-template.md |
+
+### Step 3: Prescribe fixes
+
+For each root cause, draft the specific change:
+
+```
+## Recovery Plan
+
+### 1. {Category}: {Brief description}
+**Drift**: {What happened}
+**Root cause**: {Why it happened}
+**Fix**: {Exact change to make}
+**File**: {CLAUDE.md | MEMORY.md | memory/{file}.md | skills/{file}.md | prompt-template.md}
+
+### 2. ...
+```
+
+### Step 4: Apply and verify
+
+After user confirms:
+
+1. Apply each fix to the appropriate file
+2. For CLAUDE.md changes: verify the instruction is clear by re-reading the section in context
+3. For MEMORY.md changes: ensure the index stays under 200 lines
+4. For prompt template changes: note that the change takes effect on the next conductor run
+5. Commit with: `fix(docs): session recovery — {root causes addressed}`
+
+### Step 5: Optionally resume work
+
+Ask the user: "Context docs updated. Want to retry the original task with the improved guidance?"
+
+If yes, re-read the updated docs and proceed with the original intent.
+
+### Rules
+
+- **Don't blame the model** — if the agent drifted, the context was insufficient. Fix the context.
+- **Be specific** — "add better guidance" is not a fix. "Add to CLAUDE.md Commands section: `npm run typecheck` for type-checking (not `npx tsc --noEmit` which OOMs)" is a fix.
+- **One fix per root cause** — don't shotgun changes. Each fix should be traceable to a specific drift.
+- **Test the fix mentally** — re-read the changed doc and ask "would this have prevented the drift?" If not, the fix is wrong.
diff --git a/.claude/skills/update-docs.md b/.claude/skills/update-docs.md
@@ -0,0 +1,105 @@
+# Update Docs
+
+Systematic review and update of context markdowns (CLAUDE.md, MEMORY.md, agent_docs/) based on session traces, git history, and codebase drift. Run weekly.
+
+## Usage
+
+```
+/update-docs              # Full review: CLAUDE.md + MEMORY.md + agent_docs/
+/update-docs memory       # Only review MEMORY.md entries
+/update-docs claude       # Only review CLAUDE.md
+/update-docs agents       # Only review agent_docs/
+```
+
+## Instructions
+
+You are a context gardener. Your job is to audit the project's guidance documents and ensure they accurately reflect the current codebase, workflows, and learned patterns. Stale context is worse than no context — it actively misleads future sessions.
+
+### Step 1: Gather evidence
+
+Run these in parallel to build a picture of what's changed since the docs were last updated:
+
+1. **Recent git history** (last 2 weeks):
+   ```bash
+   git log --oneline --since="2 weeks ago" --no-merges
+   ```
+
+2. **Files changed recently** (detect structural shifts):
+   ```bash
+   git diff --stat HEAD~50 HEAD | tail -20
+   ```
+
+3. **Current CLAUDE.md** — read it fully
+4. **Current MEMORY.md** — read it fully
+5. **All memory files** — `ls ~/.claude/projects/*/memory/` and read each
+6. **Agent docs index** — `ls agent_docs/` if it exists
+7. **Conductor traces** (if available):
+   ```bash
+   stackmemory conductor trace-stats 2>/dev/null
+   ```
+8. **Recent test results** — check if test counts or patterns have shifted:
+   ```bash
+   node -e "require('child_process').execSync('npx vitest run --reporter=json 2>/dev/null', {timeout:300000, encoding:'utf8'})" 2>/dev/null | node -e "const d=JSON.parse(require('fs').readFileSync('/dev/stdin','utf8'));console.log('Tests:',d.numPassedTests,'passed,',d.numFailedTests,'failed,',d.numTotalTestSuites,'suites')"
+   ```
+
+### Step 2: Audit each document
+
+For each document, evaluate every section against the evidence:
+
+#### CLAUDE.md audit checklist:
+- [ ] **Commands section**: Do all listed commands still work? Any new important ones missing?
+- [ ] **Project structure**: Does the tree match actual `src/` layout? New directories?
+- [ ] **Key files**: Are file paths still correct? Any important new files?
+- [ ] **Validation steps**: Still accurate? Test counts roughly right?
+- [ ] **Deploy section**: Still accurate? Any new steps?
+- [ ] **Conductor section**: Data files, learning loop, template variables all current?
+- [ ] **Task delegation model**: Categories still make sense given recent work?
+
+#### MEMORY.md audit checklist:
+- [ ] **Each memory entry**: Is it still true? Has the codebase changed underneath it?
+- [ ] **Missing knowledge**: Are there patterns from recent commits that should be captured?
+- [ ] **Stale entries**: Any memories about removed features, old APIs, or resolved issues?
+- [ ] **Duplicates**: Any memories that overlap or contradict each other?
+
+#### Agent docs audit:
+- [ ] **Each doc**: Does it reference current APIs, file paths, patterns?
+- [ ] **Coverage**: Any major subsystem lacking documentation?
+
+### Step 3: Propose changes
+
+Present findings as a structured report:
+
+```
+## Context Audit Report — {date}
+
+### Stale (remove or update)
+- MEMORY.md line X: "{entry}" — reason it's stale
+- CLAUDE.md section Y: outdated because Z
+
+### Missing (should add)
+- New pattern learned: {description} — evidence: {commit/file}
+- New gotcha discovered: {description}
+
+### Accurate (no change needed)
+- {section}: still valid ✓
+```
+
+### Step 4: Apply changes
+
+After presenting the report, ask: "Apply these changes?" If confirmed:
+
+1. **Update MEMORY.md** — edit stale entries, add missing ones, remove obsolete
+2. **Update individual memory files** — create/update/delete as needed
+3. **Update CLAUDE.md** — fix outdated sections (commands, paths, structure)
+4. **Update agent_docs/** — fix stale references
+
+### Rules
+
+- **Never delete memories about rejected integrations** — those prevent re-evaluation
+- **Never delete gotchas** — they prevent repeat mistakes
+- **Preserve dates** on memory entries so staleness is visible
+- **Add dates** to any new entries: `(2026-MM-DD)`
+- **Keep MEMORY.md under 200 lines** — it's loaded into every conversation
+- **Progressive disclosure**: If a topic needs >5 lines, put it in a dedicated memory file and link from MEMORY.md
+- **Don't rewrite CLAUDE.md from scratch** — make surgical updates. The structure is intentional.
+- **Commit changes** with message: `chore(docs): weekly context audit — {summary}`
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -213,6 +213,24 @@ Quality gates scale with tier — don't over-engineer AUTOMATE tasks, don't unde
 - Plan-execute sessions (low interaction, high edits) are most efficient
 - Avoid exploratory marathons with topic-switching — burns 30-40% extra tokens
 
+## Context Maintenance
+
+**`/update-docs`** — Run weekly or when context feels stale:
+- Audits CLAUDE.md, MEMORY.md, agent_docs/ against git history and codebase
+- Detects stale entries, missing patterns, outdated paths
+- Trigger: start of week, after major refactors, or when sessions feel slow/confused
+
+**`/recover`** — Run when a session goes off the rails:
+- Analyzes traces to find where context drifted from intent
+- Maps drift to specific doc fixes (missing guidance, stale memory, ambiguous instruction)
+- Trigger: user says "this is wrong", "not what I wanted", "off the rails", repeated corrections
+
+**When to use which:**
+- Session producing wrong results → `/recover` (diagnose + fix now)
+- Routine maintenance, nothing broken → `/update-docs` (proactive gardening)
+- After publishing a new version → `/update-docs` (catch version/path drift)
+- After conductor failures → `/recover last` (learn from agent traces)
+
 ## Workflow
 
 - Check .env for API keys before asking