Merged
2 changes: 1 addition & 1 deletion .beads/issues.jsonl
@@ -155,7 +155,7 @@
{"id":"ge-hch.5.15.23","title":"Tests: Director Config UI","description":"UI tests for Director configuration.\n\n## Acceptance Criteria\n- [ ] Test: changing threshold updates getSettings().directorRiskThreshold\n- [ ] Test: invalid threshold (2.0) clamped to valid range\n- [ ] Test: high threshold (0.8) accepts more proposals than low (0.2)\n- [ ] Test: disabling Director falls back to naive injection\n\n## Related Feature\nge-hch.5.15.7 (Director Configuration UI)","status":"closed","priority":2,"issue_type":"task","assignee":"@OpenCode","created_at":"2026-01-16T15:04:07.991961562-08:00","created_by":"rgardler","updated_at":"2026-01-17T01:40:45.906548983-08:00","closed_at":"2026-01-17T01:40:45.906582258-08:00","dependencies":[{"issue_id":"ge-hch.5.15.23","depends_on_id":"ge-hch.5.15","type":"parent-child","created_at":"2026-01-16T15:04:07.992789597-08:00","created_by":"rgardler"}],"comments":[{"id":197,"issue_id":"ge-hch.5.15.23","author":"rgardler","text":"Added deterministic mock proposal hook to inkrunner and updated Playwright tests to use mock proposals for Director acceptance tests. This avoids hitting external LLM endpoints and makes approval counts deterministic. Files changed: web/demo/js/inkrunner.js, tests/demo.telemetry.spec.ts. (Assignee: @OpenCode)","created_at":"2026-01-17T07:29:10Z"},{"id":198,"issue_id":"ge-hch.5.15.23","author":"rgardler","text":"Completed Director UI tests and deterministic mock hooks. Added/updated: web/demo/js/inkrunner.js, web/demo/js/director.js, tests/demo.telemetry.spec.ts, tests/unit/director.test.js. Ran unit tests (npm run test:unit) and Playwright demo tests locally; both passed. PR https://github.com/TheWizardsCode/GEngine/pull/156 merged. Deleting local branch feature/ge-hch.5.15-director and remote counterpart after merge. Closing per acceptance criteria: threshold updates, clamping, high/low threshold behavior, and Director disable fallback are covered by tests. 
(Assignee: @OpenCode)","created_at":"2026-01-17T09:40:44Z"}]}
{"id":"ge-hch.5.15.24","title":"Implement: Decision Telemetry","description":"Add telemetry emission to director.js.\n\n## Acceptance Criteria\n- [ ] emitDecisionTelemetry(decision, metrics) function\n- [ ] Emits director_decision event with proposal_id, timestamp, decision, reason, riskScore, latencyMs, metrics\n- [ ] Uses telemetry.js if available, console.log fallback\n- [ ] Buffers last 50 events in sessionStorage\n\n## Related Feature\nge-hch.5.15.8 (Decision Telemetry Emitter)","acceptance_criteria":"- emitDecisionTelemetry(decision, metrics) function\\n- Emits director_decision event with proposal_id, timestamp, decision, reason, riskScore, latencyMs, metrics\\n- Includes timing fields: writerMs, directorMs, totalMs (ms) in the payload\\n- Uses telemetry.js if available, console.log fallback\\n- Buffers last 50 events in sessionStorage","status":"closed","priority":2,"issue_type":"task","assignee":"@Patch","created_at":"2026-01-16T15:04:16.411083197-08:00","created_by":"rgardler","updated_at":"2026-01-17T15:29:04.002562756-08:00","closed_at":"2026-01-17T15:29:04.002562756-08:00","close_reason":"Merged PR #162 — Completed","dependencies":[{"issue_id":"ge-hch.5.15.24","depends_on_id":"ge-hch.5.15","type":"parent-child","created_at":"2026-01-16T15:04:16.413016807-08:00","created_by":"rgardler"}],"comments":[{"id":203,"issue_id":"ge-hch.5.15.24","author":"rgardler","text":"Added acceptance criteria: include timing fields writerMs, directorMs, and totalMs in the director_decision telemetry payload. These should record writer latency, director latency, and combined total latency in milliseconds. 
Update unit and integration tests to assert presence and numeric types for these fields.","created_at":"2026-01-17T20:55:27Z"},{"id":204,"issue_id":"ge-hch.5.15.24","author":"rgardler","text":"Implemented timing fields (writerMs, directorMs, totalMs) in director_decision telemetry, ensuring evaluate and all rejection paths emit them; updated unit + Playwright demo tests to assert presence and numeric types. Ran npm test (unit + Playwright) successfully on branch feature/ge-hch.5.15.24-telemetry.","created_at":"2026-01-17T21:07:40Z"}]}
{"id":"ge-hch.5.15.25","title":"Tests: Decision Telemetry","description":"Unit tests for telemetry emission.\n\n## Acceptance Criteria\n- [ ] Test: decision emits event with all required fields\n- [ ] Test: timestamp is valid ISO8601\n- [ ] Test: missing proposal_id generates UUID\n- [ ] Test: after 5 choices, sessionStorage contains 5 events\n\n## Related Feature\nge-hch.5.15.8 (Decision Telemetry Emitter)","status":"closed","priority":2,"issue_type":"task","assignee":"@Patch","created_at":"2026-01-16T15:04:16.491963828-08:00","created_by":"rgardler","updated_at":"2026-01-18T01:42:07.551830978-08:00","closed_at":"2026-01-18T01:42:07.551830978-08:00","close_reason":"Completed","external_ref":"https://github.com/TheWizardsCode/GEngine/pull/174","labels":["Status: PR Created"],"dependencies":[{"issue_id":"ge-hch.5.15.25","depends_on_id":"ge-hch.5.15","type":"parent-child","created_at":"2026-01-16T15:04:16.507643334-08:00","created_by":"rgardler"}]}
{"id":"ge-hch.5.15.26","title":"Docs: Decision Telemetry Schema","description":"Document telemetry event schema.\n\n## Acceptance Criteria\n- [ ] Document director_decision event fields\n- [ ] Include example JSON event\n- [ ] Note sessionStorage buffer behavior\n- [ ] Reference to telemetry-schema.md design doc\n\n## Related Feature\nge-hch.5.15.8 (Decision Telemetry Emitter)","status":"open","priority":2,"issue_type":"task","assignee":"Scribbler","created_at":"2026-01-16T15:04:16.562304471-08:00","created_by":"rgardler","updated_at":"2026-01-16T15:04:16.562304471-08:00","dependencies":[{"issue_id":"ge-hch.5.15.26","depends_on_id":"ge-hch.5.15","type":"parent-child","created_at":"2026-01-16T15:04:16.563250103-08:00","created_by":"rgardler"}]}
{"id":"ge-hch.5.15.26","title":"Docs: Decision Telemetry Schema","description":"Document telemetry event schema.\n\n## Acceptance Criteria\n- [ ] Document director_decision event fields\n- [ ] Include example JSON event\n- [ ] Note sessionStorage buffer behavior\n- [ ] Reference to telemetry-schema.md design doc\n\n## Related Feature\nge-hch.5.15.8 (Decision Telemetry Emitter)","status":"closed","priority":2,"issue_type":"task","assignee":"Scribbler","created_at":"2026-01-16T15:04:16.562304471-08:00","created_by":"rgardler","updated_at":"2026-01-18T01:44:21.546914082-08:00","closed_at":"2026-01-18T01:44:21.546914082-08:00","close_reason":"Docs updated","dependencies":[{"issue_id":"ge-hch.5.15.26","depends_on_id":"ge-hch.5.15","type":"parent-child","created_at":"2026-01-16T15:04:16.563250103-08:00","created_by":"rgardler"}],"comments":[{"id":215,"issue_id":"ge-hch.5.15.26","author":"rgardler","text":"Updated docs/dev/m2-design/telemetry-schema.md: expanded telemetry with field table, clarified fit-vs-risk convention, added sessionStorage buffering note, and linked to director-algorithm telemetry emission section. (ge-hch.5.15.26)","created_at":"2026-01-18T09:44:20Z"},{"id":216,"issue_id":"ge-hch.5.15.26","author":"rgardler","text":"Aligned doc to current web demo telemetry emitter (Option A): documented flat director_decision payload (decision/reason/riskScore/timing + metrics.*), added example with payload block, and clarified sessionStorage ring buffer key ge-hch.director.telemetry (last 50).","created_at":"2026-01-18T09:48:39Z"}]}
{"id":"ge-hch.5.15.3","title":"Risk Scorer (3+3 Metrics)","description":"Compute a risk score that predicts whether a branch will feel coherent to the player.\n\n## Player Experience Change\nPlayers will see fewer 'off' or jarring AI branches. Branches that don't fit the narrative pacing or have low Writer confidence are filtered out.\n\n## Acceptance Criteria\n- [ ] Computes weighted risk score (0.0–1.0), where 0.0=safe, 1.0=high risk\n- [ ] Active metrics implemented:\n - `proposal_confidence_risk`: `1.0 - proposal.metadata.confidence_score`\n - `narrative_pacing_risk`: based on branch length vs. expected range\n - `return_path_confidence_risk`: from return-path checker\n- [ ] Placeholder metrics return configurable defaults (0.3):\n - `thematic_consistency_risk`, `lore_adherence_risk`, `character_voice_risk`\n- [ ] Consistent: same input → same output\n- [ ] Determinism test: 10 calls with same input produce identical riskScore\n- [ ] Unit test: high-confidence proposal (0.9) → low risk score (\u003c0.3)\n- [ ] Unit test: low-confidence proposal (0.3) → high risk score (\u003e0.5)\n- [ ] Unit test: very long branch (\u003e500 tokens in exposition phase) → elevated pacing risk\n\n## Minimal Implementation\n- Create `computeRiskScore(proposal, context, config)` function\n- Implement 3 active metrics\n- Weighted average with default weights from design doc\n\n## Dependencies\n- ge-hch.5.15.1 (Decision Flow Engine)\n- ge-hch.5.15.2 (Return-Path Feasibility Checker)\n\n## Deliverables\n- Risk scorer in director.js\n- Unit tests for each metric\n- Config schema for weights","status":"closed","priority":1,"issue_type":"feature","assignee":"Patch","created_at":"2026-01-16T15:01:50.954803291-08:00","created_by":"rgardler","updated_at":"2026-01-17T11:36:20.913696503-08:00","closed_at":"2026-01-17T11:36:20.913696503-08:00","close_reason":"PR merged (gh-158) — risk scorer implemented","external_ref":"gh-158","labels":["Status: PR 
Created"],"dependencies":[{"issue_id":"ge-hch.5.15.3","depends_on_id":"ge-hch.5.15","type":"parent-child","created_at":"2026-01-16T15:01:50.955629677-08:00","created_by":"rgardler"},{"issue_id":"ge-hch.5.15.3","depends_on_id":"ge-hch.5.15.1","type":"blocks","created_at":"2026-01-16T15:04:32.2862167-08:00","created_by":"rgardler"},{"issue_id":"ge-hch.5.15.3","depends_on_id":"ge-hch.5.15.2","type":"blocks","created_at":"2026-01-16T15:04:32.327828266-08:00","created_by":"rgardler"}]}
{"id":"ge-hch.5.15.4","title":"Embedding Service (transformers.js)","description":"Provide local semantic similarity using transformers.js for future intelligent risk metrics.\n\n## Player Experience Change\nNone immediately — this is infrastructure for deferred metrics (thematic consistency, LORE adherence, character voice). Enables future improvements without additional API costs.\n\n## Acceptance Criteria\n- [ ] Model runs in WebWorker (UI thread not blocked)\n- [ ] API: `embed(text)` returns embedding vector\n- [ ] API: `similarity(vec1, vec2)` returns cosine similarity (0.0–1.0)\n- [ ] Model loads lazily on first `embed()` call\n- [ ] Graceful fallback: if model fails to load, `embed()` returns null, `similarity()` returns 0.5\n- [ ] Unit test: `similarity('happy', 'joyful')` \u003e 0.7\n- [ ] Unit test: `similarity('happy', 'database')` \u003c 0.4\n- [ ] Unit test: `embed(null)` returns null gracefully\n- [ ] Performance test: first embed() \u003c 3s (model load); subsequent \u003c 100ms\n\n## Minimal Implementation\n- Create `web/demo/js/embedding-service.js`\n- Load `Xenova/all-MiniLM-L6-v2` via transformers.js\n- WebWorker wrapper for non-blocking inference\n- Cache embeddings for repeated texts\n\n## Dependencies\n- None (parallel development)\n\n## Deliverables\n- `web/demo/js/embedding-service.js`\n- WebWorker script\n- Unit tests with sample texts","status":"open","priority":2,"issue_type":"feature","created_at":"2026-01-16T15:02:02.704393975-08:00","created_by":"rgardler","updated_at":"2026-01-16T15:02:02.704393975-08:00","dependencies":[{"issue_id":"ge-hch.5.15.4","depends_on_id":"ge-hch.5.15","type":"parent-child","created_at":"2026-01-16T15:02:02.70547581-08:00","created_by":"rgardler"}]}
{"id":"ge-hch.5.15.5","title":"Player Preference Tracker","description":"Track which types of AI branches the player accepts/rejects to personalize future offers.\n\n## Player Experience Change\nOver time, the system learns player preferences. Players who prefer exploration branches will see more exploration options offered; players who reject dialogue-heavy branches will see fewer.\n\n## Acceptance Criteria\n- [ ] Records: `{ branchType, accepted: boolean, timestamp }` on each Director decision\n- [ ] Computes preference score per branch type (0.0–1.0, based on accept ratio)\n- [ ] Persists in localStorage key `ge-hch.ai-preferences`\n- [ ] Cold-start: returns 0.5 for all types when no history\n- [ ] API: `getPreference(branchType)` → number\n- [ ] API: `recordOutcome(branchType, accepted)` → void\n- [ ] Unit test: after 3 accepts + 1 reject of 'dialogue', preference \u003e 0.6\n- [ ] Unit test: after 0 history, preference = 0.5\n- [ ] Unit test: after 100+ events, preference calculation remains performant (\u003c10ms)\n- [ ] Integration: risk scorer uses preference to adjust player_preference_risk\n\n## Minimal Implementation\n- Create `web/demo/js/player-preference.js`\n- Track accept/reject counts per branch type\n- Simple ratio calculation with smoothing\n\n## Dependencies\n- ge-hch.5.15.3 (Risk Scorer)\n\n## Deliverables\n- `web/demo/js/player-preference.js`\n- Unit tests\n- Integration with localStorage","status":"open","priority":2,"issue_type":"feature","created_at":"2026-01-16T15:02:12.247694133-08:00","created_by":"rgardler","updated_at":"2026-01-16T15:02:12.247694133-08:00","dependencies":[{"issue_id":"ge-hch.5.15.5","depends_on_id":"ge-hch.5.15","type":"parent-child","created_at":"2026-01-16T15:02:12.248718041-08:00","created_by":"rgardler"},{"issue_id":"ge-hch.5.15.5","depends_on_id":"ge-hch.5.15.3","type":"blocks","created_at":"2026-01-16T15:04:32.372750464-08:00","created_by":"rgardler"}]}
92 changes: 55 additions & 37 deletions docs/dev/m2-design/telemetry-schema.md
@@ -143,54 +143,72 @@ M2 emits events at 7 key decision points:
- Risk score distribution by branch type
- Most common violations (to prioritize rule tuning)

-#### Event 3: Director Decision Made
+#### Event 3: Director Decision Made (`director_decision`)

-**When**: Director evaluates proposal and makes accept/reject decision
+**When**: Director evaluates a proposal and makes an accept/reject decision.

-**Purpose**: Track Director heuristics and decision patterns
+**Purpose**: Track Director heuristics and decision patterns.

##### Schema (web demo payload)

The web demo emits a **flat** `director_decision` payload (no nested `event_data` envelope) with these fields:

| Field | Type | Required | Notes |
|---|---:|:---:|---|
| `proposal_id` | string | ✅ | Correlates with proposal lifecycle events; falls back to a UUID if missing. |
| `decision` | string | ✅ | `approve` or `reject` (demo enums). |
| `reason` | string | ✅ | Human-readable rationale (e.g., `Risk acceptable`, `Return path check failed`). |
| `riskScore` | number | ✅ | Overall risk (0.0–1.0). Lower is better. |
| `latencyMs` | number | ✅ | Director evaluation latency (ms). |
| `writerMs` | number | ✅ | Writer latency passed in for telemetry (ms). |
| `directorMs` | number | ✅ | Same as `latencyMs` in the demo (ms). |
| `totalMs` | number | ✅ | `writerMs + directorMs`. |
| `timestamp` | string | ✅ | ISO timestamp generated at emission time. |
| `metrics.confidence` | number \| null | ❌ | Fit metric 0.0–1.0 (higher is better). |
| `metrics.pacing` | number \| null | ❌ | Fit metric 0.0–1.0 (higher is better). |
| `metrics.returnPath` | number \| null | ❌ | Fit metric 0.0–1.0 (higher is better). |
| `metrics.thematic` | number \| null | ❌ | Placeholder fit metric. |
| `metrics.lore` | number \| null | ❌ | Placeholder fit metric. |
| `metrics.voice` | number \| null | ❌ | Placeholder fit metric. |

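As a rough illustration of the field table above, the flat payload could be assembled along these lines. This is a sketch, not the code in `director.js`: the function name `buildDirectorDecisionPayload`, its input shape, and the injectable `generateId` parameter are assumptions made for the example. Only the emitted field names and the `approve`/`reject` values come from the table.

```javascript
// Sketch only: builder name, argument names, and generateId are hypothetical.
// Emitted field names follow the director_decision table above.
const METRIC_KEYS = ['confidence', 'pacing', 'returnPath', 'thematic', 'lore', 'voice'];

function buildDirectorDecisionPayload(input, generateId = () => globalThis.crypto.randomUUID()) {
  const { proposalId, decision, reason, riskScore, writerMs, directorMs, metrics = {} } = input;

  if (decision !== 'approve' && decision !== 'reject') {
    throw new RangeError(`decision must be "approve" or "reject", got ${decision}`);
  }
  if (!Number.isFinite(writerMs) || !Number.isFinite(directorMs)) {
    throw new TypeError('writerMs and directorMs must be finite numbers');
  }

  // Optional fit metrics: anything that is not a number is reported as null.
  const flatMetrics = {};
  for (const key of METRIC_KEYS) {
    flatMetrics[key] = typeof metrics[key] === 'number' ? metrics[key] : null;
  }

  return {
    proposal_id: proposalId || generateId(), // falls back to a generated ID if missing
    decision,
    reason,
    riskScore,
    latencyMs: directorMs, // same value as directorMs in the demo
    writerMs,
    directorMs,
    totalMs: writerMs + directorMs,
    timestamp: new Date().toISOString(), // generated at emission time
    metrics: flatMetrics,
  };
}
```

Passing `generateId` explicitly keeps the sketch testable without relying on a global `crypto` object. The default assumes an environment that provides `crypto.randomUUID()`.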
##### Example

```json
 {
   "event_type": "director_decision",
-  "event_data": {
+  "payload": {
     "proposal_id": "proposal-87f4c290",
-    "decision": "approved_for_runtime",
-
-    "director_reasoning": {
-      "validation_passed": true,
-      "risk_score": 0.15,
-      "risk_metrics": {
-        "thematic_consistency": 0.85,
-        "lore_adherence": 0.90,
-        "character_voice_consistency": 0.87,
-        "narrative_pacing_fit": 0.80,
-        "player_preference_fit": 0.82,
-        "proposal_confidence": 0.87
-      },
-      "weighted_risk_score": 0.14,
-      "decision_threshold": 0.30,
-      "return_path_feasible": true,
-      "player_preference_details": {
-        "branch_type_match": 0.88,
-        "theme_match": 0.79,
-        "complexity_match": 0.85,
-        "historical_engagement": 0.78,
-        "frequency_appropriateness": 0.90
-      }
-    },
-
-    "player_engagement": {
-      "recent_action_level": 0.65,
-      "recent_success_rate": 0.82,
-      "narrative_phase": "rising_action",
-      "director_creativity_set_to": 0.65
-    },
-
-    "decision_time_ms": 125
+    "decision": "approve",
+    "reason": "Risk acceptable",
+    "riskScore": 0.18,
+    "latencyMs": 92,
+    "writerMs": 240,
+    "directorMs": 92,
+    "totalMs": 332,
+    "timestamp": "2026-01-20T14:30:22Z",
+    "metrics": {
+      "confidence": 0.87,
+      "pacing": 0.12,
+      "returnPath": 0.90,
+      "thematic": null,
+      "lore": null,
+      "voice": null
+    }
   }
 }
```

##### Implementation note: buffering

In the web demo, `director_decision` events are buffered in **`sessionStorage`** under the key `ge-hch.director.telemetry` as a ring buffer of the **last 50 events**. This ensures:
- no server dependency for local dev
- events survive simple page navigations
- bounded memory vs `localStorage`

(If the buffer is cleared, only in-session telemetry is impacted; gameplay state is unaffected.)

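The ring buffer described in the buffering note could look roughly like the sketch below. The storage key and the 50-event cap come from the note. The function names, the injectable `storage` argument, and the in-memory stand-in are assumptions for illustration, not the demo's actual implementation. In a browser, `window.sessionStorage` would be passed as `storage`.

```javascript
// Sketch only: function names and the storage-injection style are hypothetical.
const BUFFER_KEY = 'ge-hch.director.telemetry'; // key named in the buffering note
const MAX_EVENTS = 50; // "last 50 events"

// Append one event and keep only the newest maxEvents entries.
// Returns the number of events held after the write.
function bufferTelemetryEvent(storage, event, key = BUFFER_KEY, maxEvents = MAX_EVENTS) {
  let events = [];
  try {
    const parsed = JSON.parse(storage.getItem(key) || '[]');
    if (Array.isArray(parsed)) events = parsed;
  } catch (err) {
    events = []; // unreadable buffer: start over rather than fail the decision path
  }

  events.push(event);
  const trimmed = events.slice(-maxEvents); // drop the oldest entries beyond the cap

  try {
    storage.setItem(key, JSON.stringify(trimmed));
  } catch (err) {
    // Storage full or unavailable: telemetry is best-effort, so the error is swallowed.
  }
  return trimmed.length;
}

// Minimal in-memory stand-in for sessionStorage, for running outside a browser.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (k) => (data.has(k) ? data.get(k) : null),
    setItem: (k, v) => { data.set(k, String(v)); },
  };
}

// Usage: after 55 writes only the newest 50 events remain.
const demoStorage = createMemoryStorage();
for (let i = 0; i < 55; i += 1) {
  bufferTelemetryEvent(demoStorage, { proposal_id: `proposal-${i}` });
}
```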
**Design reference**: `docs/dev/m2-design/telemetry-schema.md` (this document) and `docs/dev/m2-design/director-algorithm.md` (Telemetry Emission section).

**Metrics extracted**:
- Risk score distribution
- Decision threshold sensitivity