Experiment: Selectively downgrade sub-agents from Opus to Sonnet #18

## Context

The sub-agent optimization plan (`docs/plans/skill_workflow_subagent_optimization.md`) uses Opus for **all** sub-agents to ensure zero quality regression from the architectural refactor. This is the right default — get the architecture right first, then optimize costs.

Once the refactored workflow is validated end-to-end on a real episode, we can experiment with selectively downgrading specific sub-agents to Sonnet where quality is comparable.

## Proposed Experiments

Run each experiment on existing episode data (ep5 and ep6) by generating outputs with both Opus and Sonnet, then comparing quality side-by-side.

### Tier 1: Most likely to be safe as Sonnet

| Agent | Rationale | Risk |
|-------|-----------|------|
| **Research Digest** | Structured extraction, not creative work. Clear template to follow. | Low |
| **Briefing Validator** | Binary pass/fail checklist against known criteria. | Low |
| **Plan Validator** | Binary pass/fail checklist against known criteria. | Low |
| **Metadata Writer** | Template-driven, structured output from known inputs. | Low |

### Tier 2: Needs careful comparison

| Agent | Rationale | Risk |
|-------|-----------|------|
| **Research Q&A** | Answering specific questions about a single file. Sonnet likely sufficient but nuance matters. | Medium |
| **Question Discovery** | Analytical gap analysis. Creative identification of "what we don't know." | Medium |

### Tier 3: Highest quality sensitivity — test carefully

| Agent | Rationale | Risk |
|-------|-----------|------|
| **Cross-Validation** | Comparing subtle contradictions across 5 research files (~167KB). Nuance-dependent. | Medium-High |
| **Episode Planner** | Architecturally complex creative work: counterpoint design with assigned speaker positions, mode-switching frameworks, depth budgets, episode arc design. Directly determines audio quality. | High |

### Not candidates for downgrade

| Agent | Rationale |
|-------|-----------|
| **Master Briefing Writer** | Already validated as Opus-dependent. Handles all 12 Wave 1 sections in a single pass. |
| **Synthesis Writer** | Core creative output. Already Opus. |
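
The tier tables above can also be written down as data, which makes the set of candidates and the excluded agents explicit. The following is a minimal sketch, not part of the plan itself: the `Risk` enum, the `CANDIDATES` and `EXCLUDED` names, and the idea of testing lowest-risk agents first are all assumptions made for illustration. Agent names and risk levels are copied from the tables.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Risk levels from the tier tables; the numeric ordering is an assumption."""
    LOW = 1
    MEDIUM = 2
    MEDIUM_HIGH = 3
    HIGH = 4


# Downgrade candidates and their risk, copied from Tiers 1-3.
CANDIDATES = {
    "Research Digest": Risk.LOW,
    "Briefing Validator": Risk.LOW,
    "Plan Validator": Risk.LOW,
    "Metadata Writer": Risk.LOW,
    "Research Q&A": Risk.MEDIUM,
    "Question Discovery": Risk.MEDIUM,
    "Cross-Validation": Risk.MEDIUM_HIGH,
    "Episode Planner": Risk.HIGH,
}

# Agents listed under "Not candidates for downgrade".
EXCLUDED = {"Master Briefing Writer", "Synthesis Writer"}


def experiment_order(candidates=CANDIDATES, excluded=EXCLUDED):
    """Return candidate agents sorted from lowest to highest risk.

    Running low-risk agents first is a hypothetical ordering, not something
    the plan prescribes. sorted() is stable, so table order is kept within
    a risk level.
    """
    eligible = [(agent, risk) for agent, risk in candidates.items()
                if agent not in excluded]
    return [agent for agent, _ in sorted(eligible, key=lambda pair: pair[1])]
```

Calling `experiment_order()` yields the four Tier 1 agents first and `Episode Planner` last.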

## Methodology

For each agent being tested:

1. Run the agent on ep5 research data with Opus → save output as `{output}-opus.md`
2. Run the agent on ep5 research data with Sonnet → save output as `{output}-sonnet.md`
3. Repeat on ep6 data
4. Compare outputs on:
   - **Completeness:** Does Sonnet hit all required sections/criteria?
   - **Nuance:** Does Sonnet miss subtle contradictions, alternative framings, or creative structural decisions?
   - **Specificity:** Are parameters, citations, and details equally precise?
5. Document findings per agent with a recommendation: `keep Opus` or `safe to downgrade to Sonnet`
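
The steps above could be organized roughly as follows. This is a sketch under stated assumptions: the `Run` class, `build_runs`, and `missing_sections` are hypothetical helpers, the per-episode directory in the output path is an assumed layout, and the completeness check only compares Markdown headings, which covers the "all required sections" part of the first criterion and nothing more. Only the `{output}-opus.md` / `{output}-sonnet.md` naming comes from the methodology.

```python
from dataclasses import dataclass
from itertools import product

EPISODES = ("ep5", "ep6")      # episodes named in the methodology
MODELS = ("opus", "sonnet")    # the two models being compared


@dataclass(frozen=True)
class Run:
    """One (agent, episode, model) combination to execute."""
    agent: str
    episode: str
    model: str
    output_stem: str  # the "{output}" part of "{output}-opus.md"

    @property
    def output_path(self) -> str:
        # The episode subdirectory is an assumption; the file name
        # follows the "{output}-{model}.md" pattern from steps 1-2.
        return f"{self.episode}/{self.output_stem}-{self.model}.md"


def build_runs(agent: str, output_stem: str) -> list[Run]:
    """Enumerate every episode/model pair for one agent (steps 1-3)."""
    return [Run(agent, episode, model, output_stem)
            for episode, model in product(EPISODES, MODELS)]


def missing_sections(markdown_text: str, required_headings: list[str]) -> list[str]:
    """Return required headings that do not appear in the output (step 4)."""
    present = {line.lstrip("#").strip()
               for line in markdown_text.splitlines()
               if line.startswith("#")}
    return [heading for heading in required_headings if heading not in present]
```

Nuance and specificity still need a human side-by-side read; a heading check only catches outputs that skip a section outright.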

## Success Criteria

- Each downgrade decision is backed by side-by-side evidence from 2+ episodes
- No quality regression detected in final episode outputs (`report.md`, `content_plan.md`)
- Cost savings documented per agent
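
One way to make these criteria mechanical is sketched below. Everything here is an assumption for illustration: the `Comparison` record, the field names, and the choice to default to `keep Opus` when fewer than two episodes have been compared. Cost figures are expected to come from measured runs; none are supplied here.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """Reviewer verdicts and measured costs for one agent on one episode."""
    episode: str
    completeness_ok: bool
    nuance_ok: bool
    specificity_ok: bool
    opus_cost: float    # measured cost of the Opus run (any consistent unit)
    sonnet_cost: float  # measured cost of the Sonnet run (same unit)


def recommend(comparisons: list[Comparison], min_episodes: int = 2) -> str:
    """Return the per-agent recommendation string used in the methodology."""
    episodes = {c.episode for c in comparisons}
    # Assumption: too little evidence means we stay on Opus.
    if len(episodes) < min_episodes:
        return "keep Opus"
    no_regression = all(
        c.completeness_ok and c.nuance_ok and c.specificity_ok
        for c in comparisons
    )
    return "safe to downgrade to Sonnet" if no_regression else "keep Opus"


def cost_savings(comparisons: list[Comparison]) -> float:
    """Total measured saving from using Sonnet across the compared episodes."""
    return sum(c.opus_cost - c.sonnet_cost for c in comparisons)
```

A single failed criterion on either episode keeps the agent on Opus, which matches the "no quality regression" requirement above.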

## Dependencies

- Requires the sub-agent optimization plan to be implemented and validated first
- Run experiments only after at least one successful end-to-end episode with all-Opus sub-agents
