Open
Labels: enhancement (New feature or request)
Description
Context
The sub-agent optimization plan (docs/plans/skill_workflow_subagent_optimization.md) uses Opus for all sub-agents to ensure zero quality regression from the architectural refactor. This is the right default — get the architecture right first, then optimize costs.
Once the refactored workflow is validated end-to-end on a real episode, we can experiment with selectively downgrading specific sub-agents to Sonnet where quality is comparable.
Proposed Experiments
Run each experiment on existing episode data (ep5 and ep6) by generating outputs with both Opus and Sonnet, then comparing quality side-by-side.
Tier 1: Most likely to be safe as Sonnet
| Agent | Rationale | Risk |
|---|---|---|
| Research Digest | Structured extraction, not creative work. Clear template to follow. | Low |
| Briefing Validator | Binary pass/fail checklist against known criteria. | Low |
| Plan Validator | Binary pass/fail checklist against known criteria. | Low |
| Metadata Writer | Template-driven, structured output from known inputs. | Low |
Tier 2: Needs careful comparison
| Agent | Rationale | Risk |
|---|---|---|
| Research Q&A | Answering specific questions about a single file. Sonnet likely sufficient but nuance matters. | Medium |
| Question Discovery | Analytical gap analysis. Creative identification of "what we don't know." | Medium |
Tier 3: Highest quality sensitivity — test carefully
| Agent | Rationale | Risk |
|---|---|---|
| Cross-Validation | Comparing subtle contradictions across 5 research files (~167KB). Nuance-dependent. | Medium-High |
| Episode Planner | Architecturally complex creative work: counterpoint design with assigned speaker positions, mode-switching frameworks, depth budgets, episode arc design. Directly determines audio quality. | High |
Not candidates for downgrade
| Agent | Rationale |
|---|---|
| Master Briefing Writer | Already validated as Opus-dependent. Handles all 12 Wave 1 sections in a single pass. |
| Synthesis Writer | Core creative output. Already Opus. |
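The tiers above could be captured as a small registry so that experiment scripts share one source of truth. This is only a sketch: the `Candidate` class, the `PINNED_TO_OPUS` set, and `experiment_order` are hypothetical names, and running lowest-tier agents first is an assumption, not something this issue prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    agent: str
    tier: int
    risk: str


# Transcribed from the tier tables above.
CANDIDATES = [
    Candidate("Research Digest", 1, "Low"),
    Candidate("Briefing Validator", 1, "Low"),
    Candidate("Plan Validator", 1, "Low"),
    Candidate("Metadata Writer", 1, "Low"),
    Candidate("Research Q&A", 2, "Medium"),
    Candidate("Question Discovery", 2, "Medium"),
    Candidate("Cross-Validation", 3, "Medium-High"),
    Candidate("Episode Planner", 3, "High"),
]

# Agents that stay on Opus and are excluded from the comparison runs.
PINNED_TO_OPUS = {"Master Briefing Writer", "Synthesis Writer"}


def experiment_order(candidates=CANDIDATES):
    """Return agent names, lowest tier first (assumed ordering).

    sorted() is stable, so table order is preserved within a tier.
    """
    return [c.agent for c in sorted(candidates, key=lambda c: c.tier)]
```

Calling `experiment_order()` yields the eight candidate agents starting with the Tier 1 entries and ending with Episode Planner.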
Methodology
For each agent being tested:
- Run the agent on ep5 research data with Opus → save output as `{output}-opus.md`
- Run the agent on ep5 research data with Sonnet → save output as `{output}-sonnet.md`
- Repeat on ep6 data
- Compare outputs on:
  - Completeness: Does Sonnet hit all required sections/criteria?
  - Nuance: Does Sonnet miss subtle contradictions, alternative framings, or creative structural decisions?
  - Specificity: Are parameters, citations, and details equally precise?
- Document findings per agent with recommendation: `keep Opus` or `safe to downgrade to Sonnet`
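The steps above can be sketched as a small harness. Everything here is illustrative: `run_agent` stands in for whatever actually invokes a sub-agent with a given model (this issue does not define that interface), the per-episode directory layout is an assumption, and `missing_sections` is only a crude substring check for the completeness criterion. Nuance and specificity still need a human side-by-side read.

```python
from pathlib import Path
from typing import Callable

EPISODES = ("ep5", "ep6")
MODELS = ("opus", "sonnet")


def output_path(out_dir: Path, episode: str, output: str, model: str) -> Path:
    """Build the `{output}-{model}.md` path; one folder per episode is assumed."""
    return out_dir / episode / f"{output}-{model}.md"


def run_comparison(
    agent: str,
    output: str,
    run_agent: Callable[[str, str, str], str],  # hypothetical: (agent, episode, model) -> text
    out_dir: Path,
) -> list[Path]:
    """Run one agent on every episode with both models and save each result."""
    written = []
    for episode in EPISODES:
        for model in MODELS:
            text = run_agent(agent, episode, model)
            path = output_path(out_dir, episode, output, model)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(text, encoding="utf-8")
            written.append(path)
    return written


def missing_sections(text: str, required: list[str]) -> list[str]:
    """Completeness check: required section headings that do not appear in the text."""
    return [section for section in required if section not in text]
```

Passing `run_agent` in as a callable keeps the harness independent of how the sub-agents are launched, so the same loop works for every agent in the tiers.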
Success Criteria
- Each downgrade decision is backed by side-by-side evidence from 2+ episodes
- No quality regression detected in final episode outputs (report.md, content_plan.md)
- Cost savings documented per agent
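One way to keep downgrade decisions consistent with these criteria is to derive the recommendation from recorded per-episode judgments. This is a sketch under assumptions: the `Comparison` record and `recommend` function are hypothetical, reviewer judgments are reduced to booleans for simplicity, and falling back to Opus when evidence is thin is an assumed policy in line with the all-Opus default.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """A reviewer's side-by-side verdict for one agent on one episode."""
    episode: str
    completeness_ok: bool
    nuance_ok: bool
    specificity_ok: bool


def recommend(comparisons: list[Comparison], min_episodes: int = 2) -> str:
    """Recommend a downgrade only with evidence from enough distinct episodes."""
    episodes = {c.episode for c in comparisons}
    if len(episodes) < min_episodes:
        # Not enough side-by-side evidence yet: stay on the default.
        return "keep Opus"
    all_ok = all(
        c.completeness_ok and c.nuance_ok and c.specificity_ok
        for c in comparisons
    )
    return "safe to downgrade to Sonnet" if all_ok else "keep Opus"
```

A single failed criterion on either episode keeps the agent on Opus, which matches the requirement that no quality regression is accepted.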
Dependencies
- Requires the sub-agent optimization plan to be implemented and validated first
- Run experiments only after at least one successful end-to-end episode with all-Opus sub-agents