Experiment: Selectively downgrade sub-agents from Opus to Sonnet #18

@valorengels

Context

The sub-agent optimization plan (docs/plans/skill_workflow_subagent_optimization.md) uses Opus for all sub-agents to ensure zero quality regression from the architectural refactor. This is the right default — get the architecture right first, then optimize costs.

Once the refactored workflow is validated end-to-end on a real episode, we can experiment with selectively downgrading specific sub-agents to Sonnet where quality is comparable.

Proposed Experiments

Run each experiment on existing episode data (ep5 and ep6) by generating outputs with both Opus and Sonnet, then comparing quality side-by-side.

Tier 1: Most likely to be safe as Sonnet

| Agent | Rationale | Risk |
| --- | --- | --- |
| Research Digest | Structured extraction, not creative work. Clear template to follow. | Low |
| Briefing Validator | Binary pass/fail checklist against known criteria. | Low |
| Plan Validator | Binary pass/fail checklist against known criteria. | Low |
| Metadata Writer | Template-driven, structured output from known inputs. | Low |

Tier 2: Needs careful comparison

| Agent | Rationale | Risk |
| --- | --- | --- |
| Research Q&A | Answering specific questions about a single file. Sonnet likely sufficient but nuance matters. | Medium |
| Question Discovery | Analytical gap analysis. Creative identification of "what we don't know." | Medium |

Tier 3: Highest quality sensitivity — test carefully

| Agent | Rationale | Risk |
| --- | --- | --- |
| Cross-Validation | Comparing subtle contradictions across 5 research files (~167KB). Nuance-dependent. | Medium-High |
| Episode Planner | Architecturally complex creative work: counterpoint design with assigned speaker positions, mode-switching frameworks, depth budgets, episode arc design. Directly determines audio quality. | High |

Not candidates for downgrade

| Agent | Rationale |
| --- | --- |
| Master Briefing Writer | Already validated as Opus-dependent. Handles all 12 Wave 1 sections in a single pass. |
| Synthesis Writer | Core creative output. Already Opus. |
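
The tiers above could be captured as a small lookup that keeps Opus as the default and switches an agent to Sonnet only once its experiment has signed off. This is a minimal sketch, not the project's actual configuration: the agent names are taken from the tiers above, while `model_for`, the `"opus"`/`"sonnet"` labels, and the `validated_downgrades` set are hypothetical names introduced for illustration.

```python
# Sketch only: agent names mirror the tiers in this issue; the model labels
# and function names are illustrative assumptions, not existing project code.
DEFAULT_MODEL = "opus"

# Agent -> tier (1 = most likely safe as Sonnet, 3 = highest quality sensitivity).
DOWNGRADE_CANDIDATES = {
    "Research Digest": 1,
    "Briefing Validator": 1,
    "Plan Validator": 1,
    "Metadata Writer": 1,
    "Research Q&A": 2,
    "Question Discovery": 2,
    "Cross-Validation": 3,
    "Episode Planner": 3,
}

# Agents this issue excludes from any downgrade.
OPUS_ONLY = {"Master Briefing Writer", "Synthesis Writer"}


def model_for(agent, validated_downgrades=frozenset()):
    """Return the model label for an agent.

    An agent runs on Sonnet only if it is a listed candidate AND its
    side-by-side comparison has been recorded in validated_downgrades.
    Everything else, including unknown agents, stays on Opus.
    """
    if agent in OPUS_ONLY:
        return DEFAULT_MODEL
    if agent in DOWNGRADE_CANDIDATES and agent in validated_downgrades:
        return "sonnet"
    return DEFAULT_MODEL
```

For example, `model_for("Research Digest")` stays on Opus until `"Research Digest"` is added to `validated_downgrades`, which matches the intent of downgrading only on evidence.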

Methodology

For each agent being tested:

  1. Run the agent on ep5 research data with Opus → save output as {output}-opus.md
  2. Run the agent on ep5 research data with Sonnet → save output as {output}-sonnet.md
  3. Repeat on ep6 data
  4. Compare outputs on:
    • Completeness: Does Sonnet hit all required sections/criteria?
    • Nuance: Does Sonnet miss subtle contradictions, alternative framings, or creative structural decisions?
    • Specificity: Are parameters, citations, and details equally precise?
  5. Document findings per agent with recommendation: keep Opus or safe to downgrade to Sonnet
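
The file naming in steps 1 to 3 and the completeness check in step 4 could be scripted roughly as follows. This is a sketch under stated assumptions: the `base_dir/episode/` directory layout, the function names, and the substring-based section check are all hypothetical, and the nuance and specificity comparisons would still need human review.

```python
from pathlib import Path

MODELS = ("opus", "sonnet")
EPISODES = ("ep5", "ep6")


def output_path(base_dir, episode, output, model):
    """Build the {output}-{model}.md path from steps 1 and 2.

    The base_dir/episode/ layout is an assumption for illustration.
    """
    return Path(base_dir) / episode / f"{output}-{model}.md"


def planned_outputs(base_dir, output):
    """List every file one agent's experiment should produce (2 episodes x 2 models)."""
    return [output_path(base_dir, ep, output, m) for ep in EPISODES for m in MODELS]


def missing_sections(text, required_sections):
    """Return the required section names that never appear in the text.

    This is a crude, case-insensitive substring check for completeness only.
    """
    lowered = text.lower()
    return [s for s in required_sections if s.lower() not in lowered]


def completeness_gap(opus_text, sonnet_text, required_sections):
    """Return sections Sonnet missed that Opus did include."""
    opus_missing = set(missing_sections(opus_text, required_sections))
    return [
        s for s in missing_sections(sonnet_text, required_sections)
        if s not in opus_missing
    ]
```

An empty result from `completeness_gap` only says that Sonnet covered the same sections as Opus; it says nothing about whether the content inside those sections is equally nuanced or precise.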

Success Criteria

  • Each downgrade decision is backed by side-by-side evidence from 2+ episodes
  • No quality regression detected in final episode outputs (report.md, content_plan.md)
  • Cost savings documented per agent
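
Per-agent cost savings could be documented with a calculation along these lines. This is a sketch with hypothetical function names; per-token prices are passed in as parameters rather than hard-coded, and any values used when calling it should come from the actual billing data or current price list, not from this example.

```python
def run_cost(input_tokens, output_tokens, input_price_per_mtok, output_price_per_mtok):
    """Cost of one agent run, given token counts and prices per million tokens."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


def savings(opus_cost, sonnet_cost):
    """Return (absolute saving, percentage saving) for one agent.

    A negative saving means the Sonnet run cost more than the Opus run.
    """
    saved = opus_cost - sonnet_cost
    pct = 100 * saved / opus_cost if opus_cost else 0.0
    return saved, pct
```

Token counts should be measured separately for each model, since the two runs may produce outputs of different lengths from the same input.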

Dependencies

  • Requires the sub-agent optimization plan to be implemented and validated first
  • Run experiments only after at least one successful end-to-end episode with all-Opus sub-agents
