Multi-Model A/B Testing #18

@antiv

Description

Summary

Run the same agent with different models simultaneously, split traffic between them, and compare response quality, latency, and cost. A/B results are visualized in the dashboard.

Motivation

Strongly.AI has this. MATE already supports 50+ models via LiteLLM, so A/B testing is a natural extension. It helps users find the best model for each agent without guesswork or manual switching.

Scope

  • A/B config on agent: list of model variants with traffic split percentages
  • Traffic splitter in AgentManager based on configured weights
  • Track per-variant: latency, token usage, cost, error rate
  • Optional: quality scoring via eval framework (ties into Agent Evaluation issue)
  • Dashboard: A/B experiment creation, live results comparison, winner selection
  • Statistical significance calculation
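
As a sketch of how the traffic splitter in AgentManager might work: each request draws a variant by configured weight, so traffic converges to the split over many requests. The config shape, field names, and model identifiers below are illustrative assumptions, not the actual MATE schema.

```python
import random

# Hypothetical A/B config on an agent: each variant pairs a model with a
# traffic weight (weights need not sum to 100; they are relative).
variants = [
    {"model": "gpt-4o", "weight": 70},
    {"model": "claude-3-5-sonnet", "weight": 30},
]

def pick_variant(variants, rng=random):
    """Weighted random pick, as the traffic splitter might do per request."""
    weights = [v["weight"] for v in variants]
    return rng.choices(variants, weights=weights, k=1)[0]

# Over many requests the observed split converges to the configured weights.
counts = {v["model"]: 0 for v in variants}
for _ in range(10_000):
    counts[pick_variant(variants)["model"]] += 1
```

Per-request weighted sampling (rather than e.g. sticky hashing on user ID) is the simplest scheme; a hash-based splitter would be needed if a given user must always hit the same variant.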

Acceptance Criteria

  • Configure multiple model variants with traffic split
  • Traffic correctly distributed per weights
  • Per-variant metrics tracked and visible in dashboard
  • Ability to declare winner and apply to 100% traffic
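
Declaring a winner depends on the significance calculation from the scope above. A minimal version, applied to a binary per-variant metric such as success/error rate, is a two-proportion z-test; the function name and inputs here are illustrative, not an existing MATE API.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-sided two-proportion z-test: is variant B's success rate
    significantly different from variant A's? Returns (z, p_value)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    # Pooled proportion under the null hypothesis of equal rates.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF via erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Example: 520/1000 vs 480/1000 successes is NOT significant at p < 0.05,
# so the dashboard should withhold the "declare winner" action.
z, p = two_proportion_z(520, 1000, 480, 1000)
```

Continuous metrics like latency would instead need a t-test (or a non-parametric test, given latency's skew), which argues for pulling in scipy rather than hand-rolling the statistics.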

Metadata

Assignees: no one assigned
Labels: P2 (Medium priority), analytics (Analytics and metrics), enhancement (New feature or request)
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests