Multi-Model A/B Testing #18

@antiv

Description

Summary

Run the same agent with different models simultaneously, split traffic between them, and compare response quality, latency, and cost. A/B results are visualized in the dashboard.

Motivation

Strongly.AI has this. MATE already supports 50+ models via LiteLLM, so A/B testing is a natural extension. It helps users find the best model for each agent without guesswork or manual switching.

Scope

  • A/B config on agent: list of model variants with traffic split percentages
  • Traffic splitter in AgentManager based on configured weights
  • Track per-variant: latency, token usage, cost, error rate
  • Optional: quality scoring via eval framework (ties into Agent Evaluation issue)
  • Dashboard: A/B experiment creation, live results comparison, winner selection
  • Statistical significance calculation
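
As a sketch of how the traffic splitter in AgentManager might work: each request draws a variant by configured weight, so traffic converges to the split over many requests. The config shape, field names, and model identifiers below are illustrative assumptions, not the actual MATE schema.

```python
import random

# Hypothetical A/B config on an agent: each variant pairs a model with a
# traffic weight (weights need not sum to 100; they are relative).
variants = [
    {"model": "gpt-4o", "weight": 70},
    {"model": "claude-3-5-sonnet", "weight": 30},
]

def pick_variant(variants, rng=random):
    """Weighted random pick, as the traffic splitter might do per request."""
    weights = [v["weight"] for v in variants]
    return rng.choices(variants, weights=weights, k=1)[0]

# Over many requests the observed split converges to the configured weights.
counts = {v["model"]: 0 for v in variants}
for _ in range(10_000):
    counts[pick_variant(variants)["model"]] += 1
```

Per-request weighted sampling (rather than e.g. sticky hashing on user ID) is the simplest scheme; a hash-based splitter would be needed if a given user must always hit the same variant.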

Acceptance Criteria

  • Configure multiple model variants with traffic split
  • Traffic correctly distributed per weights
  • Per-variant metrics tracked and visible in dashboard
  • Ability to declare winner and apply to 100% traffic
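
Declaring a winner depends on the significance calculation from the scope above. A minimal version, applied to a binary per-variant metric such as success/error rate, is a two-proportion z-test; the function name and inputs here are illustrative, not an existing MATE API.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-sided two-proportion z-test: is variant B's success rate
    significantly different from variant A's? Returns (z, p_value)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    # Pooled proportion under the null hypothesis of equal rates.
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF via erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Example: 520/1000 vs 480/1000 successes is NOT significant at p < 0.05,
# so the dashboard should withhold the "declare winner" action.
z, p = two_proportion_z(520, 1000, 480, 1000)
```

Continuous metrics like latency would instead need a t-test (or a non-parametric test, given latency's skew), which argues for pulling in scipy rather than hand-rolling the statistics.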

Metadata

Assignees: no one assigned
Labels: P2 (Medium priority), analytics (Analytics and metrics), enhancement (New feature or request)
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests