Labels: P2 (Medium priority), analytics (Analytics and metrics), enhancement (New feature or request)
Summary
Run the same agent with different models simultaneously, split traffic between them, and compare response quality, latency, and cost. Visualize A/B results in the dashboard.
Motivation
Strongly.AI has this. MATE already supports 50+ models via LiteLLM, so A/B testing is a natural extension. It helps users find the best model for each agent without guesswork or manual switching.
Scope
- A/B config on agent: list of model variants with traffic split percentages
- Traffic splitter in AgentManager based on configured weights
- Track per-variant: latency, token usage, cost, error rate
- Optional: quality scoring via eval framework (ties into Agent Evaluation issue)
- Dashboard: A/B experiment creation, live results comparison, winner selection
- Statistical significance calculation
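The traffic splitter above could be a thin wrapper over weighted random choice. A minimal sketch, assuming hypothetical `ModelVariant` and `pick_variant` names (not MATE's actual API):

```python
import random
from dataclasses import dataclass

@dataclass
class ModelVariant:
    model: str     # LiteLLM model identifier, e.g. "gpt-4o"
    weight: float  # traffic share; weights need not sum to 1

def pick_variant(variants: list[ModelVariant], rng=random) -> ModelVariant:
    """Select one variant per request, proportionally to its weight."""
    weights = [v.weight for v in variants]
    return rng.choices(variants, weights=weights, k=1)[0]

# Example: 70/30 split between two models for one agent
variants = [ModelVariant("gpt-4o", 0.7), ModelVariant("claude-3-5-sonnet", 0.3)]
chosen = pick_variant(variants)
```

AgentManager would call `pick_variant` once per request and tag the resulting trace with the chosen variant so per-variant metrics can be aggregated later.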
Acceptance Criteria
- Configure multiple model variants with traffic split
- Traffic correctly distributed per weights
- Per-variant metrics tracked and visible in dashboard
- Ability to declare winner and apply to 100% traffic
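For the significance calculation, error rate (or any pass/fail quality score) can be compared with a standard two-proportion z-test, which needs only the stdlib. A sketch under the assumption that per-variant success counts are already tracked:

```python
import math

def two_proportion_z(success_a: int, n_a: int,
                     success_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference in success rate between variants.

    Returns (z statistic, p-value). A small p-value (e.g. < 0.05)
    suggests the difference is unlikely to be chance.
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # p-value from the standard normal CDF via math.erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Example: variant A succeeded 480/500 times, variant B 450/500
z, p = two_proportion_z(480, 500, 450, 500)
```

Latency and cost comparisons would need a different test (e.g. a t-test on per-request samples); this covers the pass/fail case only.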