We should add the ability to run test cases multiple times in the playground. This would let users check model consistency directly from the UI.
Problem
The playground currently has no built-in way to check how consistent a prompt's output is across multiple runs. Users have to rerun the same test case by hand several times or write code to do it programmatically.
Proposed solution
Add a repetition setting in the playground configuration. Users would click the configuration icon and specify how many times each test case should run.
Implementation details
The configuration could work like this:
Add a "Repetitions" field in the playground settings panel
Default value: 1 (current behavior)
When set to N > 1, run each test case N times
Display all N outputs for comparison
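To make the behavior above concrete, here is a minimal sketch of the run loop. It is not the playground's actual code: `PlaygroundConfig`, `run_test_case`, and the `model` callable are hypothetical names chosen for illustration, and the real implementation would presumably call the model API asynchronously.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PlaygroundConfig:
    # Hypothetical settings object; 1 preserves the current single-run behavior.
    repetitions: int = 1

    def __post_init__(self) -> None:
        if self.repetitions < 1:
            raise ValueError("repetitions must be at least 1")


def run_test_case(
    prompt: str,
    model: Callable[[str], str],
    config: PlaygroundConfig,
) -> List[str]:
    """Run one test case `config.repetitions` times and return every output.

    `model` stands in for whatever call the playground makes to get a
    completion; it is an assumption, not a real API.
    """
    return [model(prompt) for _ in range(config.repetitions)]
```

With the default config this returns a single-element list, so existing behavior is unchanged; with `PlaygroundConfig(repetitions=5)` the UI would receive five outputs to display side by side.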
Use cases
This helps users:
Test prompt stability across multiple runs
Identify prompts or models that produce inconsistent results
Make informed decisions about temperature and sampling settings
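One possible way to surface inconsistency once the N outputs are collected is to group identical outputs and report how often the most common one appears. The sketch below is only an illustration: `summarize_outputs` is a hypothetical helper, and exact string matching after trimming whitespace is an assumed, deliberately simple notion of "consistent".

```python
from collections import Counter
from typing import List, Tuple


def summarize_outputs(outputs: List[str]) -> Tuple[float, List[Tuple[str, int]]]:
    """Return (agreement rate, distinct outputs ranked by frequency).

    The agreement rate is the share of runs that produced the most
    common output. Outputs are compared after stripping surrounding
    whitespace, which is an assumption made for this sketch.
    """
    if not outputs:
        raise ValueError("outputs must not be empty")
    counts = Counter(output.strip() for output in outputs)
    ranked = counts.most_common()
    agreement = ranked[0][1] / len(outputs)
    return agreement, ranked
```

A rate near 1.0 suggests the prompt is stable at the current settings, while a low rate could be a hint to lower the temperature or tighten the prompt.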