Summary
Improve the JSON evaluator to compare full JSON objects (expected answer vs LLM output) and provide per-field match scores instead of requiring users to create separate evaluators for each field.
Problem Statement
Current limitations with JSON evaluators:
- The Field Match evaluator compares a single field in the LLM output against the entire ground-truth column (not a field within it)
- Users must create one evaluator per field they want to validate
- No visibility into which specific fields passed or failed, only aggregate scores
Proposed Solution (Checkpoint 1)
Modify the JSON evaluator to:
- Accept the full expected answer column as JSON (not just a single value)
- Compare each field in the expected JSON against the corresponding field in the LLM output
- Return a per-field score breakdown (e.g., `{"name": 1.0, "email": 1.0, "phone": 0.0}`)
- Calculate an aggregate score as the average of the field scores (see the sketch after this list)
Success Criteria
- User can configure a single evaluator that validates multiple JSON fields
- Evaluation results show per-field pass/fail status
- Aggregate score reflects percentage of matching fields
- Works with nested JSON (at least one level deep), as in the usage example below
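Assuming the hypothetical sketch above, these criteria could play out as follows (all values are illustrative):

```python
expected = {"name": "Ada", "email": "ada@example.com", "contact": {"phone": "555-0100"}}
llm_output = {"name": "Ada", "email": "ada@example.com", "contact": {"phone": "555-9999"}}

field_scores = compare_json_fields(expected, llm_output)
# {"name": 1.0, "email": 1.0, "contact.phone": 0.0}  -> per-field pass/fail

overall = aggregate_score(field_scores)
# 0.666... -> roughly 67% of the expected fields matched
```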
Future Checkpoints (Out of Scope)
- Checkpoint 2: Field-to-field mapping UI (when output keys ≠ expected keys)
- Checkpoint 3: Per-field match type configuration (exact, semantic, numeric tolerance)
- Checkpoint 4: Evaluator playground for testing configurations
Technical Notes
Current implementation is in:
- Backend: `api/oss/src/core/evaluators/utils.py` (functions: `field_match_test`, `compare_jsons`)
- Config: `api/oss/src/resources/evaluators/evaluators.py`
- Frontend: `web/oss/src/components/Evaluators/`