
JSON Evaluator: Multi-field comparison with per-field scoring #3293

@mmabrouk

Description

Summary

Improve the JSON evaluator to compare full JSON objects (expected answer vs LLM output) and provide per-field match scores instead of requiring users to create separate evaluators for each field.

Problem Statement

Current limitations with JSON evaluators:

  1. The Field Match evaluator compares a single field in the LLM output to the entire ground truth column, not to a field within it
  2. Users must create one evaluator per field they want to validate
  3. There is no visibility into which specific fields passed or failed; only aggregate scores are reported

Proposed Solution (Checkpoint 1)

Modify the JSON evaluator to:

  1. Accept the full expected answer column as JSON (not just a single value)
  2. Compare each field in the expected JSON against the corresponding field in the LLM output
  3. Return a score breakdown per field (e.g., {"name": 1.0, "email": 1.0, "phone": 0.0})
  4. Calculate an aggregate score (the average of the per-field scores); see the sketch below
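
A minimal sketch of the intended behavior, assuming exact-match scoring per field. The helper names here (compare_json_fields, aggregate_score) are hypothetical and are not the existing compare_jsons API:

```python
from typing import Any


# Hypothetical sketch, not the current implementation in
# api/oss/src/core/evaluators/utils.py.
def compare_json_fields(expected: dict, output: Any, prefix: str = "") -> dict[str, float]:
    """Return a {field: score} map, recursing into nested objects (e.g. "address.city")."""
    scores: dict[str, float] = {}
    for key, expected_value in expected.items():
        field = f"{prefix}{key}"
        output_value = output.get(key) if isinstance(output, dict) else None
        if isinstance(expected_value, dict) and isinstance(output_value, dict):
            # Nested object: score each nested field individually.
            scores.update(compare_json_fields(expected_value, output_value, prefix=f"{field}."))
        else:
            scores[field] = 1.0 if output_value == expected_value else 0.0
    return scores


def aggregate_score(field_scores: dict[str, float]) -> float:
    """Aggregate score: the average of the per-field scores."""
    return sum(field_scores.values()) / len(field_scores) if field_scores else 0.0
```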

Success Criteria

  • User can configure a single evaluator that validates multiple JSON fields
  • Evaluation results show per-field pass/fail status
  • Aggregate score reflects percentage of matching fields
  • Works with nested JSON (at least one level deep); see the example below
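
For illustration, this is how the hypothetical helpers sketched above would behave on a one-level nested case (the data values are made up):

```python
expected = {"name": "Ada", "contact": {"email": "ada@example.com", "phone": "555-0100"}}
output = {"name": "Ada", "contact": {"email": "ada@example.com", "phone": None}}

field_scores = compare_json_fields(expected, output)
# {"name": 1.0, "contact.email": 1.0, "contact.phone": 0.0}

print(aggregate_score(field_scores))  # 0.666... (2 of 3 fields match)
```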

Future Checkpoints (Out of Scope)

  • Checkpoint 2: Field-to-field mapping UI (when output keys ≠ expected keys)
  • Checkpoint 3: Per-field match type configuration (exact, semantic, numeric tolerance)
  • Checkpoint 4: Evaluator playground for testing configurations

Technical Notes

Current implementation is in:

  • Backend: api/oss/src/core/evaluators/utils.py (functions: field_match_test, compare_jsons)
  • Config: api/oss/src/resources/evaluators/evaluators.py
  • Frontend: web/oss/src/components/Evaluators/
