Emit scenario spans by farrelmahaztra · Pull Request #346 · hud-evals/hud-python

farrelmahaztra · 2026-02-24T18:43:15Z

Note

Medium Risk
Adds new telemetry emission during scenario setup/evaluate and changes error handling flow in _run_task_scenario_evaluate, which could affect when results/errors are recorded and how they appear in traces.

Overview
Adds explicit telemetry spans for scenario lifecycle events to improve real-time visibility of setup and evaluate stages.

EvalContext now emits scenario_setup/scenario_evaluate spans on start, completion, and error (including timestamps, normalized trace IDs, and optional reward/result metadata) via queue_span. Scenario setup now emits an error span and then re-raises the exception. Scenario evaluate records evaluation_result/reward before emitting a completion span, and emits an error span when evaluation fails.

Written by Cursor Bugbot for commit 0983cd5. This will update automatically on new commits.
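The span lifecycle described in the overview can be illustrated with a minimal sketch. This is not the code from the PR: the `queue_span` below is an in-memory stand-in with an assumed signature, and `emit_scenario_span`, `run_setup`, and the span field names are hypothetical, chosen only to show the start/completed/error pattern for a setup stage.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Optional

# Hypothetical in-memory stand-in for the SDK's span queue.
SPANS: list[dict[str, Any]] = []


def queue_span(span: dict[str, Any]) -> None:
    """Assumed signature: accept one span dict and enqueue it."""
    SPANS.append(span)


def now_iso() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def emit_scenario_span(
    name: str,
    status: str,
    trace_id: str,
    start_time: str,
    end_time: Optional[str] = None,
    result: Optional[dict[str, Any]] = None,
    error: Optional[str] = None,
) -> None:
    """Build a span dict (field names are illustrative) and queue it."""
    span: dict[str, Any] = {
        "name": name,
        "status": status,
        "trace_id": trace_id,
        "start_time": start_time,
        "end_time": end_time,
    }
    if result is not None:
        span["result"] = result
    if error is not None:
        span["error"] = error
    queue_span(span)


def run_setup(trace_id: str, setup: Callable[[], str]) -> str:
    """Emit a span on start, then on completion or error; re-raise on error."""
    start = now_iso()
    emit_scenario_span("scenario_setup", "started", trace_id, start)
    try:
        prompt = setup()
    except Exception as exc:
        emit_scenario_span(
            "scenario_setup", "error", trace_id, start, now_iso(), error=str(exc)
        )
        raise
    emit_scenario_span("scenario_setup", "completed", trace_id, start, now_iso())
    return prompt
```

Calling `run_setup("t1", lambda: "hello")` queues a "started" span followed by a "completed" span. If the setup callable raises, an "error" span is queued instead and the original exception propagates to the caller.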

hud/eval/context.py

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-02-24T19:14:32Z

hud/eval/context.py

+                start_time,
+                _now_iso(),
+                result={"reward": result.reward},
+            )


Telemetry failure inside try block corrupts evaluation state

Medium Severity

The _emit_scenario_span "completed" call is inside the try block in both _run_task_scenario_evaluate and _run_task_scenario_setup. If span emission raises (e.g., from _normalize_trace_id or queue_span), the except handler fires despite the operation having succeeded. In _run_task_scenario_evaluate, this incorrectly sets self.error to a telemetry error even though self.evaluation_result and self.reward are already correctly assigned, creating inconsistent state. In _run_task_scenario_setup, it re-raises a telemetry error as if setup failed, even though self.prompt was already set.

Additional Locations (1)

hud/eval/context.py#L446-L453
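One possible way to address the issue above is to keep span emission out of the `try` block that guards the evaluation itself, and to make emission best-effort so a telemetry failure is logged rather than propagated. The sketch below is an assumption about how this could look, not the fix that landed in the PR: `EvalState`, `Result`, `safe_emit`, and `run_evaluate` are hypothetical names, and the `emit` callable stands in for `_emit_scenario_span`.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


@dataclass
class Result:
    """Hypothetical evaluation result carrying a reward."""
    reward: float


@dataclass
class EvalState:
    """Hypothetical stand-in for the EvalContext fields named in the review."""
    reward: Optional[float] = None
    evaluation_result: Optional[Result] = None
    error: Optional[BaseException] = None


def safe_emit(emit: Callable[..., None], *args: Any, **kwargs: Any) -> bool:
    """Call the emitter; log and swallow any failure. Returns True on success."""
    try:
        emit(*args, **kwargs)
        return True
    except Exception:
        logger.warning("scenario span emission failed", exc_info=True)
        return False


def run_evaluate(
    state: EvalState,
    evaluate: Callable[[], Result],
    emit: Callable[..., None],
) -> None:
    """Only a failure of evaluate() itself is recorded in state.error."""
    try:
        result = evaluate()
    except Exception as exc:
        state.error = exc
        safe_emit(emit, "scenario_evaluate", "error", error=str(exc))
        return
    # Evaluation succeeded: record state first, then emit outside the try block.
    state.evaluation_result = result
    state.reward = result.reward
    safe_emit(emit, "scenario_evaluate", "completed", result={"reward": result.reward})
```

With this arrangement, an emitter that raises leaves `state.reward` and `state.evaluation_result` set and `state.error` untouched, so a trace may lose a span but the evaluation outcome stays consistent. Whether telemetry errors should be swallowed or surfaced some other way is a design choice for the maintainers.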

Emit scenario spans

5f13d24

farrelmahaztra requested review from jdchawla29 and lorenss-m February 24, 2026 18:43

cursor bot reviewed Feb 24, 2026

hud/eval/context.py (outdated, resolved)

Fix exception handling

281b78c

cursor bot reviewed Feb 24, 2026

hud/eval/context.py (resolved)

Skip if trace disabled

0983cd5

cursor bot reviewed Feb 24, 2026

lorenss-m marked this pull request as draft February 24, 2026 20:12

farrelmahaztra wants to merge 3 commits into main from hud-761
