
Emit scenario spans #346

Draft
farrelmahaztra wants to merge 3 commits into main from hud-761


@farrelmahaztra (Contributor) commented Feb 24, 2026

Note

Medium Risk
Adds new telemetry emission during scenario setup/evaluate and changes error handling flow in _run_task_scenario_evaluate, which could affect when results/errors are recorded and how they appear in traces.

Overview
Adds explicit telemetry spans for scenario lifecycle events to improve real-time visibility of setup and evaluate stages.

EvalContext now emits scenario_setup/scenario_evaluate spans on start, completion, and error (including timestamps, normalized trace IDs, and optional reward/result metadata) via queue_span. Scenario setup now re-raises exceptions after emitting an error span, and scenario evaluate records evaluation_result/reward before emitting a completion span, while emitting an error span when evaluation fails.
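For readers without the diff open, here is a minimal sketch of what emitting such a span could look like. It is not the PR's code: `QUEUED_SPANS`, the payload keys, the signature of `emit_scenario_span`, and the UUID-based normalization are all assumptions made for illustration, and `queue_span` here is a local stand-in rather than the real function.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for the real queue_span: collects spans in a list.
QUEUED_SPANS: List[Dict[str, Any]] = []


def queue_span(span: Dict[str, Any]) -> None:
    QUEUED_SPANS.append(span)


def _now_iso() -> str:
    # Current UTC time as an ISO 8601 string.
    return datetime.now(timezone.utc).isoformat()


def _normalize_trace_id(trace_id: str) -> str:
    # Assumption: "normalized" means 32 lowercase hex characters.
    return uuid.UUID(trace_id).hex


def emit_scenario_span(
    name: str,
    status: str,
    trace_id: str,
    start_time: str,
    end_time: Optional[str] = None,
    result: Optional[Dict[str, Any]] = None,
    error: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a span payload and queue it (payload shape is hypothetical)."""
    span: Dict[str, Any] = {
        "name": name,
        "status": status,
        "trace_id": _normalize_trace_id(trace_id),
        "start_time": start_time,
        "end_time": end_time,
    }
    # Reward/result and error metadata are optional.
    if result is not None:
        span["result"] = result
    if error is not None:
        span["error"] = error
    queue_span(span)
    return span


trace = "0F8FAD5B-D9CB-469F-A165-70867728950E"
start = _now_iso()
emit_scenario_span("scenario_evaluate", "started", trace, start)
emit_scenario_span(
    "scenario_evaluate", "completed", trace, start, _now_iso(),
    result={"reward": 1.0},
)
```

The point of the sketch is only the lifecycle: one span on start, a second on completion or error, with the optional metadata attached to the later one.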

Written by Cursor Bugbot for commit 0983cd5. This will update automatically on new commits.


@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


start_time,
_now_iso(),
result={"reward": result.reward},
)

Telemetry failure inside try block corrupts evaluation state

Medium Severity

The _emit_scenario_span "completed" call is inside the try block in both _run_task_scenario_evaluate and _run_task_scenario_setup. If span emission raises (e.g., from _normalize_trace_id or queue_span), the except handler fires despite the operation having succeeded. In _run_task_scenario_evaluate, this incorrectly sets self.error to a telemetry error even though self.evaluation_result and self.reward are already correctly assigned, creating inconsistent state. In _run_task_scenario_setup, it re-raises a telemetry error as if setup failed, even though self.prompt was already set.
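One possible way to address this, sketched under assumptions: keep only the operation itself inside the `try`, and route every span emission through a wrapper that logs and swallows telemetry failures. `EvalContextSketch`, `_safe_emit`, `run_evaluate`, and the injected `emit_span` callable are hypothetical names for illustration; the real `EvalContext` methods and their signatures are not shown in this thread.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


class EvalContextSketch:
    """Hypothetical, reduced stand-in for EvalContext."""

    def __init__(self, emit_span: Callable[..., None]) -> None:
        self._emit_span = emit_span
        self.evaluation_result: Any = None
        self.reward: Optional[float] = None
        self.error: Optional[BaseException] = None

    def _safe_emit(self, *args: Any, **kwargs: Any) -> None:
        # Telemetry is best effort: log the failure, never propagate it.
        try:
            self._emit_span(*args, **kwargs)
        except Exception:
            logger.warning("scenario span emission failed", exc_info=True)

    def run_evaluate(self, evaluate: Callable[[], float]) -> None:
        self._safe_emit("scenario_evaluate", "started")
        try:
            # Only the operation itself is guarded by this try block.
            reward = evaluate()
        except Exception as exc:
            self.error = exc
            self._safe_emit("scenario_evaluate", "error", error=str(exc))
            return
        # Success path: state is recorded first, and the span is emitted
        # outside the try, so a telemetry error cannot set self.error.
        self.evaluation_result = {"reward": reward}
        self.reward = reward
        self._safe_emit(
            "scenario_evaluate", "completed", result={"reward": reward}
        )


def broken_emit(*args: Any, **kwargs: Any) -> None:
    raise RuntimeError("telemetry down")


ctx = EvalContextSketch(broken_emit)
ctx.run_evaluate(lambda: 1.0)
```

With this shape, `ctx.error` stays `None` and `ctx.reward` is `1.0` even though every emission raised. The same pattern would apply to setup: re-raise only exceptions from the setup call itself, not from the completion span.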


@lorenss-m marked this pull request as draft on February 24, 2026, 20:12
