Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| start_time, | ||
| _now_iso(), | ||
| result={"reward": result.reward}, | ||
| ) |
There was a problem hiding this comment.
Telemetry failure inside try block corrupts evaluation state
Medium Severity
The _emit_scenario_span "completed" call is inside the try block in both _run_task_scenario_evaluate and _run_task_scenario_setup. If span emission raises (e.g., from _normalize_trace_id or queue_span), the except handler fires despite the operation having succeeded. In _run_task_scenario_evaluate, this incorrectly sets self.error to a telemetry error even though self.evaluation_result and self.reward are already correctly assigned, creating inconsistent state. In _run_task_scenario_setup, it re-raises a telemetry error as if setup failed, even though self.prompt was already set.


Note
Medium Risk
Adds new telemetry emission during scenario setup/evaluate and changes error handling flow in
_run_task_scenario_evaluate, which could affect when results/errors are recorded and how they appear in traces.Overview
Adds explicit telemetry spans for scenario lifecycle events to improve real-time visibility of
setupandevaluatestages.EvalContextnow emitsscenario_setup/scenario_evaluatespans on start, completion, and error (including timestamps, normalized trace IDs, and optional reward/result metadata) viaqueue_span. Scenario setup now re-raises exceptions after emitting an error span, and scenario evaluate recordsevaluation_result/rewardbefore emitting a completion span, while emitting an error span when evaluation fails.Written by Cursor Bugbot for commit 0983cd5. This will update automatically on new commits. Configure here.