@Yash3561
Why are these changes needed?

The current LLMCallEvent and LLMStreamEndEvent capture token counts but lack the temporal telemetry required for production-grade agent monitoring. To optimize agentic workflows and enforce Service Level Agreements (SLAs), developers need to measure:

  1. TTFT (Time To First Token): Critical for evaluating user experience in streaming agents.
  2. TPS (Tokens Per Second): Essential for benchmarking throughput across different inference providers.
  3. End-to-End Latency: Required to identify bottlenecks in complex multi-agent orchestration loops.
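The three metrics above can all be derived from a few time.perf_counter() timestamps. A minimal sketch of the arithmetic (function and variable names are illustrative, not taken from the PR):

```python
import time


def compute_metrics(start: float, first_token_at: float, end: float,
                    completion_tokens: int) -> dict:
    """Derive TTFT, TPS, and end-to-end latency from perf_counter() timestamps.

    `start` is taken just before the request is sent, `first_token_at` when
    the first content chunk arrives, and `end` when the response completes.
    """
    latency_ms = (end - start) * 1000.0
    ttft_ms = (first_token_at - start) * 1000.0
    generation_s = end - first_token_at
    # Guard against division by zero for instantaneous (e.g. cached) responses.
    tokens_per_second = (
        completion_tokens / generation_s if generation_s > 0 else None
    )
    return {
        "latency_ms": latency_ms,
        "ttft_ms": ttft_ms,
        "tokens_per_second": tokens_per_second,
    }
```

For example, a request that starts at t=0, streams its first token at t=0.5s, and finishes at t=2.5s with 100 completion tokens yields a TTFT of 500 ms, an end-to-end latency of 2500 ms, and 50 tokens per second over the generation window.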

This PR implements high-precision timing using time.perf_counter() to provide these metrics without breaking backward compatibility.
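Backward compatibility follows from making the new fields optional with None defaults, so existing code that constructs or consumes these events without the new fields keeps working unchanged. A simplified sketch (field names match the PR; the class shape and pre-existing fields shown are illustrative, not the actual autogen definitions):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LLMStreamEndEvent:
    # Pre-existing fields (illustrative subset).
    prompt_tokens: int
    completion_tokens: int
    # New optional telemetry fields: defaulting to None means callers
    # written before this change construct the event exactly as before.
    latency_ms: Optional[float] = None
    tokens_per_second: Optional[float] = None
    ttft_ms: Optional[float] = None
```

An event created the old way, e.g. LLMStreamEndEvent(prompt_tokens=10, completion_tokens=20), simply reports None for the telemetry fields, which downstream consumers can check before logging.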

Related issue number

Closes #5790 (Add latency and token per second stats to LLMCallEvent).

Checks

  • I've included any doc changes needed for https://microsoft.github.io/autogen/. (No public documentation changes required for this internal telemetry upgrade).
  • I've added tests (if relevant) corresponding to the changes introduced in this PR.
  • I've made sure all auto checks have passed (Ran black and ruff linting).

Technical Details

  • Event Refactor: Added latency_ms, tokens_per_second, and ttft_ms optional fields to LLMCallEvent and LLMStreamEndEvent in logging.py.
  • Client Integration: Integrated timing logic into OpenAIChatCompletionClient and AzureAIChatCompletionClient.
  • Streaming Logic: Captured ttft_ms by measuring the interval between request initiation and the first yielded chunk containing content or tool calls.
  • Precision: Utilized time.perf_counter() to ensure monotonic, high-resolution measurements immune to system clock adjustments.
  • Testing: Implemented regression tests in python/packages/autogen-core/tests/test_logging_events.py verifying both data presence and safe handling of optional fields.
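The streaming logic described above can be sketched as a wrapper around the provider's chunk iterator: TTFT is recorded at the first chunk that actually carries content, and total latency when the stream is exhausted. All names here are illustrative, not the actual client internals:

```python
import time
from typing import AsyncIterator


async def timed_stream(chunks: AsyncIterator[str],
                       metrics: dict) -> AsyncIterator[str]:
    """Yield chunks unchanged while recording TTFT and total latency.

    `chunks` stands in for the provider's streaming response. A chunk counts
    as the "first token" only if it is non-empty (in the real client: contains
    content or tool calls), so empty keep-alive chunks do not skew TTFT.
    """
    start = time.perf_counter()
    async for chunk in chunks:
        if "ttft_ms" not in metrics and chunk:
            metrics["ttft_ms"] = (time.perf_counter() - start) * 1000.0
        yield chunk
    # Stream exhausted: record end-to-end latency. In the real client these
    # values would be attached to the emitted LLMStreamEndEvent.
    metrics["latency_ms"] = (time.perf_counter() - start) * 1000.0
```

Because the wrapper only observes timestamps and re-yields each chunk, it adds no buffering and does not alter what the caller receives.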

