Conversation

@m7md7sien (Contributor)
Description

Handle flow dictionary direct output in evaluators: the case where the flow returns <actual_output> directly instead of {"llm_output": <actual_output>}.

All SDK Contribution checklist:

  • The pull request does not introduce breaking changes.
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@m7md7sien m7md7sien requested a review from a team as a code owner February 12, 2026 16:14
Copilot AI review requested due to automatic review settings February 12, 2026 16:14
@github-actions github-actions bot added the Evaluation Issues related to the client library for Azure AI Evaluation label Feb 12, 2026
Copilot AI left a comment

Pull request overview

This PR updates multiple prompt-based evaluators to handle the case where a prompty flow returns the evaluation payload dict directly (instead of wrapping it under {"llm_output": ...}), improving evaluator robustness across different flow output shapes.

Changes:

  • Update evaluators to fall back to treating the full flow result as llm_output when the llm_output key is missing.
  • Fix _ToolSelectionEvaluator to respect the threshold passed into __init__ (instead of hardcoding 1).
  • Improve a few evaluator-specific behaviors/documentation (e.g., token-metadata handling in response completeness, input validation in similarity, and a docstring example fix).

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Summary per file:

  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_tool_selection/_tool_selection.py: Use the provided threshold and support direct-dict flow outputs for llm_output.
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_tool_output_utilization/_tool_output_utilization.py: Support direct-dict flow outputs for llm_output.
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_tool_input_accuracy/_tool_input_accuracy.py: Support direct-dict flow outputs for llm_output.
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_tool_call_success/_tool_call_success.py: Support direct-dict flow outputs for llm_output.
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_tool_call_accuracy/_tool_call_accuracy.py: Support direct-dict flow outputs for llm_output.
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_task_navigation_efficiency/_task_navigation_efficiency.py: Fix a docstring example list quoting issue.
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_task_completion/_task_completion.py: Support direct-dict flow outputs; adjust the tool_definitions formatting condition.
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_task_adherence/_task_adherence.py: Support direct-dict flow outputs; include the threshold in the returned properties.
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_similarity/_similarity.py: Add explicit required-input validation via EvaluationException.
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_response_completeness/_response_completeness.py: Support direct-dict flow outputs; make token-metadata extraction safe when the result isn't a dict.
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_relevance/_relevance.py: Support direct-dict flow outputs for llm_output.
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_intent_resolution/_intent_resolution.py: Support direct-dict flow outputs for llm_output.
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_groundedness/_groundedness.py: Align the AsyncPrompty import with the env-var switch used elsewhere (promptflow vs legacy).
