Summary

This PR implements a short‑term, fail‑fast guardrail for issue #7132 and adds a dedicated reproduction sample plus regression tests.

When AssistantAgent is used with an OpenRouter (OpenAI‑compatible) model and output_content_type is a Pydantic model, the OpenAIChatCompletionClient currently routes requests through beta.chat.completions.parse(response_format=...). In this structured‑output mode, the OpenAI API does not support tool calling, so requests that include both response_format and tools result in tool calls being silently ignored.

This PR makes that incompatibility explicit and prevents the “silent tool drop” behavior.


Changes

1. Fail‑fast guardrail in _process_create_args

File: python/packages/autogen-ext/src/autogen_ext/models/openai/_openai_client.py
Method: _process_create_args

Right after converted_tools = convert_tools(tools), we now check for the incompatible combination of structured output and tools:

# Guardrail: structured output (Pydantic model) cannot be combined with tool calling.
# TODO: long-term, this could be a dedicated configuration error type (e.g. IncompatibleModelConfigurationError).
if response_format_value is not None and len(converted_tools) > 0:
    raise ValueError(
        "Cannot use structured output (output_content_type) together with function tools. "
        "The OpenAI structured output API does not support tool calling in this mode. "
        "Either remove output_content_type or remove tools."
    )

Behavior:

  • Triggers only when:
    • response_format_value is set (Pydantic‑based structured output is enabled), and
    • converted_tools is non‑empty (at least one function tool is present).
  • In that case, _process_create_args raises ValueError with a descriptive message.
  • No other behavior in _process_create_args is changed:
    • Structured output without tools still uses the beta parse path.
    • Tools without structured output still use the regular chat completions path.
  • No fallbacks or retries are introduced. Invalid configurations fail fast instead of silently dropping tools (see the caller‑side sketch below).
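
To make the behavior concrete, here is a minimal caller‑side sketch of the three configurations. It is illustrative only and not part of this PR: the model name, API key, and the Weather/get_weather definitions are placeholders, and the create() calls are shown commented out because they need an event loop and real credentials.

# Illustrative sketch only (not part of this PR). Placeholder model name and tool.
from pydantic import BaseModel

from autogen_core.models import UserMessage
from autogen_core.tools import FunctionTool
from autogen_ext.models.openai import OpenAIChatCompletionClient


class Weather(BaseModel):
    city: str


def get_weather(city: str) -> str:
    return f"Weather in {city}"


client = OpenAIChatCompletionClient(model="gpt-4o-mini")  # placeholder model
weather_tool = FunctionTool(get_weather, description="Get the weather for a city.")
messages = [UserMessage(content="What is the weather in Paris?", source="user")]

# Structured output without tools: still uses the beta parse path.
#   await client.create(messages, json_output=Weather)

# Tools without structured output: still uses the regular chat completions path.
#   await client.create(messages, tools=[weather_tool])

# Structured output + tools: now raises ValueError instead of silently dropping the tool.
#   await client.create(messages, tools=[weather_tool], json_output=Weather)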

2. Regression tests for the guardrail

File: python/packages/autogen-ext/tests/models/test_openai_model_client.py

Added a minimal Pydantic model used only for these tests:

class Weather(BaseModel):
    """Minimal Pydantic model for structured-output guardrail tests (issue #7132)."""
    city: str = Field(description="City name.")

Added a minimal tool function:

def _dummy_tool_for_guardrail(city: str) -> str:
    """Minimal tool for testing structured-output vs tools guardrail."""
    return f"Weather in {city}"

Test 1: Pydantic json_output + tools → ValueError

  • test_structured_output_with_tools_raises_value_error
    • Creates a BaseOpenAIChatCompletionClient with a MagicMock underlying client and model_info.structured_output=True.
    • Passes:
      • messages=[UserMessage(...)]
      • tools=[FunctionTool.from_function(_dummy_tool_for_guardrail)]
      • json_output=Weather
    • Calls client._process_create_args(...).
    • Asserts that:
      • A ValueError is raised.
      • The error message contains "Cannot use structured output (output_content_type) together with function tools".

Test 2: Pydantic json_output + no tools → passes

  • test_structured_output_without_tools_passes
    • Same client setup and Weather model.
    • Passes tools=[].
    • Calls client._process_create_args(...).
    • Asserts that:
      • No exception is raised.
      • The returned create_params.response_format is Weather.
      • len(create_params.tools) == 0.

Both tests use a mocked underlying client and only exercise _process_create_args; they do not perform any real network calls.
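
For reference, a condensed variant of the failing‑combination test could also be written against the public create() API: because the guardrail fires inside _process_create_args before any request is issued, the error path needs no network access or response mocking. The sketch below is not the PR's test code; it assumes pytest‑asyncio and reuses the Weather and _dummy_tool_for_guardrail definitions shown above.

# Condensed sketch, not the PR's test code: exercises the guardrail through the public create() API.
# Assumes pytest-asyncio and the Weather / _dummy_tool_for_guardrail definitions shown above.
import pytest

from autogen_core.models import UserMessage
from autogen_core.tools import FunctionTool
from autogen_ext.models.openai import OpenAIChatCompletionClient


@pytest.mark.asyncio
async def test_structured_output_with_tools_raises_value_error_public_api() -> None:
    client = OpenAIChatCompletionClient(model="gpt-4o-mini", api_key="sk-placeholder")
    tool = FunctionTool(_dummy_tool_for_guardrail, description="Dummy weather tool.")
    with pytest.raises(ValueError, match="Cannot use structured output"):
        # The guardrail raises before any request is sent, so no network call occurs.
        await client.create(
            [UserMessage(content="What is the weather in Paris?", source="user")],
            tools=[tool],
            json_output=Weather,
        )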

Note: The existing integration tests
test_openai_structured_output_with_tool_calls and
test_openai_structured_output_with_streaming_tool_calls
now fail with this ValueError, which is the intended behavior under the new guardrail (they exercise the disallowed “tools + structured output in a single request” configuration).


3. Reproduction sample

File: python/samples/agentchat_openrouter/assistant_openrouter_output_content_type.py

Sample that reproduces issue #7132 using AssistantAgent with an OpenRouter model:

  • Configures a tool (e.g., get_weather).
  • Sets output_content_type to a Pydantic model.
  • Before this PR: tools were silently ignored and the agent returned a plain text response.
  • After this PR: the call now fails fast with the ValueError described above (see the condensed sketch below).
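
A condensed sketch of such a sample is shown below. It is not the sample file itself: the OpenRouter model id, the model_info values, and the OPENROUTER_API_KEY variable name are placeholders.

# Condensed sketch of the repro, not the sample file itself; model id, model_info
# values, and the OPENROUTER_API_KEY variable name are placeholders.
import asyncio
import os

from pydantic import BaseModel

from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient


class WeatherReport(BaseModel):
    city: str
    summary: str


async def get_weather(city: str) -> str:
    return f"Sunny in {city}"


async def main() -> None:
    model_client = OpenAIChatCompletionClient(
        model="openai/gpt-4o-mini",  # placeholder OpenRouter model id
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
        model_info={  # capabilities must be declared explicitly for non-OpenAI endpoints
            "vision": False,
            "function_calling": True,
            "json_output": True,
            "structured_output": True,
            "family": "unknown",
        },
    )
    agent = AssistantAgent(
        "weather_agent",
        model_client=model_client,
        tools=[get_weather],
        output_content_type=WeatherReport,  # structured output + tools: raises ValueError after this PR
    )
    await agent.run(task="What is the weather in Paris?")


if __name__ == "__main__":
    asyncio.run(main())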

Structural / Long‑Term Note

This PR intentionally does not attempt to make “tools + structured output in one request” work, because it is not compatible with the current OpenAI API contract.

  • beta.chat.completions.parse(response_format=...) is designed for returning structured JSON that matches the schema in a single step.
  • Tool calling, on the other hand, is inherently a multi‑step protocol where the model first emits a tool_call, external code runs, and then the model is called again with tool results.

Trying to combine both in a single request is structurally incompatible with the current API behavior.

A more robust long‑term solution would be to change the agent workflow to use a two‑step protocol (sketched below):

  1. First call: regular chat completion with tools enabled (no response_format) to execute tools and collect their outputs.
  2. Second call: structured‑output completion with response_format set and tools disabled, to turn tool outputs into a typed response.
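
For illustration, here is a conceptual sketch of such a two‑step flow at the model‑client level. It is not proposed implementation code: tool execution and message bookkeeping are heavily simplified, and exact field names (e.g. on FunctionExecutionResult) may differ between autogen-core versions.

# Conceptual sketch only, not proposed implementation code. Tool execution and
# message bookkeeping are simplified; exact field names may differ between versions.
import json
from typing import Type

from pydantic import BaseModel

from autogen_core import CancellationToken, FunctionCall
from autogen_core.models import (
    AssistantMessage,
    ChatCompletionClient,
    FunctionExecutionResult,
    FunctionExecutionResultMessage,
    UserMessage,
)
from autogen_core.tools import BaseTool


async def two_step(
    model_client: ChatCompletionClient,
    tool: BaseTool,
    output_type: Type[BaseModel],
    task: str,
) -> BaseModel:
    messages = [UserMessage(content=task, source="user")]

    # Step 1: regular chat completion with tools enabled (no response_format).
    first = await model_client.create(messages, tools=[tool])
    if isinstance(first.content, list):  # the model emitted tool calls
        results = []
        for call in first.content:
            assert isinstance(call, FunctionCall)
            output = await tool.run_json(json.loads(call.arguments), CancellationToken())
            results.append(
                FunctionExecutionResult(
                    call_id=call.id,
                    name=call.name,
                    content=tool.return_value_as_string(output),
                    is_error=False,
                )
            )
        messages.append(AssistantMessage(content=first.content, source="assistant"))
        messages.append(FunctionExecutionResultMessage(content=results))

    # Step 2: structured-output completion with response_format set and tools disabled.
    second = await model_client.create(messages, json_output=output_type)
    return output_type.model_validate_json(second.content)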

This PR is a short‑term, fail‑fast guardrail that makes the incompatibility explicit; it does not change the higher‑level agent workflow.


Testing

Local runs:

  • uv run pytest packages/autogen-ext/tests/models/test_openai_model_client.py
    • 69 passed, 17 skipped.
    • The 2 remaining failures are the existing structured‑output + tools integration tests, which now hit the new ValueError as expected under the guardrail.

The new unit tests:

  • test_structured_output_with_tools_raises_value_error
  • test_structured_output_without_tools_passes

both pass.
