feat: Adding tool call into response api #413
Conversation
Summary of Changes

Hello @ilblackdragon, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly extends the API's capabilities by integrating support for external function calls. It allows the language model to dynamically request the execution of client-defined functions, effectively pausing the conversation until the client provides the necessary results. This mechanism facilitates a more interactive and extensible tool-use experience, enabling the LLM to leverage external services and data sources managed by the client application.
Code Review
This pull request introduces support for external function calls (tool calls) in the response API, a significant and well-implemented feature. However, the implementation of process_function_call_outputs contains an authorization bypass and an IDOR vulnerability: tool call ID verification can be skipped with a malformed previous_response_id, allowing unauthorized injection of tool results, and response items are loaded without ownership verification, enabling potential information leaks. Additionally, a high-severity issue was identified around inconsistent tool_call_id generation, which can disrupt the tool call flow when the LLM does not provide an ID. There are also a couple of medium-severity issues concerning robustness and test clarity. Addressing the critical security and functional issues is essential before merging.
```rust
// The service layer will handle this error and create the FunctionCall output item
Err(ResponseError::FunctionCallRequired {
    name: tool_call.tool_type.clone(),
    call_id: tool_call.id.clone().unwrap_or_default(),
```
There is an inconsistency in how fallback tool_call_ids are generated. Here, if tool_call.id is None, it defaults to an empty string. However, in service.rs, when constructing the assistant message with tool_calls, a new UUID-based ID is generated as a fallback.
This will cause a mismatch. The FunctionCall item sent to the client will have an empty call_id, but the assistant message in the history will have a generated ID. When the client responds with the function output, the resulting tool message will have an empty tool_call_id, which won't match the one in the assistant message, likely causing an error from the LLM provider.
The ID should be generated once and reused. A potential fix is to populate the ID on ToolCallInfo immediately after it's created if it's missing.
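One possible shape for that fix, as a minimal sketch: the `ToolCallInfo` fields shown here and the `call_` prefix are assumptions for illustration, not code from the PR.

```rust
// Illustrative struct; the real ToolCallInfo in the PR has more fields.
struct ToolCallInfo {
    id: Option<String>,
    // ...
}

// Assign the fallback ID once, right after the tool call is parsed, so the
// FunctionCall output item and the assistant message in history reuse the
// same call_id instead of diverging into "" vs. a freshly generated UUID.
fn ensure_tool_call_id(tool_call: &mut ToolCallInfo) {
    if tool_call.id.is_none() {
        tool_call.id = Some(format!("call_{}", uuid::Uuid::new_v4()));
    }
}
```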
```rust
    if let Ok(uuid) = uuid::Uuid::parse_str(uuid_str) {
        let response_id = models::ResponseId(uuid);

        // List all items from the previous response
        let items = response_items_repository
            .list_by_response(response_id)
            .await
            .map_err(|e| {
                errors::ResponseError::InternalError(format!(
                    "Failed to fetch response items: {e}"
                ))
            })?;

        // Find the matching FunctionCall item
        let found = items.iter().any(|item| {
            matches!(item, models::ResponseOutputItem::FunctionCall {
                call_id: item_call_id,
                ..
            } if item_call_id == call_id)
        });

        if !found {
            return Err(errors::ResponseError::FunctionCallNotFound(
                call_id.to_string(),
            ));
        }
    }
}
```
The verification logic for call_id can be bypassed by providing a malformed previous_response_id. If uuid::Uuid::parse_str(uuid_str) fails, the entire verification block is skipped, and the function proceeds to add the unverified tool output to the conversation history. This allows an attacker to inject arbitrary tool results into the LLM context, potentially leading to indirect prompt injection.
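A sketch of a fail-closed alternative: the error variant mirrors the InvalidParams suggestion further down in this review, and its exact name in the codebase is an assumption.

```rust
// Fail closed: a previous_response_id that does not parse should be rejected,
// not treated as "nothing to verify". Variant names may differ slightly in
// the actual codebase.
let uuid = uuid::Uuid::parse_str(uuid_str).map_err(|_| {
    errors::ResponseError::InvalidParams(format!(
        "Invalid previous_response_id: {uuid_str}"
    ))
})?;
let response_id = models::ResponseId(uuid);
// ...the existing FunctionCall lookup and FunctionCallNotFound check then run
// unconditionally from here.
```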
```rust
    .list_by_response(response_id)
    .await
    .map_err(|e| {
```
This call to list_by_response uses a response_id derived from user input without verifying that the response belongs to the current workspace or user. While the items are not directly returned to the user, they are loaded into memory, and the attacker can use the FunctionCallNotFound error as a side-channel to verify the existence of specific call_ids in other users' responses. The repository method list_by_response should be updated to enforce ownership checks.
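A sketch of what an ownership-aware call site could look like; the method name and the workspace_id argument are hypothetical, not part of the PR.

```rust
// Hypothetical ownership-aware variant: the repository filters by both the
// response ID and the caller's workspace, so items belonging to other
// workspaces are never loaded and cannot be probed via FunctionCallNotFound.
let items = response_items_repository
    .list_by_response_in_workspace(response_id, workspace_id)
    .await
    .map_err(|e| {
        errors::ResponseError::InternalError(format!(
            "Failed to fetch response items: {e}"
        ))
    })?;
```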
```rust
let turn2_with_tool_result_prompt =
    mock_prompts::build_prompt(&format!("{} {}", turn1_user, function_output));
```
The mocking for the second turn of the conversation does not appear to accurately represent the conversation history. The prompt for the LLM should be a structured list of messages, including the user's initial query, the assistant's first response with the tool call, and a new message with role: "tool" containing the function output. Simply concatenating the user message and the function output into a single string doesn't reflect the actual API call to the LLM provider.
While this test may pass due to how mock_prompts::build_prompt is implemented, it makes the test brittle and harder to understand. The mock should be updated to reflect the actual message structure. This feedback also applies to test_function_tool_parallel_calls.
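For illustration, a sketch of the message sequence the second-turn mock could assert against. The role/field names follow the common chat-completions shape, and the call ID, function name, and arguments are placeholders, not values taken from the PR; the tests' actual prompt type depends on how mock_prompts::build_prompt is implemented.

```rust
// Illustrative only: the structured history the second-turn prompt should
// represent, expressed with serde_json just to show the shape.
let expected_turn2_messages = serde_json::json!([
    { "role": "user", "content": turn1_user },
    { "role": "assistant", "content": null, "tool_calls": [{
        "id": "call_abc123",                                  // placeholder call ID
        "type": "function",
        "function": { "name": "lookup", "arguments": "{}" }   // placeholder tool
    }] },
    { "role": "tool", "tool_call_id": "call_abc123", "content": function_output },
]);
```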
```rust
if function_outputs.is_empty() {
    return Ok(messages);
}
```
For robustness, it's better to explicitly validate that FunctionCallOutput items are only provided when resuming a response. If function_outputs is not empty, request.previous_response_id must be present. Without this check, the request might proceed and fail later at the LLM provider level with a less clear error message.
Please add a check after this block:

```rust
if request.previous_response_id.is_none() {
    return Err(errors::ResponseError::InvalidParams(
        "FunctionCallOutput is only allowed when resuming a response with previous_response_id."
            .to_string(),
    ));
}
```

This also makes the behavior consistent with the test test_function_output_without_previous_response_fails, which expects a client error in this scenario.
Custom function tools were failing with a "missing query field" error because convert_tool_calls only allowed specific tools to skip the 'query' requirement. This also affected the code_interpreter and computer tools, which had no executor.

Changes:
- Update get_function_tool_names() to include code_interpreter/computer
- Update FunctionToolExecutor to handle code_interpreter/computer
- Add function_tool_names parameter to convert_tool_calls
- Add tests for all new functionality

Now all tool types are properly handled:
- Function: client-executed via FunctionToolExecutor
- WebSearch: server-executed via WebSearchToolExecutor
- FileSearch: server-executed via FileSearchToolExecutor
- CodeInterpreter: client-executed via FunctionToolExecutor
- Computer: client-executed via FunctionToolExecutor
- Mcp: server-executed via McpToolExecutor

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
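A rough sketch of the routing the commit message describes; the function names match those mentioned above, but the simplified signatures and string tool-type identifiers are guesses for illustration, not the PR's actual API.

```rust
// Tool types executed on the client and therefore exempt from the server's
// `query` requirement; per the commit message this now includes
// code_interpreter and computer alongside plain function tools.
fn get_function_tool_names() -> Vec<&'static str> {
    vec!["function", "code_interpreter", "computer"]
}

// convert_tool_calls receives this list so it can decide, per tool call,
// whether the call is client-executed (routed through FunctionToolExecutor)
// or server-executed (web_search, file_search, mcp, ...).
fn is_client_executed(tool_type: &str, function_tool_names: &[&str]) -> bool {
    function_tool_names.contains(&tool_type)
}
```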
No description provided.