Preserve non-text tool outputs in LiteLLM and chatcmpl converters #2214
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
 msg: ExtendedChatCompletionToolMessageParam = {
     "role": "tool",
     "tool_call_id": func_output["call_id"],
-    "content": cls.extract_text_content(output_content),
+    "content": cls.extract_all_content(output_content),
```
Avoid sending non-text tool content to OpenAI Chat Completions
Tool outputs are now passed through extract_all_content, which includes image/audio/file parts, directly into the tool message payload (lines 556-559). The OpenAI ChatCompletions API only accepts text for tool messages (ChatCompletionToolMessageParam is limited to str or text parts), so when a tool returns a ToolOutputImage, input_audio, or file result and this converter is used by ChatCompletionsModel, the request will be rejected with an invalid payload instead of gracefully ignoring the non-text content as before. This is a regression for any OpenAI/Azure chat-completions call whose tools emit media output.
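For illustration, here is a hedged sketch of the two payload shapes at issue; the dictionaries below are hand-written examples, not SDK output, and the image serialization is an assumption about what extract_all_content would produce.

```python
# A tool message whose content is a string or text parts: this is what
# ChatCompletionToolMessageParam accepts.
text_tool_message = {
    "role": "tool",
    "tool_call_id": "call_123",
    "content": [{"type": "text", "text": "42 degrees and sunny"}],
}

# A tool message carrying an image part, roughly what a preserved ToolOutputImage
# might serialize to. This falls outside the tool-message schema, so a strict
# endpoint may reject the request while a lenient one ignores the extra part.
image_tool_message = {
    "role": "tool",
    "tool_call_id": "call_123",
    "content": [
        {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
    ],
}
```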
We cannot accept a change that could potentially break existing OpenAI Chat Completions API code.
I have tested this with the OpenAI chat completions endpoint. In practice, the server ignores unsupported non-text tool content instead of returning an API error, so this does not seem to break the API.
Before this change, non-text tool outputs were silently dropped by the SDK. Now they are preserved. If a provider ignores them, the behavior is the same. If a provider returns an error, that is actually better because it clearly shows that the provider does not support media in tool outputs.
For this reason, I don’t think the SDK needs to filter this content on behalf of developers. This flexibility can be left to developers, since some providers do support images in tool outputs.
FYI, before v0.3.3 the SDK did not filter this content either. The filtering was introduced later during the openai-python upgrade, and it seems to have been added mainly to satisfy type checking rather than due to a strict API requirement.
This converter is primarily used for OpenAI's Chat Completions API model, so even if the server endpoint currently ignores this data pattern, keeping the data compatible with the underlying OpenAI model interface is still important. The server behavior could also change in the future, since the current behavior of ignoring unsupported data structures is not clearly stated in the public documentation.
For the benefit of LiteLLM users, I think enabling callers of this converter to customize the behavior here, for example via overload methods that accept a customization option, may be a good approach.
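A rough sketch of that idea, for illustration only: the method name tool_message_content and the preserve_non_text_tool_output flag are invented here; only extract_text_content and extract_all_content come from the existing converter.

```python
from typing import Any


class Converter:
    """Stand-in for the chatcmpl converter; only the relevant hook is sketched."""

    @classmethod
    def extract_text_content(cls, output_content: Any) -> Any:
        ...  # existing text-only extraction

    @classmethod
    def extract_all_content(cls, output_content: Any) -> Any:
        ...  # existing extraction that also keeps image/audio/file parts

    @classmethod
    def tool_message_content(
        cls, output_content: Any, *, preserve_non_text_tool_output: bool = False
    ) -> Any:
        # Default stays text-only for the OpenAI Chat Completions model;
        # LiteLLM-based callers could opt in to preserving non-text parts.
        if preserve_non_text_tool_output:
            return cls.extract_all_content(output_content)
        return cls.extract_text_content(output_content)
```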
Resolves: #2163
This PR fixes an issue where non-text tool outputs (such as ToolOutputImage) are dropped when using the LiteLLM and chatcmpl converters.
Previously, the tool output conversion logic used cls.extract_text_content(output_content), which only keeps text and drops image, audio, and file parts. This PR switches it to cls.extract_all_content(output_content), so the following types are preserved:
- input_text
- input_image
- input_audio
- input_file
Backwards compatibility
I think this change should be mostly safe and backwards compatible:
- If the tool returns a plain str, behavior is unchanged.
- If the tool returns images or files, they are now preserved in the converted message instead of being silently dropped.
Example Code
The following example will work after this PR when using a Claude model via LiteLLM.
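The original snippet did not survive the copy; below is a minimal sketch along the same lines. It assumes a function tool returning ToolOutputImage and a Claude model routed through LitellmModel; the ToolOutputImage import location and constructor field, as well as the model id, are assumptions and may need adjusting.

```python
import asyncio

from agents import Agent, Runner, function_tool
from agents.extensions.models.litellm_model import LitellmModel
from agents.tool import ToolOutputImage  # assumed import location


@function_tool
def get_latest_chart() -> ToolOutputImage:
    """Return a chart image so the model can describe it."""
    return ToolOutputImage(image_url="https://example.com/chart.png")  # field name assumed


async def main() -> None:
    agent = Agent(
        name="Chart analyst",
        instructions="Call the tool, then describe what the chart shows.",
        model=LitellmModel(model="anthropic/claude-3-5-sonnet-20241022"),
        tools=[get_latest_chart],
    )
    result = await Runner.run(agent, "What does the latest chart show?")
    print(result.final_output)


if __name__ == "__main__":
    asyncio.run(main())
```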
Implementation Notes
To make this change type-safe, I introduced ExtendedChatCompletionToolMessageParam with a broader content type that matches the return value of extract_all_content. Without this, switching to extract_all_content directly results in a mypy error.
If this feels too heavy-weight, an alternative is to keep the existing type and ignore the typing error explicitly.
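The snippets referenced in these notes did not survive the copy; the following is a hedged reconstruction of both options. The exact part types allowed for content and the specific mypy error code are assumptions.

```python
# Hedged reconstruction of the TypedDict described above; the PR's actual
# definition may differ in which content parts it allows.
from typing import Iterable, Union

from openai.types.chat import ChatCompletionContentPartParam
from typing_extensions import Literal, Required, TypedDict


class ExtendedChatCompletionToolMessageParam(TypedDict, total=False):
    role: Required[Literal["tool"]]
    tool_call_id: Required[str]
    # Broader than ChatCompletionToolMessageParam, which limits content to
    # str or text-only parts.
    content: Required[Union[str, Iterable[ChatCompletionContentPartParam]]]
```

The lighter-weight alternative, at the assignment site inside the converter, would look roughly like:

```python
# Keep the narrower openai type and silence the checker at the call site.
msg: ChatCompletionToolMessageParam = {
    "role": "tool",
    "tool_call_id": func_output["call_id"],
    "content": cls.extract_all_content(output_content),  # type: ignore[typeddict-item]
}
```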
I can switch to this approach instead if it’s preferred.