
Conversation

@ihower (Contributor) commented Dec 20, 2025

Resolves: #2163

This PR fixes an issue where non-text tool outputs (such as ToolOutputImage) are dropped when using the LiteLLM and chatcmpl adapters.

Previously, in the tool output conversion logic, the adapter used:

extract_text_content(output_content)

which keeps only text and drops image, audio, and file parts.

This PR switches it to:

extract_all_content(output_content)

so the following types are preserved:

  • input_text
  • input_image
  • input_audio
  • input_file
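
For illustration, a mixed text-and-image tool output would now convert into a tool message shaped roughly like this (a sketch following the Chat Completions content-part format; the call_id and data URL are placeholders):

tool_message = {
    "role": "tool",
    "tool_call_id": "call_abc123",  # placeholder
    "content": [
        {"type": "text", "text": "hello"},
        {
            "type": "image_url",
            "image_url": {"url": "data:image/jpeg;base64,...", "detail": "high"},
        },
    ],
}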

Backwards compatibility

I think this change should be mostly safe and backwards compatible:

  • If the tool returns a str, behavior is unchanged.
  • If the tool returns a list or dict with only type="text", behavior is unchanged.

If the tool returns images or files:

  • Some model providers actually support and consume them (e.g. Claude, possibly Azure).
  • Some model providers ignore them and process only the text part (e.g. OpenAI Chat Completions, Gemini, DeepSeek). This matches the previous behavior, where non-text parts were removed.

Example Code

The following example works after this PR when using a Claude model.

from typing import Union
import base64

from agents import Agent, Runner, ToolOutputImage, ToolOutputText, function_tool
from agents.extensions.models.litellm_model import LitellmModel

@function_tool
async def retrieve_test_image() -> list[Union[ToolOutputText, ToolOutputImage]]:
    # Return a mixed tool output: a text part plus a base64-encoded image.
    path = "a_test_image.jpg"
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return [
        ToolOutputText(text="hello"),
        ToolOutputImage(image_url=f"data:image/jpeg;base64,{b64}", detail="high"),
    ]

test_agent = Agent(
    name="Image Test Agent",
    instructions="You retrieve and describe images.",
    model=LitellmModel(model="anthropic/claude-sonnet-4-5-20250929"),
    tools=[retrieve_test_image],
)

result = Runner.run_sync(
    test_agent,
    "call retrieve_test_image. What do you see in this image and text?"
)
print(result.final_output)

# In this image, I can see....
# The function also returned the text output "hello".

Implementation Notes

To make this change type-safe, I introduced ExtendedChatCompletionToolMessageParam with a broader content type that matches the return value of extract_all_content. Without this, switching to extract_all_content directly results in a mypy error:

src/agents/models/chatcmpl_converter.py:547: error: Incompatible types (expression has type
"str | list[ChatCompletionContentPartTextParam | ChatCompletionContentPartImageParam | ChatCompletionContentPartInputAudioParam | File]",
TypedDict item "content" has type "str | Iterable[ChatCompletionContentPartTextParam]")
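
For reference, here is a minimal sketch of what such a TypedDict could look like (the part types are taken from the mypy error above; the exact definition in the PR may differ):

from typing import Iterable, Literal, Union

from openai.types.chat import (
    ChatCompletionContentPartImageParam,
    ChatCompletionContentPartInputAudioParam,
    ChatCompletionContentPartTextParam,
)
from openai.types.chat.chat_completion_content_part_param import File
from typing_extensions import Required, TypedDict

class ExtendedChatCompletionToolMessageParam(TypedDict, total=False):
    role: Required[Literal["tool"]]
    tool_call_id: Required[str]
    # Broader than ChatCompletionToolMessageParam's "str | Iterable[text parts]":
    # also admits image, audio, and file parts, matching extract_all_content.
    content: Required[
        Union[
            str,
            Iterable[
                Union[
                    ChatCompletionContentPartTextParam,
                    ChatCompletionContentPartImageParam,
                    ChatCompletionContentPartInputAudioParam,
                    File,
                ]
            ],
        ]
    ]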

If this feels too heavy-weight, an alternative is to keep the existing type and ignore the typing error explicitly:

"content": cls.extract_all_content(output_content),  # type: ignore[typeddict-item]

I can switch to this approach instead if it’s preferred.

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +556 to +559
    msg: ExtendedChatCompletionToolMessageParam = {
        "role": "tool",
        "tool_call_id": func_output["call_id"],
-       "content": cls.extract_text_content(output_content),
+       "content": cls.extract_all_content(output_content),


P1: Avoid sending non-text tool content to OpenAI Chat Completions

Tool outputs are now passed through extract_all_content, which includes image/audio/file parts, directly into the tool message payload (lines 556-559). The OpenAI ChatCompletions API only accepts text for tool messages (ChatCompletionToolMessageParam is limited to str or text parts), so when a tool returns a ToolOutputImage, input_audio, or file result and this converter is used by ChatCompletionsModel, the request will be rejected with an invalid payload instead of gracefully ignoring the non-text content as before. This is a regression for any OpenAI/Azure chat-completions call whose tools emit media output.


A member commented:

We cannot accept a change that could potentially break existing OpenAI Chat Completions API code.

@ihower (Contributor, Author) commented Dec 22, 2025

I have tested this with the OpenAI chat completions endpoint. In practice, the server ignores unsupported non-text tool content instead of returning an API error, so this does not seem to break the API.

Before this change, non-text tool outputs were silently dropped by the SDK. Now they are preserved. If a provider ignores them, the behavior is the same. If a provider returns an error, that is actually better because it clearly shows that the provider does not support media in tool outputs.

For this reason, I don’t think the SDK needs to filter this content on behalf of developers. This flexibility can be left to developers, since some providers do support images in tool outputs.

FYI, before v0.3.3 the SDK did not filter this content either. The filtering was introduced later during the openai-python upgrade, and it seems to have been added mainly to satisfy type checking rather than due to a strict API requirement.

@seratch (Member) commented Dec 22, 2025

This converter is primarily used for OpenAI's Chat Completions API model, so even if the server endpoint currently ignores this data pattern, keeping the data compatible with the underlying OpenAI model interface is still important. Also, the server behavior could change in the future, since the current behavior of ignoring unsupported data structures is not clearly mentioned in the public documentation.

For the benefit of LiteLLM users, I think a good approach may be to let callers of this converter customize the behavior here, e.g. by adding overloaded methods that accept a customization option.
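
One possible shape for that, as a rough sketch (convert_tool_output, preserve_media, and the stub helpers below are hypothetical names for illustration, not the SDK's actual API):

from typing import Any

# Stubs standing in for the converter's real helpers, so the sketch is self-contained.
def extract_text_content(content: Any) -> Any: ...
def extract_all_content(content: Any) -> Any: ...

def convert_tool_output(
    call_id: str,
    output_content: Any,
    *,
    preserve_media: bool = False,  # hypothetical opt-in flag
) -> dict[str, Any]:
    # Default keeps text only, which is safe for OpenAI Chat Completions;
    # a caller such as the LiteLLM model could pass preserve_media=True to
    # keep image/audio/file parts for providers that accept them.
    content = (
        extract_all_content(output_content)
        if preserve_media
        else extract_text_content(output_content)
    )
    return {"role": "tool", "tool_call_id": call_id, "content": content}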

@seratch added the enhancement (New feature or request) and feature:lite-llm labels on Dec 22, 2025
@seratch added this to the 0.7.x milestone on Dec 22, 2025
@seratch marked this pull request as draft on Dec 22, 2025

Labels

enhancement (New feature or request), feature:lite-llm


Development

Successfully merging this pull request may close these issues:

  • ToolOutputImage not recognized when using LiteLLM (Azure provider) with Agents SDK (#2163)
