ChatOpenAI
This notebook provides a quick overview for getting started with OpenAI chat models. For detailed documentation of all ChatOpenAI features and configurations, head to the API reference.
OpenAI has several chat models. You can find information about their latest models, their costs, context windows, and supported input types in the OpenAI docs.
Note that certain OpenAI models can also be accessed via the Microsoft Azure platform. To use the Azure OpenAI service, use the AzureChatOpenAI integration.
Overview
Integration details
| Class | Package | Local | Serializable | JS support | Package downloads | Package latest |
|---|---|---|---|---|---|---|
| ChatOpenAI | langchain-openai | ❌ | beta | ✅ | | |
Model features
| Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |
|---|---|---|---|---|---|---|---|---|---|
| ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
Setup
To access OpenAI models you'll need to create an OpenAI account, get an API key, and install the langchain-openai integration package.
Credentials
Head to https://platform.openai.com to sign up for OpenAI and generate an API key. Once you've done this, set the OPENAI_API_KEY environment variable:
import getpass
import os
if not os.environ.get("OPENAI_API_KEY"):
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
To enable automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
Installation
The LangChain OpenAI integration lives in the langchain-openai package:
%pip install -qU langchain-openai
Instantiation
Now we can instantiate our model object and generate chat completions:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
model="gpt-4o",
temperature=0,
max_tokens=None,
timeout=None,
max_retries=2,
# api_key="...",  # if you prefer to pass the API key in directly instead of using env vars
# base_url="...",
# organization="...",
# other params...
)
Invocation
messages = [
(
"system",
"You are a helpful assistant that translates English to French. Translate the user sentence.",
),
("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
ai_msg
AIMessage(content="J'adore la programmation.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 5, 'prompt_tokens': 31, 'total_tokens': 36}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3aa7262c27', 'finish_reason': 'stop', 'logprobs': None}, id='run-63219b22-03e3-4561-8cc4-78b7c7c3a3ca-0', usage_metadata={'input_tokens': 31, 'output_tokens': 5, 'total_tokens': 36})
print(ai_msg.content)
J'adore la programmation.
Chaining
We can chain our model with a prompt template like so:
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are a helpful assistant that translates {input_language} to {output_language}.",
),
("human", "{input}"),
]
)
chain = prompt | llm
chain.invoke(
{
"input_language": "English",
"output_language": "German",
"input": "I love programming.",
}
)
AIMessage(content='Ich liebe das Programmieren.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 6, 'prompt_tokens': 26, 'total_tokens': 32}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3aa7262c27', 'finish_reason': 'stop', 'logprobs': None}, id='run-350585e1-16ca-4dad-9460-3d9e7e49aaf1-0', usage_metadata={'input_tokens': 26, 'output_tokens': 6, 'total_tokens': 32})
Tool calling
OpenAI has a tool calling API (we use "tool calling" and "function calling" interchangeably here), which lets you describe tools and their arguments and have the model return a JSON object with the tool to invoke and its inputs. Tool calling is extremely useful for building tool-using chains and agents, and for getting structured outputs from models more generally.
ChatOpenAI.bind_tools()
With ChatOpenAI.bind_tools, we can easily pass in Pydantic classes, dict schemas, LangChain tools, or even functions as tools to the model. Under the hood these are converted to an OpenAI tool schema, which looks like:
{
"name": "...",
"description": "...",
"parameters": {...} # JSONSchema
}
and passed in on every model invocation.
from pydantic import BaseModel, Field
class GetWeather(BaseModel):
"""Get the current weather in a given location"""
location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
llm_with_tools = llm.bind_tools([GetWeather])
ai_msg = llm_with_tools.invoke(
"what is the weather like in San Francisco",
)
ai_msg
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_o9udf3EVOWiV4Iupktpbpofk', 'function': {'arguments': '{"location":"San Francisco, CA"}', 'name': 'GetWeather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 68, 'total_tokens': 85}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3aa7262c27', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-1617c9b2-dda5-4120-996b-0333ed5992e2-0', tool_calls=[{'name': 'GetWeather', 'args': {'location': 'San Francisco, CA'}, 'id': 'call_o9udf3EVOWiV4Iupktpbpofk', 'type': 'tool_call'}], usage_metadata={'input_tokens': 68, 'output_tokens': 17, 'total_tokens': 85})
strict=True
Requires langchain-openai>=0.1.21. As of August 6, 2024, OpenAI supports a strict argument when calling tools, which enforces that the model's arguments respect the tool's schema. See more here: https://platform.openai.com/docs/guides/function-calling
Note: If strict=True, the tool definition will also be validated, and only a subset of JSON schema is accepted. Crucially, the schema cannot have optional arguments (those with default values). Read the full docs on which types of schema are supported here: https://platform.openai.com/docs/guides/structured-outputs/supported-schemas.
llm_with_tools = llm.bind_tools([GetWeather], strict=True)
ai_msg = llm_with_tools.invoke(
"what is the weather like in San Francisco",
)
ai_msg
AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_jUqhd8wzAIzInTJl72Rla8ht', 'function': {'arguments': '{"location":"San Francisco, CA"}', 'name': 'GetWeather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 68, 'total_tokens': 85}, 'model_name': 'gpt-4o-2024-05-13', 'system_fingerprint': 'fp_3aa7262c27', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-5e3356a9-132d-4623-8e73-dd5a898cf4a6-0', tool_calls=[{'name': 'GetWeather', 'args': {'location': 'San Francisco, CA'}, 'id': 'call_jUqhd8wzAIzInTJl72Rla8ht', 'type': 'tool_call'}], usage_metadata={'input_tokens': 68, 'output_tokens': 17, 'total_tokens': 85})
AIMessage.tool_calls
Notice that the AIMessage has a tool_calls attribute. This contains tool calls in a standardized ToolCall format that is model-provider agnostic.
ai_msg.tool_calls
[{'name': 'GetWeather',
'args': {'location': 'San Francisco, CA'},
'id': 'call_jUqhd8wzAIzInTJl72Rla8ht',
'type': 'tool_call'}]
For more on binding tools and tool call outputs, head to the tool calling docs.
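To complete the loop, each tool call can be executed locally and its result passed back to the model as a ToolMessage. The sketch below assumes a fake local GetWeather implementation (the weather string and the `run_tool` helper are illustrative, not part of langchain-openai):

```python
def run_tool(name: str, args: dict) -> str:
    # Fake local implementation of the GetWeather tool, for illustration only.
    if name == "GetWeather":
        return f"It is sunny in {args['location']}."
    raise ValueError(f"Unknown tool: {name}")

# Execute each tool call on the AIMessage and send the results back:
# from langchain_core.messages import ToolMessage
# tool_messages = [
#     ToolMessage(content=run_tool(tc["name"], tc["args"]), tool_call_id=tc["id"])
#     for tc in ai_msg.tool_calls
# ]
# final = llm_with_tools.invoke(
#     ["what is the weather like in San Francisco", ai_msg, *tool_messages]
# )
```

The model then uses the tool results to compose its final natural-language answer.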
Responses API
Requires langchain-openai>=0.3.9. OpenAI supports a Responses API that is oriented toward building agentic applications. It includes a suite of built-in tools, such as web and file search. It also supports management of conversation state, allowing you to continue a conversational thread without explicitly passing in previous messages, as well as output from reasoning processes.
ChatOpenAI will route to the Responses API if one of these features is used. You can also specify use_responses_api=True when instantiating ChatOpenAI.
Built-in tools
Equipping ChatOpenAI with built-in tools will ground its responses with outside information, such as context in files or the web. The AIMessage generated by the model will include information about the built-in tool invocations.
Web search
To trigger a web search, pass {"type": "web_search_preview"} to the model as you would another tool.
You can also pass built-in tools as invocation params:
llm.invoke("...", tools=[{"type": "web_search_preview"}])
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
tool = {"type": "web_search_preview"}
llm_with_tools = llm.bind_tools([tool])
response = llm_with_tools.invoke("What was a positive news story from today?")
Note that the response includes structured content blocks containing both the text of the response and OpenAI annotations citing its sources:
response.content
[{'type': 'text',
'text': 'Today, a heartwarming story emerged from Minnesota, where a group of high school robotics students built a custom motorized wheelchair for a 2-year-old boy named Cillian Jackson. Born with a genetic condition that limited his mobility, Cillian\'s family couldn\'t afford the $20,000 wheelchair he needed. The students at Farmington High School\'s Rogue Robotics team took it upon themselves to modify a Power Wheels toy car into a functional motorized wheelchair for Cillian, complete with a joystick, safety bumpers, and a harness. One team member remarked, "I think we won here more than we do in our competitions. Instead of completing a task, we\'re helping change someone\'s life." ([boredpanda.com](https://www.boredpanda.com/wholesome-global-positive-news/?utm_source=openai))\n\nThis act of kindness highlights the profound impact that community support and innovation can have on individuals facing challenges. ',
'annotations': [{'end_index': 778,
'start_index': 682,
'title': '“Global Positive News”: 40 Posts To Remind Us There’s Good In The World',
'type': 'url_citation',
'url': 'https://www.boredpanda.com/wholesome-global-positive-news/?utm_source=openai'}]}]
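Annotations like these can be collected programmatically, for example to list all cited URLs. A small sketch (the `extract_citation_urls` helper is ours, not part of langchain-openai):

```python
def extract_citation_urls(content_blocks: list) -> list[str]:
    """Collect URLs from url_citation annotations in Responses content blocks."""
    urls = []
    for block in content_blocks:
        for ann in block.get("annotations", []):
            if ann.get("type") == "url_citation":
                urls.append(ann["url"])
    return urls

# extract_citation_urls(response.content)
```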
You can recover just the text content of the response as a string by using response.text(). For example, to stream the response text:
for token in llm_with_tools.stream("..."):
print(token.text(), end="|")
See the streaming guide for more detail.
The output message will also contain information from any tool invocations:
response.additional_kwargs
{'tool_outputs': [{'id': 'ws_67d192aeb6cc81918e736ad4a57937570d6f8507990d9d71',
'status': 'completed',
'type': 'web_search_call'}]}
File search
To trigger a file search, pass a file search tool to the model as you would another tool. You will need to populate an OpenAI-managed vector store and include the vector store ID in the tool definition. See the OpenAI docs for more detail.
llm = ChatOpenAI(model="gpt-4o-mini")
openai_vector_store_ids = [
"vs_...", # your IDs here
]
tool = {
"type": "file_search",
"vector_store_ids": openai_vector_store_ids,
}
llm_with_tools = llm.bind_tools([tool])
response = llm_with_tools.invoke("What is deep research by OpenAI?")
print(response.text())
Deep Research by OpenAI is a new capability integrated into ChatGPT that allows for the execution of multi-step research tasks independently. It can synthesize extensive amounts of online information and produce comprehensive reports similar to what a research analyst would do, significantly speeding up processes that would typically take hours for a human.
### Key Features:
- **Independent Research**: Users simply provide a prompt, and the model can find, analyze, and synthesize information from hundreds of online sources.
- **Multi-Modal Capabilities**: The model is also able to browse user-uploaded files, plot graphs using Python, and embed visualizations in its outputs.
- **Training**: Deep Research has been trained using reinforcement learning on real-world tasks that require extensive browsing and reasoning.
### Applications:
- Useful for professionals in sectors like finance, science, policy, and engineering, enabling them to obtain accurate and thorough research quickly.
- It can also be beneficial for consumers seeking personalized recommendations on complex purchases.
### Limitations:
Although Deep Research presents significant advancements, it has some limitations, such as the potential to hallucinate facts or struggle with authoritative information.
Deep Research aims to facilitate access to thorough and documented information, marking a significant step toward the broader goal of developing artificial general intelligence (AGI).
As with web search, the response includes content blocks with citations:
response.content[0]["annotations"][:2]
[{'file_id': 'file-3UzgX7jcC8Dt9ZAFzywg5k',
'index': 346,
'type': 'file_citation',
'filename': 'deep_research_blog.pdf'},
{'file_id': 'file-3UzgX7jcC8Dt9ZAFzywg5k',
'index': 575,
'type': 'file_citation',
'filename': 'deep_research_blog.pdf'}]
It also includes information from the built-in tool invocation:
response.additional_kwargs
{'tool_outputs': [{'id': 'fs_67d196fbb83c8191ba20586175331687089228ce932eceb1',
'queries': ['What is deep research by OpenAI?'],
'status': 'completed',
'type': 'file_search_call'}]}
Computer use
ChatOpenAI supports the "computer-use-preview" model, a specialized model for the built-in computer use tool. To enable, pass a computer use tool as you would pass another tool.
Currently, tool outputs for computer use are present in AIMessage.additional_kwargs["tool_outputs"]. To reply to a computer use tool call, construct a ToolMessage with {"type": "computer_call_output"} in its additional_kwargs. The content of the message will be a screenshot. Below, we demonstrate a simple example.
First, load two screenshots:
import base64
def load_png_as_base64(file_path):
with open(file_path, "rb") as image_file:
encoded_string = base64.b64encode(image_file.read())
return encoded_string.decode("utf-8")
screenshot_1_base64 = load_png_as_base64(
"/path/to/screenshot_1.png"
) # perhaps a screenshot of an application
screenshot_2_base64 = load_png_as_base64(
"/path/to/screenshot_2.png"
) # perhaps a screenshot of the Desktop
from langchain_openai import ChatOpenAI
# Initialize model
llm = ChatOpenAI(
model="computer-use-preview",
model_kwargs={"truncation": "auto"},
)
# Bind computer-use tool
tool = {
"type": "computer_use_preview",
"display_width": 1024,
"display_height": 768,
"environment": "browser",
}
llm_with_tools = llm.bind_tools([tool])
# Construct input message
input_message = {
"role": "user",
"content": [
{
"type": "text",
"text": (
"Click the red X to close and reveal my Desktop. "
"Proceed, no confirmation needed."
),
},
{
"type": "input_image",
"image_url": f"data:image/png;base64,{screenshot_1_base64}",
},
],
}
# Invoke model
response = llm_with_tools.invoke(
[input_message],
reasoning={
"generate_summary": "concise",
},
)
The response will include a call to the computer-use tool in its additional_kwargs:
response.additional_kwargs
{'reasoning': {'id': 'rs_67ddb381c85081919c46e3e544a161e8051ff325ba1bad35',
'summary': [{'text': 'Closing Visual Studio Code application',
'type': 'summary_text'}],
'type': 'reasoning'},
'tool_outputs': [{'id': 'cu_67ddb385358c8191bf1a127b71bcf1ea051ff325ba1bad35',
'action': {'button': 'left', 'type': 'click', 'x': 17, 'y': 38},
'call_id': 'call_Ae3Ghz8xdqZQ01mosYhXXMho',
'pending_safety_checks': [],
'status': 'completed',
'type': 'computer_call'}]}
We next construct a ToolMessage with the following properties:
- It has a tool_call_id matching the call_id from the computer call.
- It has {"type": "computer_call_output"} in its additional_kwargs.
- Its content is either an image_url or an input_image output block (see the OpenAI docs for formatting).
from langchain_core.messages import ToolMessage
tool_call_id = response.additional_kwargs["tool_outputs"][0]["call_id"]
tool_message = ToolMessage(
content=[
{
"type": "input_image",
"image_url": f"data:image/png;base64,{screenshot_2_base64}",
}
],
# content=f"data:image/png;base64,{screenshot_2_base64}", # <-- also acceptable
tool_call_id=tool_call_id,
additional_kwargs={"type": "computer_call_output"},
)
We can now invoke the model again using the message history:
messages = [
input_message,
response,
tool_message,
]
response_2 = llm_with_tools.invoke(
messages,
reasoning={
"generate_summary": "concise",
},
)
response_2.text()
'Done! The Desktop is now visible.'
In lieu of passing the entire sequence, we can also use the previous_response_id:
previous_response_id = response.response_metadata["id"]
response_2 = llm_with_tools.invoke(
[tool_message],
previous_response_id=previous_response_id,
reasoning={
"generate_summary": "concise",
},
)
response_2.text()
'The Visual Studio Code terminal has been closed and your desktop is now visible.'
Managing conversation state
The Responses API supports management of conversation state.
Manually manage state
You can manage state manually or using LangGraph, as with other chat models:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
tool = {"type": "web_search_preview"}
llm_with_tools = llm.bind_tools([tool])
first_query = "What was a positive news story from today?"
messages = [{"role": "user", "content": first_query}]
response = llm_with_tools.invoke(messages)
response_text = response.text()
print(f"{response_text[:100]}... {response_text[-100:]}")
As of March 12, 2025, here are some positive news stories that highlight recent uplifting events:
*... exemplify positive developments in health, environmental sustainability, and community well-being.
second_query = (
"Repeat my question back to me, as well as the last sentence of your answer."
)
messages.extend(
[
response,
{"role": "user", "content": second_query},
]
)
second_response = llm_with_tools.invoke(messages)
print(second_response.text())
Your question was: "What was a positive news story from today?"
The last sentence of my answer was: "These stories exemplify positive developments in health, environmental sustainability, and community well-being."
Passing previous_response_id
When using the Responses API, LangChain messages will include an "id" field in their metadata. Passing this ID to subsequent invocations will continue the conversation. Note that this is equivalent to manually passing in messages from a billing perspective.
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
model="gpt-4o-mini",
use_responses_api=True,
)
response = llm.invoke("Hi, I'm Bob.")
print(response.text())
Hi Bob! How can I assist you today?
second_response = llm.invoke(
"What is my name?",
previous_response_id=response.response_metadata["id"],
)
print(second_response.text())
Your name is Bob. How can I help you today, Bob?
Reasoning output
Some OpenAI models will generate separate text content illustrating their reasoning process. See OpenAI's reasoning documentation for details.
OpenAI can return a summary of the model's reasoning (although it does not expose the raw reasoning tokens). To configure ChatOpenAI to return this summary, specify the reasoning parameter:
from langchain_openai import ChatOpenAI
reasoning = {
"effort": "medium", # 'low', 'medium', or 'high'
"summary": "auto", # 'detailed', 'auto', or None
}
llm = ChatOpenAI(
model="o4-mini",
use_responses_api=True,
model_kwargs={"reasoning": reasoning},
)
response = llm.invoke("What is 3^3?")
# Output
response.text()
'3^3 = 3 × 3 × 3 = 27.'
# Reasoning
reasoning = response.additional_kwargs["reasoning"]
for block in reasoning["summary"]:
print(block["text"])
**Calculating power of three**
The user is asking for the result of 3 to the power of 3, which I know is 27. It's a straightforward question, so I’ll keep my answer concise: 27. I could explain that this is the same as multiplying 3 by itself twice: 3 × 3 × 3 equals 27. However, since the user likely just needs the answer, I’ll simply respond with 27.
Fine-tuning
You can call fine-tuned OpenAI models by passing in the corresponding model_name parameter.
This generally takes the form ft:{OPENAI_MODEL_NAME}:{ORG_NAME}::{MODEL_ID}. For example:
fine_tuned_model = ChatOpenAI(
temperature=0, model_name="ft:gpt-3.5-turbo-0613:langchain::7qTVM5AR"
)
fine_tuned_model.invoke(messages)
AIMessage(content="J'adore la programmation.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 31, 'total_tokens': 39}, 'model_name': 'ft:gpt-3.5-turbo-0613:langchain::7qTVM5AR', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-0f39b30e-c56e-4f3b-af99-5c948c984146-0', usage_metadata={'input_tokens': 31, 'output_tokens': 8, 'total_tokens': 39})
Multimodal inputs
OpenAI has models that support multimodal inputs. You can pass in images or audio to these models. For more information on how to do this in LangChain, head to the multimodal inputs docs.
You can see the list of models that support different modalities in OpenAI's documentation.
At the time of this doc's writing, the main OpenAI models you would use are:
- Image inputs: gpt-4o, gpt-4o-mini
- Audio inputs: gpt-4o-audio-preview
For an example of passing in image inputs, see the multimodal inputs how-to guide.
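As a quick sketch, an image can be passed as an image_url content block with a base64 data URL (here we fake the image bytes for illustration; in practice you would read a real PNG from disk):

```python
import base64

# Assume image_b64 holds a base64-encoded PNG; faked here for illustration.
image_b64 = base64.b64encode(b"<png bytes>").decode()

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image."},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{image_b64}"},
        },
    ],
}
# response = llm.invoke([message])  # with an image-capable model such as gpt-4o
```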
Below is an example of passing audio inputs to gpt-4o-audio-preview:
import base64
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
model="gpt-4o-audio-preview",
temperature=0,
)
with open(
"../../../../libs/partners/openai/tests/integration_tests/chat_models/audio_input.wav",
"rb",
) as f:
# b64 encode it
audio = f.read()
audio_b64 = base64.b64encode(audio).decode()
output_message = llm.invoke(
[
(
"human",
[
{"type": "text", "text": "Transcribe the following:"},
# the audio clip says "I'm sorry, but I can't create..."
{
"type": "input_audio",
"input_audio": {"data": audio_b64, "format": "wav"},
},
],
),
]
)
output_message.content
"I'm sorry, but I can't create audio content that involves yelling. Is there anything else I can help you with?"
Predicted output
Requires langchain-openai>=0.2.6
Some OpenAI models (such as their gpt-4o and gpt-4o-mini series) support Predicted Outputs, which allow you to pass in a known portion of the LLM's expected output ahead of time to reduce latency. This is useful for cases such as editing text or code, where only a small part of the model's output will change.
Here's an example:
code = """
/// <summary>
/// Represents a user with a first name, last name, and username.
/// </summary>
public class User
{
/// <summary>
/// Gets or sets the user's first name.
/// </summary>
public string FirstName { get; set; }
/// <summary>
/// Gets or sets the user's last name.
/// </summary>
public string LastName { get; set; }
/// <summary>
/// Gets or sets the user's username.
/// </summary>
public string Username { get; set; }
}
"""
llm = ChatOpenAI(model="gpt-4o")
query = (
"Replace the Username property with an Email property. "
"Respond only with code, and with no markdown formatting."
)
response = llm.invoke(
[{"role": "user", "content": query}, {"role": "user", "content": code}],
prediction={"type": "content", "content": code},
)
print(response.content)
print(response.response_metadata)
/// <summary>
/// Represents a user with a first name, last name, and email.
/// </summary>
public class User
{
/// <summary>
/// Gets or sets the user's first name.
/// </summary>
public string FirstName { get; set; }
/// <summary>
/// Gets or sets the user's last name.
/// </summary>
public string LastName { get; set; }
/// <summary>
/// Gets or sets the user's email.
/// </summary>
public string Email { get; set; }
}
{'token_usage': {'completion_tokens': 226, 'prompt_tokens': 166, 'total_tokens': 392, 'completion_tokens_details': {'accepted_prediction_tokens': 49, 'audio_tokens': None, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 107}, 'prompt_tokens_details': {'audio_tokens': None, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_45cf54deae', 'finish_reason': 'stop', 'logprobs': None}
Note that currently predictions are billed as additional tokens and may increase your usage and costs in exchange for this reduced latency.
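The accepted and rejected prediction token counts in the metadata above can be used to gauge how much of the prediction was actually reused. A small helper sketch (key names taken from the sample metadata shown; not a langchain-openai API):

```python
def prediction_stats(response_metadata: dict) -> tuple[int, int]:
    """Return (accepted, rejected) prediction token counts from response metadata."""
    details = response_metadata["token_usage"]["completion_tokens_details"]
    return (
        details["accepted_prediction_tokens"],
        details["rejected_prediction_tokens"],
    )

# accepted, rejected = prediction_stats(response.response_metadata)
```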
Audio Generation (Preview)
Requires langchain-openai>=0.2.3
OpenAI has a new audio generation feature that allows you to use audio inputs and outputs with the gpt-4o-audio-preview model.
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
model="gpt-4o-audio-preview",
temperature=0,
model_kwargs={
"modalities": ["text", "audio"],
"audio": {"voice": "alloy", "format": "wav"},
},
)
output_message = llm.invoke(
[
("human", "Are you made by OpenAI? Just answer yes or no"),
]
)
output_message.additional_kwargs['audio'] will contain a dictionary like
{
'data': '<audio data b64-encoded',
'expires_at': 1729268602,
'id': 'audio_67127d6a44348190af62c1530ef0955a',
'transcript': 'Yes.'
}
and the format will be what was passed in model_kwargs['audio']['format'].
We can pass this message with its audio data back to the model as part of the message history.
The output audio is stored under the audio key in AIMessage.additional_kwargs, but input content blocks are typed with an input_audio type and key in HumanMessage.content lists.
For more information, see OpenAI's audio docs.
history = [
("human", "Are you made by OpenAI? Just answer yes or no"),
output_message,
("human", "And what is your name? Just give your name."),
]
second_output_message = llm.invoke(history)
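The transcript of an audio reply can be read back from additional_kwargs, per the dictionary structure shown above. A minimal sketch (the `get_transcript` helper is ours, for illustration):

```python
def get_transcript(message) -> str:
    """Read the transcript of an audio reply from its additional_kwargs."""
    return message.additional_kwargs["audio"]["transcript"]

# get_transcript(second_output_message)
```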
Flex processing
OpenAI offers a variety of service tiers. The "flex" tier offers cheaper pricing for requests, with the trade-off that responses may take longer and resources might not always be available. This approach is best suited for non-critical tasks, including model testing, data enrichment, or jobs that can be run asynchronously.
To use it, initialize the model with service_tier="flex":
llm = ChatOpenAI(model="o4-mini", service_tier="flex")
Note that this is a beta feature that is only available for a subset of models. See the OpenAI docs for more detail.
API reference
For detailed documentation of all ChatOpenAI features and configurations, head to the API reference.