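Run evaluations with multimodal content
LangSmith lets you create dataset examples with file attachments, such as images, audio files, or documents, so that you can reference them when evaluating applications that use multimodal inputs or outputs.
Although you can include multimodal data in examples by base64-encoding it, this approach is inefficient: the encoded data takes up more space than the original binary files, which slows transfers to and from LangSmith. Using attachments instead provides two key benefits:
- Faster upload and download speeds, thanks to more efficient binary file transfers
- Enhanced visualization of different file types in the LangSmith UI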
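The size overhead of base64 is easy to check locally. The snippet below is a minimal sketch that uses only the Python standard library; the random bytes are a stand-in for a real file, and no LangSmith call is involved.

```python
import base64
import os

raw = os.urandom(3_000_000)      # stand-in for a ~3 MB binary file
encoded = base64.b64encode(raw)  # what would be embedded in the example payload

# base64 emits 4 output bytes for every 3 input bytes.
overhead = len(encoded) / len(raw)
print(f"raw: {len(raw)} bytes, base64: {len(encoded)} bytes, ratio: {overhead:.2f}")
```

The encoded payload is about one third larger than the original file, before any additional cost of wrapping it in JSON.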
- SDK
- UI
1. Create examples with attachments
To upload examples with attachments using the SDK, use the create_examples / update_examples Python methods, or the uploadExamplesMultipart / updateExamplesMultipart TypeScript methods.
- Python
- TypeScript
Python: requires langsmith>=0.3.13
import requests
import uuid
from pathlib import Path
from langsmith import Client

# Publicly available test files
pdf_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
wav_url = "https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav"
img_url = "https://www.w3.org/Graphics/PNG/nurbcup2si.png"

# Fetch the files as bytes
pdf_bytes = requests.get(pdf_url).content
wav_bytes = requests.get(wav_url).content
img_bytes = requests.get(img_url).content

# Create the dataset
ls_client = Client()
dataset_name = "attachment-test-dataset"
dataset = ls_client.create_dataset(
    dataset_name=dataset_name,
    description="Test dataset for evals with publicly available attachments",
)

inputs = {
    "audio_question": "What is in this audio clip?",
    "image_question": "What is in this image?",
}
outputs = {
    "audio_answer": "The sun rises in the east and sets in the west. This simple fact has been observed by humans for thousands of years.",
    "image_answer": "A mug with a blanket over it.",
}

# Define an example with attachments
example_id = uuid.uuid4()
example = {
    "id": example_id,
    "inputs": inputs,
    "outputs": outputs,
    "attachments": {
        "my_pdf": {"mime_type": "application/pdf", "data": pdf_bytes},
        "my_wav": {"mime_type": "audio/wav", "data": wav_bytes},
        "my_img": {"mime_type": "image/png", "data": img_bytes},
        # Example of an attachment specified via a local file path:
        # "my_local_img": {"mime_type": "image/png", "data": Path(__file__).parent / "my_local_img.png"},
    },
}

# Create the example
ls_client.create_examples(
    dataset_id=dataset.id,
    examples=[example],
    # Uncomment this flag if you'd like to upload attachments from local files:
    # dangerously_allow_filesystem=True
)
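TypeScript: requires version >= 0.2.13
You can use the uploadExamplesMultipart method to upload examples with attachments.
Note that this is a different method from the standard createExamples method, which currently does not support attachments. Each attachment requires either a Uint8Array or an ArrayBuffer as its data type.
- Uint8Array: useful for handling binary data directly.
- ArrayBuffer: represents fixed-length binary data, which can be converted to a Uint8Array as needed.
Note that you cannot pass a file path directly in the TypeScript SDK, because accessing local files is not supported in all runtime environments.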
import { Client } from "langsmith";
import { v4 as uuid4 } from "uuid";

// Publicly available test files
const pdfUrl = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf";
const wavUrl = "https://openaiassets.blob.core.windows.net/$web/API/docs/audio/alloy.wav";
const pngUrl = "https://www.w3.org/Graphics/PNG/nurbcup2si.png";

// Helper function to fetch file as ArrayBuffer
async function fetchArrayBuffer(url: string): Promise<ArrayBuffer> {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: ${response.statusText}`);
  }
  return response.arrayBuffer();
}

// Fetch files as ArrayBuffer
const pdfArrayBuffer = await fetchArrayBuffer(pdfUrl);
const wavArrayBuffer = await fetchArrayBuffer(wavUrl);
const pngArrayBuffer = await fetchArrayBuffer(pngUrl);

// Create the LangSmith client (Ensure LANGSMITH_API_KEY is set in env)
const langsmithClient = new Client();

// Create a unique dataset name
const datasetName = "attachment-test-dataset:" + uuid4().substring(0, 8);

// Create the dataset
const dataset = await langsmithClient.createDataset(datasetName, {
  description: "Test dataset for evals with publicly available attachments",
});

// Define the example with attachments
const exampleId = uuid4();
const example = {
  id: exampleId,
  inputs: {
    audio_question: "What is in this audio clip?",
    image_question: "What is in this image?",
  },
  outputs: {
    audio_answer: "The sun rises in the east and sets in the west. This simple fact has been observed by humans for thousands of years.",
    image_answer: "A mug with a blanket over it.",
  },
  attachments: {
    my_pdf: {
      mimeType: "application/pdf",
      data: pdfArrayBuffer,
    },
    my_wav: {
      mimeType: "audio/wav",
      data: wavArrayBuffer,
    },
    my_img: {
      mimeType: "image/png",
      data: pngArrayBuffer,
    },
  },
};

// Upload the example with attachments to the dataset
await langsmithClient.uploadExamplesMultipart(dataset.id, [example]);
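In the Python SDK, attachments can also be specified as paths to local files instead of being passed in as bytes. To do so, set the attachment's data value to a path and pass the argument dangerously_allow_filesystem=True:
ls_client.create_examples(..., dangerously_allow_filesystem=True)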
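If you would rather not enable filesystem access in the client, one alternative is to read the file yourself and pass its bytes. The following is a minimal sketch: attachment_from_path is a hypothetical helper (not part of the LangSmith SDK) that returns the same {"mime_type", "data"} shape used in the example above, guessing the MIME type from the file extension.

```python
import mimetypes
from pathlib import Path
from typing import Union


def attachment_from_path(
    path: Union[str, Path],
    default_mime: str = "application/octet-stream",
) -> dict:
    """Read a local file and return it in the {"mime_type", "data"} attachment shape.

    Hypothetical helper for illustration; it is not provided by the SDK.
    """
    path = Path(path)
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime_type, _ = mimetypes.guess_type(path.name)
    return {"mime_type": mime_type or default_mime, "data": path.read_bytes()}
```

The returned dictionary can then be used as a value in an example's attachments mapping, for instance "my_local_img": attachment_from_path("my_local_img.png").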
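2. Run evaluations
Define a target function
Now that we have a dataset that includes examples with attachments, we can define a target function to run over these examples. The following example simply uses OpenAI's GPT-4o models to answer questions about an image and an audio clip.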
- Python
- TypeScript
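To consume the attachments associated with an example, the target function you are evaluating must have two positional arguments: the first must be named inputs and the second must be named attachments.
- The inputs argument is a dictionary that contains the input data for the example, excluding the attachments.
- The attachments argument is a dictionary that maps each attachment name to a dictionary containing a presigned URL, the MIME type, and a reader for the file's bytes content. You can use either the presigned URL or the reader to get the file contents.
Each value in the attachments dictionary is a dictionary with the following structure:
{
    "presigned_url": str,
    "mime_type": str,
    "reader": BinaryIO
}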
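To make the structure above concrete, here is a minimal sketch of reading one attachment through its reader. The attachment_to_base64 helper and the fake_attachments dictionary are illustrative assumptions (an io.BytesIO object stands in for the reader LangSmith would supply); only the three keys shown above are taken from this guide.

```python
import base64
import io


def attachment_to_base64(attachments: dict, name: str) -> str:
    """Read one attachment's bytes via its reader and return them base64-encoded."""
    reader = attachments[name]["reader"]
    # A binary stream is consumed as it is read, so rewind first when the
    # stream supports it; otherwise a second read would return b"".
    if reader.seekable():
        reader.seek(0)
    return base64.b64encode(reader.read()).decode("utf-8")


# Stand-in for what a target function might receive (values are made up).
fake_attachments = {
    "my_wav": {
        "presigned_url": "https://example.com/my_wav",
        "mime_type": "audio/wav",
        "reader": io.BytesIO(b"RIFF....WAVE"),
    }
}
first = attachment_to_base64(fake_attachments, "my_wav")
second = attachment_to_base64(fake_attachments, "my_wav")
```

Whether the reader passed in by LangSmith is seekable is not stated in this guide, which is why the sketch checks seekable() before rewinding. If you need the bytes more than once, reading them a single time and keeping the result is the safer choice.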
from langsmith.wrappers import wrap_openai
import base64
from openai import OpenAI

client = wrap_openai(OpenAI())

# Define target function that uses attachments
def file_qa(inputs, attachments):
    # Read the audio bytes from the reader and encode them in base64
    audio_reader = attachments["my_wav"]["reader"]
    audio_b64 = base64.b64encode(audio_reader.read()).decode("utf-8")
    audio_completion = client.chat.completions.create(
        model="gpt-4o-audio-preview",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": inputs["audio_question"],
                    },
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": audio_b64,
                            "format": "wav",
                        },
                    },
                ],
            }
        ],
    )

    # Most models support taking in an image URL directly in addition to base64 encoded images
    # You can pipe the image pre-signed URL directly to the model
    image_url = attachments["my_img"]["presigned_url"]
    image_completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": inputs["image_question"]},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": image_url,
                        },
                    },
                ],
            }
        ],
    )

    return {
        "audio_answer": audio_completion.choices[0].message.content,
        "image_answer": image_completion.choices[0].message.content,
    }
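In the TypeScript SDK, attachments are passed to the target function through the config argument when includeAttachments is set to true.
The config will contain attachments, an object that maps each attachment name to an object of the form:
{
  presigned_url: string,
  mime_type: string,
}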
import OpenAI from "openai";
import { wrapOpenAI } from "langsmith/wrappers";

const client: any = wrapOpenAI(new OpenAI());

async function fileQA(inputs: Record<string, any>, config?: Record<string, any>) {
  // Download the audio via its presigned URL and encode it in base64
  const presignedUrl = config?.attachments?.["my_wav"]?.presigned_url;
  if (!presignedUrl) {
    throw new Error("No presigned URL provided for audio.");
  }
  const response = await fetch(presignedUrl);
  if (!response.ok) {
    throw new Error(`Failed to fetch audio: ${response.statusText}`);
  }
  const arrayBuffer = await response.arrayBuffer();
  const uint8Array = new Uint8Array(arrayBuffer);
  const audioB64 = Buffer.from(uint8Array).toString("base64");

  const audioCompletion = await client.chat.completions.create({
    model: "gpt-4o-audio-preview",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: inputs["audio_question"] },
          {
            type: "input_audio",
            input_audio: {
              data: audioB64,
              format: "wav",
            },
          },
        ],
      },
    ],
  });

  // Pass the image presigned URL directly to the model
  const imageUrl = config?.attachments?.["my_img"]?.presigned_url;
  const imageCompletion = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: inputs["image_question"] },
          {
            type: "image_url",
            image_url: {
              url: imageUrl,
            },
          },
        ],
      },
    ],
  });

  return {
    audio_answer: audioCompletion.choices[0].message.content,
    image_answer: imageCompletion.choices[0].message.content,
  };
}
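Define custom evaluators
The same rules as above determine whether an evaluator receives attachments.
The evaluator below uses an LLM to judge whether the description of the image is consistent with the image itself. To learn more about how to define LLM-based evaluators, see this guide.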
- Python
- TypeScript
# Assumes you've installed pydantic
from pydantic import BaseModel

def valid_image_description(outputs: dict, attachments: dict) -> bool:
    """Use an LLM to judge if the image description and images are consistent."""
    instructions = """
    Does the description of the following image make sense?
    Please carefully review the image and the description to determine if the description is valid."""

    class Response(BaseModel):
        description_is_valid: bool

    image_url = attachments["my_img"]["presigned_url"]
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": instructions,
            },
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": outputs["image_answer"]},
                ],
            },
        ],
        response_format=Response,
    )
    return response.choices[0].message.parsed.description_is_valid

ls_client.evaluate(
    file_qa,
    data=dataset_name,
    evaluators=[valid_image_description],
)
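Not every attachment-aware evaluator needs an LLM. As a minimal sketch, the hypothetical evaluator below performs a cheap sanity check: it confirms that the my_img attachment begins with the PNG file signature and that the target produced a non-empty image_answer. It assumes the evaluator receives the same reader key described for target functions; the function name and the check itself are illustrative and not part of the SDK.

```python
import io

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def nonempty_answer_for_png(outputs: dict, attachments: dict) -> bool:
    """Return True if "my_img" looks like a PNG and the image answer is non-empty."""
    reader = attachments["my_img"]["reader"]
    # Only the first 8 bytes are needed to check the signature.
    is_png = reader.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE
    has_answer = bool((outputs.get("image_answer") or "").strip())
    return is_png and has_answer
```

A check like this could be listed alongside valid_image_description in the evaluators argument, so that malformed uploads are flagged separately from poor model answers.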
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";
import { evaluate } from "langsmith/evaluation";

const DescriptionResponse = z.object({
  description_is_valid: z.boolean(),
});

async function validImageDescription({
  outputs,
  attachments,
}: {
  outputs?: any;
  attachments?: any;
}): Promise<{ key: string; score: boolean }> {
  const instructions = `Does the description of the following image make sense?
Please carefully review the image and the description to determine if the description is valid.`;

  const imageUrl = attachments?.["my_img"]?.presigned_url;
  const completion = await client.beta.chat.completions.parse({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content: instructions,
      },
      {
        role: "user",
        content: [
          { type: "image_url", image_url: { url: imageUrl } },
          { type: "text", text: outputs?.image_answer },
        ],
      },
    ],
    response_format: zodResponseFormat(DescriptionResponse, "imageResponse"),
  });

  const score: boolean = completion.choices[0]?.message?.parsed?.description_is_valid ?? false;
  return { key: "valid_image_description", score };
}

const resp = await evaluate(fileQA, {
  data: datasetName,
  // Need to pass flag to include attachments
  includeAttachments: true,
  evaluators: [validImageDescription],
  client: langsmithClient,
});
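Update examples with attachments
In the code above, we showed how to add examples with attachments to a dataset. You can also update these same examples using the SDK.
As with existing examples, datasets are versioned when you update them with attachments. You can therefore navigate to the dataset version history to see the changes made to each example. To learn more, see this guide.
When updating an example with attachments, you can update the attachments in a few different ways:
- Pass in new attachments
- Rename existing attachments
- Delete existing attachments
Note that:
- Any existing attachments that are not explicitly renamed or retained will be deleted.
- An error will be raised if you pass a non-existent attachment name to retain or rename.
- If the same attachment name appears in both the attachments and attachments_operations fields, the new attachment takes precedence over the existing one.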
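The three rules above can be modeled locally before you send an update. The sketch below is only an illustration of those rules as stated in this guide, not the LangSmith implementation; apply_attachment_update is a hypothetical function, and the error type is an assumption.

```python
def apply_attachment_update(existing, new_attachments=None, retain=(), rename=None):
    """Return the attachment mapping that would result from an update, per the rules above."""
    new_attachments = dict(new_attachments or {})
    rename = dict(rename or {})

    # retain/rename may only reference attachments that already exist.
    missing = [name for name in [*retain, *rename] if name not in existing]
    if missing:
        raise ValueError(f"Unknown attachment name(s): {missing}")

    result = {}
    # Only retained or renamed attachments survive; everything else is dropped.
    for name in retain:
        result[name] = existing[name]
    for old_name, new_name in rename.items():
        result[new_name] = existing[old_name]
    # On a name clash, the newly uploaded attachment wins.
    result.update(new_attachments)
    return result


existing = {"my_pdf": b"pdf", "my_wav": b"wav", "my_img": b"img"}
updated = apply_attachment_update(
    existing,
    new_attachments={"my_new_file": b"foo bar"},
    retain=["my_pdf"],
    rename={"my_wav": "my_new_wav"},
)
print(sorted(updated))  # ['my_new_file', 'my_new_wav', 'my_pdf']
```

In this run, my_img is absent from the result because it was neither retained nor renamed.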
- Python
- TypeScript
example_update = {
    "id": example_id,
    "attachments": {
        # These are net new attachments
        "my_new_file": ("text/plain", b"foo bar"),
    },
    "inputs": inputs,
    "outputs": outputs,
    # Any attachments not in rename/retain will be deleted.
    # In this case, that would be "my_img" if we uploaded it.
    "attachments_operations": {
        # Retained attachments will stay exactly the same
        "retain": ["my_pdf"],
        # Renaming attachments preserves the original data
        "rename": {
            "my_wav": "my_new_wav",
        },
    },
}

ls_client.update_examples(dataset_id=dataset.id, updates=[example_update])
import { ExampleUpdateWithAttachments } from "langsmith/schemas";

const exampleUpdate: ExampleUpdateWithAttachments = {
  id: exampleId,
  attachments: {
    // These are net new attachments
    "my_new_file": {
      mimeType: "text/plain",
      data: Buffer.from("foo bar"),
    },
  },
  attachments_operations: {
    // Retained attachments will stay exactly the same
    retain: ["my_img"],
    // Renaming attachments preserves the original data
    rename: {
      "my_wav": "my_new_wav",
    },
    // Any attachments not in rename/retain will be deleted
    // In this case, that would be "my_pdf"
  },
};

await langsmithClient.updateExamplesMultipart(
  dataset.id,
  [exampleUpdate],
);
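1. Create examples with attachments
You can add examples with attachments to a dataset in a few different ways.
From existing runs
When adding runs to a LangSmith dataset, attachments can be selectively propagated from the source run to the destination example. To learn more, see this guide.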

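From scratch
You can create examples with attachments directly from the LangSmith UI. Click the + Example button in the Examples tab of the dataset UI, then upload attachments using the "Upload Files" button: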

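Once uploaded, you can view examples with attachments in the LangSmith UI. Each attachment is rendered with a preview for easy inspection.
2. Create a multimodal prompt
The LangSmith UI allows you to include attachments in your prompts when evaluating multimodal models:
First, click the file icon in the message where you want to add multimodal content. Next, add a template variable for the attachment(s) you want to include for each example.
- For a single attachment type: use the suggested variable name. Note: all examples must have an attachment with this name.
- For multiple attachments, or if attachment names vary from one example to another: use the All attachments variable to include all available attachments for each example.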

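Define custom evaluators
The LangSmith playground does not currently support pulling multimodal content into evaluators. If this would be helpful for your use case, please let us know in the LangChain community Slack (sign up here if you are not already a member)!
You can evaluate the model's text output by adding an evaluator that takes in the example's inputs and outputs. Even without multimodal support in your evaluators, you can still run text-only evaluations. For example:
- OCR → text correction: use a vision model to extract text from a document, then evaluate the accuracy of the extracted output.
- Speech-to-text → transcription quality: use a voice model to transcribe audio to text, then evaluate the transcription against your reference.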
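For the transcription case, one common text-only metric is word error rate (WER): the word-level edit distance between the transcription and the reference, divided by the number of reference words. The function below is a minimal, self-contained sketch of that metric; it is not a LangSmith API, and the lowercasing and whitespace tokenization are simplifying assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0

    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # reference word missing from the hypothesis
                    curr[j - 1] + 1,     # extra word in the hypothesis
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)
```

A score of 0.0 means the transcription matches the reference word for word; higher values mean more words were wrong, missing, or added.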
For more information on defining custom evaluators, see the LLM as Judge guide.
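Update examples with attachments
Attachments are limited to 20MB in size in the UI.
When editing an example in the UI, you can:
- Upload new attachments
- Rename and delete attachments
- Reset attachments to their previous state using the quick reset button
Changes are not saved until you click submit.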
