How to return multiple scores in one evaluator

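Sometimes it is useful for a custom evaluator function or summary evaluator function to return multiple metrics. For example, if an LLM judge produces several metrics, you can save time and money by making a single LLM call that generates all of them instead of one call per metric.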

To return multiple scores using the Python SDK, return a list of dictionaries/objects of the following form:

[
    # 'key' is the metric name
    # 'score' is the value of a numerical metric
    {"key": string, "score": number},
    # 'value' is the value of a categorical metric
    {"key": string, "value": string},
    ... # You may log as many as you wish
]
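As a minimal sketch of this shape, the evaluator below returns one numerical and one categorical metric from a single call. The metric names ('exact_match', 'length_bucket') and the 'answer' key in the outputs are hypothetical, chosen only for illustration.

```python
def mixed_metrics(outputs: dict, reference_outputs: dict) -> list[dict]:
    # Hypothetical schema: both dicts carry an "answer" string.
    answer = str(outputs.get("answer", ""))
    expected = str(reference_outputs.get("answer", ""))

    # Numerical metric: 1.0 on an exact match (ignoring surrounding
    # whitespace), otherwise 0.0.
    exact_match = float(answer.strip() == expected.strip())

    # Categorical metric: a coarse label for the answer length in words.
    length_bucket = "short" if len(answer.split()) <= 20 else "long"

    return [
        {"key": "exact_match", "score": exact_match},
        {"key": "length_bucket", "value": length_bucket},
    ]
```

Numerical results go under 'score' and categorical results under 'value', as in the form above.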

To do this with the JS/TS SDK, return an object with a 'results' key whose value is a list of the above form:

{results: [{ key: string, score: number }, ...]};

Each of these dictionaries can contain any or all of the feedback fields; see the feedback fields documentation for more information.
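The single-call pattern described earlier might look like the sketch below. It assumes that 'comment' is one of the accepted feedback fields, and it uses a hypothetical 'stub_judge' function in place of a real LLM call; the JSON layout of the judge's reply is also an assumption made for illustration.

```python
import json


def stub_judge(prompt: str) -> str:
    # Hypothetical stand-in for a single LLM call. A real judge would send
    # `prompt` to a model and return its JSON reply.
    return json.dumps({
        "helpfulness": {"score": 0.9, "reasoning": "Answers the question directly."},
        "tone": {"value": "formal", "reasoning": "Uses formal wording."},
    })


def judge_scores(outputs: dict, reference_outputs: dict) -> list[dict]:
    prompt = f"Grade this answer: {outputs.get('answer', '')}"
    verdict = json.loads(stub_judge(prompt))  # one call, several metrics

    results = []
    for key, graded in verdict.items():
        # Assumption: "comment" is accepted as a feedback field.
        entry = {"key": key, "comment": graded["reasoning"]}
        # Numerical metrics carry "score"; categorical ones carry "value".
        if "score" in graded:
            entry["score"] = graded["score"]
        else:
            entry["value"] = graded["value"]
        results.append(entry)
    return results
```

Each metric from the one judge reply becomes its own entry, so the cost of the call is paid once regardless of how many metrics are logged.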

Example:

Python

Requires langsmith>=0.2.0

def multiple_scores(outputs: dict, reference_outputs: dict) -> list[dict]:
    # Replace with real evaluation logic.
    precision = 0.8
    recall = 0.9
    f1 = 0.85

    return [
        {"key": "precision", "score": precision},
        {"key": "recall", "score": recall},
        {"key": "f1", "score": f1},
    ]

TypeScript

Support for multiple scores is available in langsmith@0.1.32 and higher.

import type { Run, Example } from "langsmith/schemas";

function multipleScores(rootRun: Run, example: Example) {
  // Your evaluation logic here
  return {
    results: [
      { key: "precision", score: 0.8 },
      { key: "recall", score: 0.9 },
      { key: "f1", score: 0.85 },
    ],
  };
}

Rows from the resulting experiment will display each of the scores.
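The Python example above uses placeholder values. One way to fill in the evaluation logic is sketched below, under the assumption that the outputs and reference outputs each hold a list of labels under a hypothetical 'labels' key; the function name 'label_overlap_scores' is likewise illustrative.

```python
def label_overlap_scores(outputs: dict, reference_outputs: dict) -> list[dict]:
    # Hypothetical schema: each dict holds a list of labels under "labels".
    predicted = set(outputs.get("labels", []))
    expected = set(reference_outputs.get("labels", []))
    true_positives = len(predicted & expected)

    # Guard against division by zero when either set is empty.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall > 0
        else 0.0
    )

    return [
        {"key": "precision", "score": precision},
        {"key": "recall", "score": recall},
        {"key": "f1", "score": f1},
    ]
```

A function like this would be passed in the list of evaluators for an experiment in the same way as 'multiple_scores', and each of the three keys would then appear as its own score on every row.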
