How to run an evaluation locally (beta, Python only)
Beta
This feature is still in beta.
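Sometimes it is helpful to run an evaluation locally without uploading any results to LangSmith. For example, if you are quickly iterating on a prompt and want to smoke test it on a few examples, or if you are validating that your target and evaluator functions are defined correctly, you may not want to record these evaluations.
You can do this by using the LangSmith Python SDK and passing upload_results=False to evaluate() / aevaluate().
This runs your application and evaluators exactly as it always does and returns the same output, but nothing is recorded to LangSmith. This includes not just the experiment results but also the application and evaluator traces.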
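For an async application, the same flag can be passed to aevaluate(). The following is a minimal sketch, not a definitive implementation: `achatbot`, `run_local_eval`, `main`, and the `dataset_name` argument are hypothetical names introduced for illustration, and the sketch assumes a dataset with that name already exists in your LangSmith workspace.

```python
import asyncio


async def achatbot(inputs: dict) -> dict:
    """Async stand-in for an application (hypothetical)."""
    return {"answer": inputs["question"] + " is a good question. I don't know the answer."}


def is_concise(outputs: dict, reference_outputs: dict) -> bool:
    """Pass when the answer is shorter than three times the reference answer."""
    return len(outputs["answer"]) < 3 * len(reference_outputs["answer"])


async def run_local_eval(dataset_name: str) -> list:
    # Imported here so the two functions above can be checked without the SDK installed.
    from langsmith import aevaluate

    results = await aevaluate(
        achatbot,
        data=dataset_name,      # assumption: name of an existing dataset
        evaluators=[is_concise],
        upload_results=False,   # run everything, record nothing in LangSmith
    )
    # Async experiment results are consumed with 'async for'.
    return [row async for row in results]


def main(dataset_name: str) -> list:
    """Synchronous entry point; still needs LangSmith credentials to read the dataset."""
    return asyncio.run(run_local_eval(dataset_name))
```

Calling `main("my-dataset")` (a placeholder name) would return the per-example rows in memory. Reading the dataset still goes through LangSmith; only the upload of results and traces is skipped.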
Example
Let's take a look at an example:
- Python
Requires langsmith>=0.2.0. The example also uses pandas.
from langsmith import Client

# 1. Create and/or select your dataset
ls_client = Client()
dataset = ls_client.clone_public_dataset(
    "https://smith.langchain.com/public/a63525f9-bdf2-4512-83e3-077dc9417f96/d"
)

# 2. Define an evaluator
def is_concise(outputs: dict, reference_outputs: dict) -> bool:
    return len(outputs["answer"]) < (3 * len(reference_outputs["answer"]))

# 3. Define the interface to your app
def chatbot(inputs: dict) -> dict:
    return {"answer": inputs["question"] + " is a good question. I don't know the answer."}

# 4. Run an evaluation
experiment = ls_client.evaluate(
    chatbot,
    data=dataset,
    evaluators=[is_concise],
    experiment_prefix="my-first-experiment",
    # 'upload_results' is the relevant arg.
    upload_results=False,
)

# 5. Analyze results locally
results = list(experiment)

# Check if 'is_concise' returned False.
failed = [r for r in results if not r["evaluation_results"]["results"][0].score]

# Explore the failed inputs and outputs.
for r in failed:
    print(r["example"].inputs)
    print(r["run"].outputs)

# Explore the results as a Pandas DataFrame.
# Must have 'pandas' installed.
df = experiment.to_pandas()
df[["inputs.question", "outputs.answer", "reference.answer", "feedback.is_concise"]]
- Python
{'question': 'What is the largest mammal?'}
{'answer': "What is the largest mammal? is a good question. I don't know the answer."}
{'question': 'What do mammals and birds have in common?'}
{'answer': "What do mammals and birds have in common? is a good question. I don't know the answer."}
| | inputs.question | outputs.answer | reference.answer | feedback.is_concise |
|---|---|---|---|---|
| 0 | What is the largest mammal? | What is the largest mammal? is a good question. I don't know the answer. | The blue whale | False |
| 1 | What do mammals and birds have in common? | What do mammals and birds have in common? is a good question. I don't know the answer. | They are both warm-blooded | False |
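Both rows fail because each answer repeats the question and appends a fixed sentence, which makes it longer than three times the reference answer. As a quick check that needs no SDK, the sketch below re-applies the same length rule to the two rows shown above; `SUFFIX`, `rows`, and `checks` are names introduced here for illustration.

```python
def is_concise(outputs: dict, reference_outputs: dict) -> bool:
    """Same rule as the evaluator above: answer shorter than 3x the reference."""
    return len(outputs["answer"]) < 3 * len(reference_outputs["answer"])


# The fixed sentence that the example chatbot appends to every question.
SUFFIX = " is a good question. I don't know the answer."

# (question, reference answer) pairs copied from the results table.
rows = [
    ("What is the largest mammal?", "The blue whale"),
    ("What do mammals and birds have in common?", "They are both warm-blooded"),
]

checks = []
for question, reference in rows:
    answer = question + SUFFIX
    passed = is_concise({"answer": answer}, {"answer": reference})
    checks.append(passed)
    # Print the answer length, the allowed limit, and the verdict.
    print(len(answer), 3 * len(reference), passed)
```

Running this by hand is a cheap way to confirm that an evaluator behaves as intended before wiring it into evaluate().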