
Contextual AI Reranker

Contextual AI's instruction-following reranker is the world's first reranker designed to follow custom instructions about how to prioritize documents based on specific criteria such as recency, source, and metadata. With exceptional performance on the BEIR benchmark (scoring 61.2 and significantly outperforming competitors), it delivers unprecedented control and accuracy for enterprise RAG applications.

Key Capabilities

  • Instruction following: Dynamically control document ranking through natural-language commands
  • Conflict resolution: Intelligently handle contradictory information from multiple knowledge sources
  • Superior accuracy: Achieve state-of-the-art performance on industry benchmarks
  • Seamless integration: A drop-in replacement for existing rerankers in your RAG pipeline

The reranker excels at solving real-world problems in enterprise knowledge bases, such as prioritizing recent documents over outdated ones or favoring internal documents over external sources.

To learn more about our instruction-following reranker and see it in action, visit our product overview.

For comprehensive documentation on Contextual AI's products, visit our developer portal.

This integration requires the contextual-client Python SDK. Learn more about it here.

Overview

This integration invokes Contextual AI's instruction-following reranker.

Integration details

| Class | Package | Local | Serializable | JS support | Package downloads | Package latest |
| --- | --- | --- | --- | --- | --- | --- |
| ContextualRerank | langchain-contextual | — | beta | — | PyPI - Downloads | PyPI - Version |

Setup

To access Contextual's reranker models, you'll need to create a Contextual AI account, get an API key, and install the langchain-contextual integration package.

Credentials

Head to app.contextual.ai to sign up for Contextual and generate an API key. Once you've done this, set the CONTEXTUAL_AI_API_KEY environment variable:

import getpass
import os

if not os.getenv("CONTEXTUAL_AI_API_KEY"):
    os.environ["CONTEXTUAL_AI_API_KEY"] = getpass.getpass(
        "Enter your Contextual API key: "
    )

Installation

The LangChain Contextual integration lives in the langchain-contextual package:

%pip install -qU langchain-contextual

Instantiation

The Contextual Reranker accepts the following parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| documents | list[Document] | A sequence of documents to rerank. Any metadata contained in the documents will also be used for reranking. |
| query | str | The query to use for reranking. |
| model | str | The version of the reranker to use. Currently, only "ctxl-rerank-en-v1-instruct" is available. |
| top_n | Optional[int] | The number of results to return. If None, returns all results. Defaults to self.top_n. |
| instruction | Optional[str] | The instruction to be used for the reranker. |
| callbacks | Optional[Callbacks] | Callbacks to run during the compression process. |

from langchain_contextual import ContextualRerank

api_key = ""
model = "ctxl-rerank-en-v1-instruct"

compressor = ContextualRerank(
    model=model,
    api_key=api_key,
)

Usage

First, we'll set up the global variables and examples we'll use, and instantiate our reranker client.

from langchain_core.documents import Document

query = "What is the current enterprise pricing for the RTX 5090 GPU for bulk orders?"
instruction = "Prioritize internal sales documents over market analysis reports. More recent documents should be weighted higher. Enterprise portal content supersedes distributor communications."

document_contents = [
    "Following detailed cost analysis and market research, we have implemented the following changes: AI training clusters will see a 15% uplift in raw compute performance, enterprise support packages are being restructured, and bulk procurement programs (100+ units) for the RTX 5090 Enterprise series will operate on a $2,899 baseline.",
    "Enterprise pricing for the RTX 5090 GPU bulk orders (100+ units) is currently set at $3,100-$3,300 per unit. This pricing for RTX 5090 enterprise bulk orders has been confirmed across all major distribution channels.",
    "RTX 5090 Enterprise GPU requires 450W TDP and 20% cooling overhead.",
]

metadata = [
    {
        "Date": "January 15, 2025",
        "Source": "NVIDIA Enterprise Sales Portal",
        "Classification": "Internal Use Only",
    },
    {"Date": "11/30/2023", "Source": "TechAnalytics Research Group"},
    {
        "Date": "January 25, 2025",
        "Source": "NVIDIA Enterprise Sales Portal",
        "Classification": "Internal Use Only",
    },
]

documents = [
    Document(page_content=content, metadata=metadata[i])
    for i, content in enumerate(document_contents)
]

reranked_documents = compressor.compress_documents(
    query=query,
    instruction=instruction,
    documents=documents,
)
API Reference: Document

Use within a chain

Examples coming soon.

API Reference

For detailed documentation of all ContextualRerank features and configurations, head to the GitHub page: https://github.com/ContextualAI/langchain-contextual