Pinecone Rerank
This notebook shows how to use PineconeRerank for second-stage reranking of vector-retrieval results. It uses Pinecone's hosted reranking API, as demonstrated in langchain_pinecone/libs/pinecone/rerank.py.
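In two-stage retrieval, a cheap first pass (vector similarity) narrows the corpus to a few candidates, and a more expensive reranker reorders only those candidates. The sketch below illustrates the shape of that pipeline with made-up 2-d embeddings and a crude word-overlap scorer standing in for the hosted model. CORPUS, first_stage and second_stage are hypothetical names, not part of langchain-pinecone.

```python
import math

# Toy corpus of (text, embedding) pairs. The 2-d embeddings are invented for
# illustration; a real system would get them from an embedding model.
CORPUS = [
    ("Paris is the capital of France.", (0.9, 0.1)),
    ("Berlin is the capital of Germany.", (0.8, 0.3)),
    ("The Eiffel Tower is in Paris.", (0.7, 0.2)),
    ("Bananas are rich in potassium.", (0.1, 0.9)),
]


def cosine(a, b):
    """Cosine similarity of two 2-d vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def first_stage(query_vec, k):
    """Recall step: keep the k texts whose embeddings are closest to the query."""
    ranked = sorted(CORPUS, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def second_stage(query, candidates, n):
    """Precision step: rescore only the candidates and keep the best n.
    Hypothetical stand-in: word overlap replaces the call to a reranking model."""
    q_words = set(query.lower().strip("?.").split())

    def score(text):
        return len(q_words & set(text.lower().strip(".").split()))

    return sorted(candidates, key=score, reverse=True)[:n]


candidates = first_stage((1.0, 0.0), k=3)
print(second_stage("What is the capital of France?", candidates, n=2))
```

Word overlap is a poor relevance signal: it ranks the Berlin sentence above the Eiffel Tower one, unlike the model output later in this notebook. It only stands in for the reranking call; the point is the division of labor between the two stages.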
Setup
Install the langchain-pinecone package.
%pip install -qU "langchain-pinecone"
Credentials
Set your Pinecone API key to use the reranking API.
import os
from getpass import getpass
os.environ["PINECONE_API_KEY"] = os.getenv("PINECONE_API_KEY") or getpass(
    "Enter your Pinecone API key: "
)
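The getpass fallback assumes someone is present to type the key. For non-interactive runs such as scheduled jobs or CI, a variant that fails fast with a clear message may be preferable. require_api_key is a hypothetical helper, not part of the library.

```python
import os


def require_api_key(var="PINECONE_API_KEY"):
    """Return the key from the environment, or raise a clear error instead of
    prompting, since nobody can answer a getpass prompt in an unattended run."""
    key = os.getenv(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before running this notebook.")
    return key
```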
Instantiation
Use PineconeRerank to rerank a list of documents by their relevance to a query.
from langchain_core.documents import Document
from langchain_pinecone import PineconeRerank
# Initialize reranker
reranker = PineconeRerank(model="bge-reranker-v2-m3")
# Sample documents
documents = [
    Document(page_content="Paris is the capital of France."),
    Document(page_content="Berlin is the capital of Germany."),
    Document(page_content="The Eiffel Tower is in Paris."),
]
# Rerank documents
query = "What is the capital of France?"
reranked_docs = reranker.compress_documents(documents, query)
# Print results
for doc in reranked_docs:
    score = doc.metadata.get("relevance_score")
    print(f"Score: {score:.4f} | Content: {doc.page_content}")
API Reference: Document | PineconeRerank
Score: 0.9998 | Content: Paris is the capital of France.
Score: 0.1950 | Content: The Eiffel Tower is in Paris.
Score: 0.0042 | Content: Berlin is the capital of Germany.
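The loop above reads each score from doc.metadata["relevance_score"], and the results come back ordered from most to least relevant. Below is a local sketch of that output shape, with the scores copied from the sample output instead of fetched from the API. Doc is a hypothetical stand-in for langchain_core's Document, and attach_scores is a hypothetical helper.

```python
from dataclasses import dataclass, field, replace


@dataclass
class Doc:
    """Hypothetical stand-in for langchain_core's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def attach_scores(docs, scores):
    """Return copies of docs sorted by score (highest first), each carrying its
    score under metadata["relevance_score"]; the input docs are not modified."""
    scored = [
        replace(doc, metadata={**doc.metadata, "relevance_score": s})
        for doc, s in zip(docs, scores)
    ]
    return sorted(scored, key=lambda d: d.metadata["relevance_score"], reverse=True)


docs = [
    Doc("Paris is the capital of France."),
    Doc("Berlin is the capital of Germany."),
    Doc("The Eiffel Tower is in Paris."),
]
for d in attach_scores(docs, [0.9998, 0.0042, 0.1950]):
    print(f"Score: {d.metadata['relevance_score']:.4f} | Content: {d.page_content}")
```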
Usage
Reranking with top-N
Specify top_n to limit the number of documents returned.
# Return only top-1 result
reranker_top1 = PineconeRerank(model="bge-reranker-v2-m3", top_n=1)
top1_docs = reranker_top1.compress_documents(documents, query)
print("Top-1 Result:")
for doc in top1_docs:
    print(f"Score: {doc.metadata['relevance_score']:.4f} | Content: {doc.page_content}")
Top-1 Result:
Score: 0.9998 | Content: Paris is the capital of France.
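Top-N amounts to keeping the first N rows of the full ranking: the top-1 result above matches the first row of the earlier unrestricted output. A local sketch of that selection follows; top_n here is a hypothetical helper, whereas with PineconeRerank the cut is requested through the top_n parameter.

```python
import heapq


def top_n(items, scores, n=None):
    """Keep the n highest-scoring (item, score) pairs, or all of them, sorted,
    when n is None."""
    pairs = list(zip(items, scores))
    if n is None:
        return sorted(pairs, key=lambda p: p[1], reverse=True)
    # nlargest gives the same result as sorted(...)[:n] without a full sort.
    return heapq.nlargest(n, pairs, key=lambda p: p[1])


items = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Eiffel Tower is in Paris.",
]
print(top_n(items, [0.9998, 0.0042, 0.1950], n=1))
```

heapq.nlargest is documented as equivalent to sorting in reverse and slicing, but it avoids sorting the whole list when n is small.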
Reranking with custom rank fields
If your documents are dictionaries or have custom fields, use rank_fields to specify which field to rank on.
# Sample dictionary documents with 'text' field
docs_dict = [
    {
        "id": "doc1",
        "text": "Article about renewable energy.",
        "title": "Renewable Energy",
    },
    {"id": "doc2", "text": "Report on economic growth.", "title": "Economic Growth"},
    {
        "id": "doc3",
        "text": "News on climate policy changes.",
        "title": "Climate Policy",
    },
]
# Initialize reranker with rank_fields
reranker_text = PineconeRerank(model="bge-reranker-v2-m3", rank_fields=["text"])
climate_docs = reranker_text.rerank(docs_dict, "Latest news on climate change.")
# Show IDs and scores
for res in climate_docs:
    print(f"ID: {res['id']} | Score: {res['score']:.4f}")
ID: doc3 | Score: 0.9892
ID: doc1 | Score: 0.0006
ID: doc2 | Score: 0.0000
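With rank_fields, only the named key of each dict is scored against the query; the other keys (id, title) are carried along and come back in each result's document. The sketch below shows just the selection step, assuming one string per requested field. texts_to_score is a hypothetical helper, not a library function.

```python
docs_dict = [
    {"id": "doc1", "text": "Article about renewable energy.", "title": "Renewable Energy"},
    {"id": "doc2", "text": "Report on economic growth.", "title": "Economic Growth"},
    {"id": "doc3", "text": "News on climate policy changes.", "title": "Climate Policy"},
]


def texts_to_score(docs, rank_fields):
    """Map each document id to the field value(s) a reranker would compare with
    the query. Raises KeyError if a document lacks a requested field."""
    return {doc["id"]: [doc[f] for f in rank_fields] for doc in docs}


print(texts_to_score(docs_dict, ["text"]))
print(texts_to_score(docs_dict, ["title"]))
```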
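We can run a second query against the same reranker. Note that reranker_text still ranks on the text field set in its constructor; the title is only printed alongside each result. To rank on the title instead, pass rank_fields=["title"], either when constructing the reranker or in the .rerank() call.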
economic_docs = reranker_text.rerank(docs_dict, "Economic forecast.")
# Show IDs and scores
for res in economic_docs:
    print(
        f"ID: {res['id']} | Score: {res['score']:.4f} | Title: {res['document']['title']}"
    )
ID: doc2 | Score: 0.8918 | Title: Economic Growth
ID: doc3 | Score: 0.0002 | Title: Climate Policy
ID: doc1 | Score: 0.0000 | Title: Renewable Energy
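In the API reference, the .rerank() signature gives rank_fields, model and top_n a default of None, which suggests that per-call values override the ones the reranker was constructed with. The sketch below encodes that assumed precedence. It is not taken from the library source, and DEFAULTS and resolve are hypothetical names.

```python
from typing import Optional, Sequence

# Hypothetical instance defaults, mirroring
# PineconeRerank(model="bge-reranker-v2-m3", rank_fields=["text"]).
DEFAULTS = {"model": "bge-reranker-v2-m3", "rank_fields": ["text"], "top_n": None}


def resolve(
    rank_fields: Optional[Sequence[str]] = None,
    model: Optional[str] = None,
    top_n: Optional[int] = None,
) -> dict:
    """Assumed precedence: a value passed to .rerank() wins; None falls back to
    the value set when the reranker was constructed."""
    call = {"model": model, "rank_fields": rank_fields, "top_n": top_n}
    return {k: (call[k] if call[k] is not None else DEFAULTS[k]) for k in DEFAULTS}


print(resolve(rank_fields=["title"]))
```

Under this assumption, passing rank_fields=["title"] to reranker_text.rerank() would score titles while keeping the instance's model.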
Reranking with additional parameters
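You can pass model-specific parameters (for example, truncate) directly to .rerank().
truncate controls how inputs longer than the model supports are handled. Accepted values are END and NONE:
- END truncates the input sequence when it reaches the input token limit.
- NONE returns an error when the input exceeds the input token limit.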
# Rerank with custom truncate parameter
docs_simple = [
    {"id": "docA", "text": "Quantum entanglement is a physical phenomenon..."},
    {"id": "docB", "text": "Classical mechanics describes motion..."},
]
reranked = reranker.rerank(
    documents=docs_simple,
    query="Explain the concept of quantum entanglement.",
    truncate="END",
)
# Print reranked IDs and scores
for res in reranked:
    print(f"ID: {res['id']} | Score: {res['score']:.4f}")
ID: docA | Score: 0.6950
ID: docB | Score: 0.0001
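The two truncate modes described above differ only when an input is too long: END cuts the sequence at the limit, while NONE rejects it. Below is a local sketch on whitespace tokens with a made-up limit. Real models count subword tokens, and their actual limits are not given here. apply_truncate is a hypothetical helper.

```python
def apply_truncate(tokens, limit, truncate="END"):
    """Mimic the two documented truncate modes on a list of tokens.
    END keeps the first `limit` tokens (dropping the end of the sequence);
    NONE raises an error for over-long input."""
    if len(tokens) <= limit:
        return tokens
    if truncate == "END":
        return tokens[:limit]
    if truncate == "NONE":
        raise ValueError(f"input has {len(tokens)} tokens, limit is {limit}")
    raise ValueError(f"unknown truncate mode: {truncate!r}")


tokens = "Quantum entanglement is a physical phenomenon".split()
print(apply_truncate(tokens, limit=3))
```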
Use within a chain
API reference
- PineconeRerank(model, top_n, rank_fields, return_documents)
- .rerank(documents, query, rank_fields=None, model=None, top_n=None, truncate="END")
- .compress_documents(documents, query) (returns Document objects whose metadata includes relevance_score)