BoxRetriever

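This guide will help you get started with the Box retriever. For detailed documentation of all BoxRetriever features and configurations, head to the API reference.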

Overview

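The BoxRetriever class helps you get your unstructured content from Box in LangChain's Document format. You can do this either by running a full-text search over your files, or by using Box AI to retrieve a Document containing the result of an AI query against specific files. The Box AI mode requires a List[str] containing Box file IDs, e.g. ["12345","67890"].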
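
Because the file IDs must be passed as strings, it can help to normalize them before building the retriever. The sketch below is a hypothetical helper (normalize_file_ids is not part of langchain-box) and assumes that Box file IDs consist only of digits, as in the example above.

```python
from typing import Iterable, List, Union


def normalize_file_ids(ids: Iterable[Union[int, str]]) -> List[str]:
    """Return Box file IDs as a de-duplicated list of strings, preserving order."""
    seen = set()
    result: List[str] = []
    for raw in ids:
        file_id = str(raw).strip()
        # Assumption: Box file IDs are purely numeric.
        if not file_id.isdigit():
            raise ValueError(f"Not a valid Box file ID: {raw!r}")
        if file_id not in seen:
            seen.add(file_id)
            result.append(file_id)
    return result


box_file_ids = normalize_file_ids([12345, " 67890", "12345"])
print(box_file_ids)  # ['12345', '67890']
```

The resulting list can be passed as box_file_ids when instantiating the retriever for Box AI.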

Info

Box AI requires an Enterprise Plus license.

Files without a text representation will be skipped.

Integration details

Bring-your-own data (i.e., index and search a custom corpus of documents):

Retriever | Self-host | Cloud offering | Package
BoxRetriever | | | langchain-box

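Setup

In order to use the Box package, you will need a few things:

  • A Box account: if you are not a current Box customer or want to test outside of your production Box instance, you can use a free developer account.
  • A Box app: this is configured in the developer console, and for Box AI it must have the Manage AI scope enabled. Here you will also select your authentication method.
  • The app must be enabled by the administrator. For free developer accounts, this is whoever signed up for the account.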

Credentials

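For these examples, we will use token authentication. This can be used with any authentication method; just obtain the token by whichever method you prefer. If you want to learn more about how to use other authentication types with langchain-box, visit the Box provider document.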

import getpass
import os

box_developer_token = getpass.getpass("Enter your Box Developer Token: ")

If you want to get automated tracing from individual queries, you can also set your LangSmith API key by uncommenting the lines below:

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
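
For non-interactive runs, one option is to read the token from an environment variable and only prompt when it is missing. This is a minimal sketch: get_box_token is a hypothetical helper, and the variable name BOX_DEVELOPER_TOKEN is an assumption chosen for illustration.

```python
import getpass
import os


def get_box_token(env_var: str = "BOX_DEVELOPER_TOKEN") -> str:
    """Return the Box developer token from the environment, prompting as a fallback."""
    # Assumption: the token is stored under `env_var`; adjust to your own setup.
    token = os.environ.get(env_var, "").strip()
    if not token:
        token = getpass.getpass("Enter your Box Developer Token: ").strip()
    if not token:
        raise ValueError("A Box developer token is required")
    return token


# Usage (prompts only if the environment variable is unset):
# box_developer_token = get_box_token()
```

Developer tokens are short-lived, so reading the value at startup rather than hard-coding it keeps the examples easy to rerun with a fresh token.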

Installation

This retriever lives in the langchain-box package:

%pip install -qU langchain-box
Note: you may need to restart the kernel to use updated packages.

Instantiation

Now we can instantiate our retriever:

from langchain_box import BoxRetriever

retriever = BoxRetriever(box_developer_token=box_developer_token)

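For more granular search, we offer a series of options to help you filter the results. This uses langchain_box.utilities.BoxSearchOptions in conjunction with the langchain_box.utilities.SearchTypeFilter and langchain_box.utilities.DocumentFiles enums to filter on things like created date, which part of the file to search, and even to limit the search scope to a specific folder.

For more information, check out the API reference.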

from langchain_box.utilities import BoxSearchOptions, DocumentFiles, SearchTypeFilter

box_folder_id = "260931903795"

box_search_options = BoxSearchOptions(
    ancestor_folder_ids=[box_folder_id],
    search_type_filter=[SearchTypeFilter.FILE_CONTENT],
    created_date_range=["2023-01-01T00:00:00-07:00", "2024-08-01T00:00:00-07:00"],
    k=200,
    size_range=[1, 1000000],
    updated_date_range=None,
)

retriever = BoxRetriever(
    box_developer_token=box_developer_token, box_search_options=box_search_options
)

retriever.invoke("AstroTech Solutions")
[Document(metadata={'source': 'https://dl.boxcloud.com/api/2.0/internal_files/1514555423624/versions/1663171610024/representations/extracted_text/content/', 'title': 'Invoice-A5555_txt'}, page_content='Vendor: AstroTech Solutions\nInvoice Number: A5555\n\nLine Items:\n    - Gravitational Wave Detector Kit: $800\n    - Exoplanet Terrarium: $120\nTotal: $920')]
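
The date filters above are timestamp strings with a UTC offset, which are easy to mistype by hand. As a minimal sketch, they can be generated from datetime objects instead; box_date_range is a hypothetical helper, not part of langchain-box, and it assumes the [start, end] list format used in the example above.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def box_date_range(start: datetime, end: datetime) -> List[str]:
    """Build a [start, end] list of ISO 8601 timestamps with UTC offsets."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("Both datetimes must be timezone-aware")
    if start > end:
        raise ValueError("start must not be later than end")
    # timespec="seconds" drops microseconds so the strings match the format above.
    return [start.isoformat(timespec="seconds"), end.isoformat(timespec="seconds")]


utc_minus_7 = timezone(timedelta(hours=-7))
created_date_range = box_date_range(
    datetime(2023, 1, 1, tzinfo=utc_minus_7),
    datetime(2024, 8, 1, tzinfo=utc_minus_7),
)
print(created_date_range)
# ['2023-01-01T00:00:00-07:00', '2024-08-01T00:00:00-07:00']
```

The returned list can be passed as created_date_range when constructing BoxSearchOptions.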

Box AI

from langchain_box import BoxRetriever

box_file_ids = ["1514555423624", "1514553902288"]

retriever = BoxRetriever(
    box_developer_token=box_developer_token, box_file_ids=box_file_ids
)

Usage

query = "What was the most expensive item purchased"

retriever.invoke(query)
[Document(metadata={'source': 'Box AI', 'title': 'Box AI What was the most expensive item purchased'}, page_content='The most expensive item purchased is the **Gravitational Wave Detector Kit** from AstroTech Solutions, which costs **$800**.')]

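Citations

With Box AI and the BoxRetriever, you can return the answer to your prompt, return the citations Box used to get that answer, or both. No matter how you choose to use Box AI, the retriever returns a List[Document] object. We offer this flexibility with two bool arguments, answer and citations. answer defaults to True and citations defaults to False, so you can omit both if you just want the answer. If you want both, just include citations=True; if you only want citations, include answer=False and citations=True.

Get both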

retriever = BoxRetriever(
    box_developer_token=box_developer_token, box_file_ids=box_file_ids, citations=True
)

retriever.invoke(query)
[Document(metadata={'source': 'Box AI', 'title': 'Box AI What was the most expensive item purchased'}, page_content='The most expensive item purchased is the **Gravitational Wave Detector Kit** from AstroTech Solutions, which costs **$800**.'),
Document(metadata={'source': 'Box AI What was the most expensive item purchased', 'file_name': 'Invoice-A5555.txt', 'file_id': '1514555423624', 'file_type': 'file'}, page_content='Vendor: AstroTech Solutions\nInvoice Number: A5555\n\nLine Items:\n - Gravitational Wave Detector Kit: $800\n - Exoplanet Terrarium: $120\nTotal: $920')]
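
When both are returned, the answer and the citations arrive in the same list. The sketch below separates them by checking for a file_id key in the metadata, which is an assumption based only on the sample output above. StandInDocument is a hypothetical stand-in for LangChain's Document so that the example runs without any Box or LangChain dependency.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class StandInDocument:
    """Minimal stand-in with the two attributes used here."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def split_answer_and_citations(
    docs: List[StandInDocument],
) -> Tuple[List[StandInDocument], List[StandInDocument]]:
    """Split retriever results into (answers, citations)."""
    # Assumption: only citation documents carry a "file_id" metadata key.
    answers = [d for d in docs if "file_id" not in d.metadata]
    citations = [d for d in docs if "file_id" in d.metadata]
    return answers, citations


docs = [
    StandInDocument("The most expensive item is ...", {"source": "Box AI"}),
    StandInDocument("Vendor: AstroTech Solutions", {"file_id": "1514555423624"}),
]
answers, citations = split_answer_and_citations(docs)
print(len(answers), len(citations))  # 1 1
```

The same function should work on the list returned by retriever.invoke(query), since it only reads the metadata attribute of each document.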

Citations only

retriever = BoxRetriever(
    box_developer_token=box_developer_token,
    box_file_ids=box_file_ids,
    answer=False,
    citations=True,
)

retriever.invoke(query)
[Document(metadata={'source': 'Box AI What was the most expensive item purchased', 'file_name': 'Invoice-A5555.txt', 'file_id': '1514555423624', 'file_type': 'file'}, page_content='Vendor: AstroTech Solutions\nInvoice Number: A5555\n\nLine Items:\n    - Gravitational Wave Detector Kit: $800\n    - Exoplanet Terrarium: $120\nTotal: $920')]

Use within a chain

Like other retrievers, BoxRetriever can be incorporated into LLM applications via chains.

We will need an LLM or chat model:

pip install -qU "langchain[openai]"
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")
openai_key = getpass.getpass("Enter your OpenAI key: ")
Enter your OpenAI key:  ········
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

box_search_options = BoxSearchOptions(
    ancestor_folder_ids=[box_folder_id],
    search_type_filter=[SearchTypeFilter.FILE_CONTENT],
    created_date_range=["2023-01-01T00:00:00-07:00", "2024-08-01T00:00:00-07:00"],
    k=200,
    size_range=[1, 1000000],
    updated_date_range=None,
)

retriever = BoxRetriever(
    box_developer_token=box_developer_token, box_search_options=box_search_options
)

context = "You are a finance professional that handles invoices and purchase orders."
question = "Show me all the items purchased from AstroTech Solutions"

prompt = ChatPromptTemplate.from_template(
    """Answer the question based only on the context provided.

    Context: {context}

    Question: {question}"""
)


def format_docs(docs):
    # Join the text of all retrieved documents into one context string.
    return "\n\n".join(doc.page_content for doc in docs)


chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
chain.invoke(question)
'- Gravitational Wave Detector Kit: $800\n- Exoplanet Terrarium: $120'
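
If the model should be able to tell which file each passage came from, the formatting step can include the title metadata shown in the search results earlier. This is a sketch of one possible variant of format_docs; format_docs_with_titles and StandInDocument are hypothetical names, and the stand-in class exists only so the example runs without LangChain installed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable


@dataclass
class StandInDocument:
    """Minimal stand-in with the two attributes used here."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def format_docs_with_titles(docs: Iterable[StandInDocument]) -> str:
    """Join documents into one string, prefixing each with its title."""
    parts = []
    for doc in docs:
        # Fall back to a placeholder when a document has no title metadata.
        title = doc.metadata.get("title", "untitled")
        parts.append(f"[{title}]\n{doc.page_content}")
    return "\n\n".join(parts)


docs = [
    StandInDocument(
        "Vendor: AstroTech Solutions\nTotal: $920",
        {"title": "Invoice-A5555_txt"},
    )
]
print(format_docs_with_titles(docs))
```

Swapping this function in for format_docs in the chain changes only the context string that the prompt receives.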

Use as an agent tool

Like other retrievers, BoxRetriever can also be added to a LangGraph agent as a tool.

pip install -U langsmith
from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain.tools.retriever import create_retriever_tool
from langchain_openai import ChatOpenAI

box_search_options = BoxSearchOptions(
    ancestor_folder_ids=[box_folder_id],
    search_type_filter=[SearchTypeFilter.FILE_CONTENT],
    created_date_range=["2023-01-01T00:00:00-07:00", "2024-08-01T00:00:00-07:00"],
    k=200,
    size_range=[1, 1000000],
    updated_date_range=None,
)

retriever = BoxRetriever(
    box_developer_token=box_developer_token, box_search_options=box_search_options
)

# Wrap the retriever as a tool the agent can call by name.
box_search_tool = create_retriever_tool(
    retriever,
    "box_search_tool",
    "This tool is used to search Box and retrieve documents that match the search criteria",
)
tools = [box_search_tool]
prompt = hub.pull("hwchase17/openai-tools-agent")
prompt.messages

llm = ChatOpenAI(temperature=0, openai_api_key=openai_key)

agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools)
result = agent_executor.invoke(
    {
        "input": "list the items I purchased from AstroTech Solutions from most expensive to least expensive"
    }
)
print(f"result {result['output']}")
result The items you purchased from AstroTech Solutions from most expensive to least expensive are:

1. Gravitational Wave Detector Kit: $800
2. Exoplanet Terrarium: $120

Total: $920

Extra fields

All Box connectors offer the ability to select additional fields from the Box FileFull object to return as custom LangChain metadata. Each object accepts an optional List[str] called extra_fields, containing the json key from the return object, like extra_fields=["shared_link"].

The connector will add this field to the list of fields the integration needs to function, and then add the results to the metadata returned in the Document or Blob, like "metadata" : { "source" : "source", "shared_link" : "shared_link" }. If the field is unavailable for that file, it will be returned as an empty string, like "shared_link" : "".
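
Since an unavailable field comes back as an empty string rather than being omitted, downstream code may want to treat "" as missing. A minimal sketch follows, where get_extra_field is a hypothetical helper operating on a plain metadata dictionary and the values are the placeholders used above.

```python
from typing import Any, Mapping, Optional


def get_extra_field(metadata: Mapping[str, Any], name: str) -> Optional[Any]:
    """Return the extra field's value, or None if it is absent or empty."""
    value = metadata.get(name, "")
    # The connector reports unavailable fields as an empty string.
    return None if value == "" else value


with_link = {"source": "source", "shared_link": "shared_link"}
without_link = {"source": "source", "shared_link": ""}

print(get_extra_field(with_link, "shared_link"))     # shared_link
print(get_extra_field(without_link, "shared_link"))  # None
```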

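API reference

For detailed documentation of all BoxRetriever features and configurations, head to the API reference.

Help

If you have questions, you can check out our developer documentation or reach out to us in our developer community.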