
Vectara

Overview

Vectara is the trusted AI assistant and agent platform, focused on enterprise readiness for mission-critical applications. Vectara's serverless RAG-as-a-service provides all the components of RAG behind an easy-to-use API, including:

  1. A way to extract text from files (PDF, PPT, DOCX, etc.)
  2. ML-based chunking that provides state-of-the-art performance.
  3. The Boomerang embeddings model.
  4. Its own internal vector database where text chunks and embedding vectors are stored.
  5. A query service that automatically encodes the query into an embedding and retrieves the most relevant text segments, including support for hybrid search and several reranking options such as the multilingual relevance reranker, MMR, and UDF reranker.
  6. An LLM for creating a generative summary based on the retrieved documents (context), including citations.

For more information, see the Vectara documentation.

This notebook shows how to use Vectara's Chat functionality, which automatically stores the conversation history and ensures that follow-up questions take that history into account.

Setup

To use the Vectara vector store, you first need to install the partner package.

!uv pip install -U pip && uv pip install -qU langchain-vectara

Getting Started

To get started, follow these steps:

  1. If you don't already have one, sign up for your free Vectara trial.
  2. Within your account you can create one or more corpora. Each corpus represents an area that stores text data ingested from input documents. To create a corpus, use the "Create Corpus" button. You then provide a name and a description for your corpus. Optionally, you can define filtering attributes and apply some advanced options. If you click on your created corpus, you can see its name and corpus ID at the top.
  3. Next you'll need to create an API key to access the corpus. Click on the "Access Control" tab in the corpus view, then click the "Create API Key" button. Give your key a name and choose whether you want a query-only or query+index key. Click "Create" and you now have an active API key. Keep this key confidential.

To use LangChain with Vectara, you'll need these two values: `corpus_key` and `api_key`. You can provide `VECTARA_API_KEY` to LangChain in two ways:

Instantiation

  1. Include these two variables in your environment: `VECTARA_API_KEY` and `VECTARA_CORPUS_KEY`.

    For example, you can set these variables using os.environ and getpass as follows:

import os
import getpass

os.environ["VECTARA_API_KEY"] = getpass.getpass("Vectara API Key:")
  2. Pass them to the Vectara vectorstore constructor:

vectara = Vectara(
    vectara_api_key=vectara_api_key
)

In this notebook we assume they are provided in the environment.

import os

os.environ["VECTARA_API_KEY"] = "<VECTARA_API_KEY>"
os.environ["VECTARA_CORPUS_KEY"] = "<VECTARA_CORPUS_KEY>"

from langchain_vectara import Vectara
from langchain_vectara.vectorstores import (
    CorpusConfig,
    GenerationConfig,
    MmrReranker,
    SearchConfig,
    VectaraQueryConfig,
)

Vectara Chat Explained

In most uses of LangChain to create chatbots, one must integrate a special memory component that maintains the history of chat sessions and then uses that history to ensure the chatbot is aware of the conversation history.

With Vectara Chat, all of that is performed in the backend by Vectara automatically. You can look at the Chat documentation for more details on how this works internally, but with LangChain all you have to do is turn this feature on in the Vectara vectorstore.
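To get an intuition for what Vectara handles for you, here is a rough client-side stand-in for such a memory component (plain Python, purely illustrative; Vectara keeps this state server-side, keyed by a chat session):

```python
class ChatMemory:
    """A toy stand-in for the session memory Vectara manages server-side."""

    def __init__(self):
        self.history = []  # list of (question, answer) turns

    def add_turn(self, question, answer):
        self.history.append((question, answer))

    def as_context(self):
        # Flatten prior turns so a follow-up question can be interpreted
        # against them (e.g. resolving "she" to an earlier entity).
        return "\n".join(f"Q: {q}\nA: {a}" for q, a in self.history)


memory = ChatMemory()
memory.add_turn("Who was nominated?", "Judge Ketanji Brown Jackson.")
print(memory.as_context())
```

With Vectara Chat, none of this bookkeeping lives in your application code.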

Let's see an example. First we load the SOTU document (remember, text extraction and chunking all occurs automatically on the Vectara platform):

from langchain_community.document_loaders import TextLoader

loader = TextLoader("../document_loaders/example_data/state_of_the_union.txt")
documents = loader.load()

corpus_key = os.getenv("VECTARA_CORPUS_KEY")
vectara = Vectara.from_documents(documents, embedding=None, corpus_key=corpus_key)
API Reference: TextLoader

Now we create a Chat Runnable using the as_chat method:

generation_config = GenerationConfig(
    max_used_search_results=7,
    response_language="eng",
    generation_preset_name="vectara-summary-ext-24-05-med-omni",
    enable_factual_consistency_score=True,
)
search_config = SearchConfig(
    corpora=[CorpusConfig(corpus_key=corpus_key, limit=25)],
    reranker=MmrReranker(diversity_bias=0.2),
)

config = VectaraQueryConfig(
    search=search_config,
    generation=generation_config,
)


bot = vectara.as_chat(config)

Invocation

Here is an example of asking a question with no chat history:

bot.invoke("What did the president say about Ketanji Brown Jackson?")["answer"]
'The president stated that nominating someone to serve on the United States Supreme Court is one of the most serious constitutional responsibilities. He nominated Circuit Court of Appeals Judge Ketanji Brown Jackson, describing her as one of the nation’s top legal minds who will continue Justice Breyer’s legacy of excellence and noting her experience as a former top litigator in private practice [1].'

Here is an example of asking a question with some chat history:

bot.invoke("Did he mention who she succeeded?")["answer"]
'Yes, the president mentioned that Ketanji Brown Jackson succeeded Justice Breyer [1].'

Chat with Streaming

Of course, the chatbot interface also supports streaming. Instead of the invoke method, you simply use the stream method:

output = {}
curr_key = None
for chunk in bot.stream("what did he say about covid?"):
    for key in chunk:
        if key not in output:
            output[key] = chunk[key]
        else:
            output[key] += chunk[key]
        if key == "answer":
            print(chunk[key], end="", flush=True)
        curr_key = key
The president acknowledged the significant impact of COVID-19 on the nation, expressing understanding of the public's fatigue and frustration. He emphasized the need to view COVID-19 not as a partisan issue but as a serious disease, urging unity among Americans. The president highlighted the progress made, noting that severe cases have decreased significantly, and mentioned new CDC guidelines allowing most Americans to be mask-free. He also pointed out the efforts to vaccinate the nation and provide economic relief, and the ongoing commitment to vaccinate the world [2], [3], [5].
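The accumulation loop above is independent of Vectara: it simply merges the partial dictionaries that stream yields into one result. A self-contained sketch with a simulated stream (the chunk shapes are illustrative, not Vectara's exact format):

```python
def fake_stream():
    # Simulates a Runnable's .stream(): each chunk is a partial dict.
    yield {"answer": "The president "}
    yield {"answer": "spoke about COVID."}
    yield {"context": ["doc-1"]}


output = {}
for chunk in fake_stream():
    for key in chunk:
        if key not in output:
            output[key] = chunk[key]
        else:
            output[key] += chunk[key]  # concatenate partial strings/lists

print(output["answer"])  # full answer assembled from partial chunks
```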

Chaining

For additional capabilities, you can use chaining.

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0)

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that explains the stuff to a five year old. Vectara is providing the answer.",
        ),
        ("human", "{vectara_response}"),
    ]
)


def get_vectara_response(question: dict) -> str:
    """
    Calls Vectara as_chat and returns the answer string. This encapsulates
    the Vectara call.
    """
    try:
        response = bot.invoke(question["question"])
        return response["answer"]
    except Exception:
        return "I'm sorry, I couldn't get an answer from Vectara."


# Create the chain
chain = get_vectara_response | prompt | llm | StrOutputParser()


# Invoke the chain
result = chain.invoke({"question": "what did he say about the covid?"})
print(result)
So, the president talked about how the COVID-19 sickness has affected a lot of people in the country. He said that it's important for everyone to work together to fight the sickness, no matter what political party they are in. The president also mentioned that they are working hard to give vaccines to people to help protect them from getting sick. They are also giving money and help to people who need it, like food, housing, and cheaper health insurance. The president also said that they are sending vaccines to many other countries to help people all around the world stay healthy.
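The `|` operator in the chain above is LangChain's LCEL composition: each step's output is fed as the next step's input. The underlying data flow can be mimicked with plain callables (a conceptual sketch only; the function bodies are made-up stand-ins, and real LCEL Runnables add batching, streaming, and tracing on top):

```python
from functools import reduce


def pipe(*steps):
    """Compose callables left-to-right, like LCEL's `|` operator."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)


# Stand-ins for the real chain's stages (illustrative only):
get_response = lambda q: {"vectara_response": f"answer to {q['question']}"}
format_prompt = lambda d: f"Explain simply: {d['vectara_response']}"
llm = lambda prompt: prompt.upper()  # pretend "model call"

chain = pipe(get_response, format_prompt, llm)
print(chain({"question": "what did he say about covid?"}))
```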

API reference

See the Chat documentation for detailed information.