Baidu Cloud ElasticSearch VectorSearch

Baidu Cloud VectorSearch is a fully managed, enterprise-level distributed search and analysis service that is 100% compatible with open source. It provides a low-cost, high-performance, and reliable retrieval and analysis platform for structured and unstructured data. As a vector database, it supports multiple index types and similarity distance methods.

Baidu Cloud ElasticSearch provides a privilege management mechanism that lets you configure cluster privileges freely to further ensure data security.

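This notebook shows how to use functionality related to the Baidu Cloud ElasticSearch VectorStore. To run it, you need a running Baidu Cloud ElasticSearch instance:

Read the help document to quickly get familiar with and configure a Baidu Cloud ElasticSearch instance.

Once the instance is up and running, follow these steps: split the documents, get embeddings, connect to the Baidu Cloud ElasticSearch instance, index the documents, and perform vector retrieval.

We need to install the following Python packages first.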

%pip install --upgrade --quiet langchain-community elasticsearch==7.11.0

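First, we want to use the Qianfan embeddings (QianfanEmbeddingsEndpoint), so we need the Qianfan AK (access key) and SK (secret key). For details on Qianfan, see the Baidu Qianfan Workshop.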

import getpass
import os

if "QIANFAN_AK" not in os.environ:
    os.environ["QIANFAN_AK"] = getpass.getpass("Your Qianfan AK:")
if "QIANFAN_SK" not in os.environ:
    os.environ["QIANFAN_SK"] = getpass.getpass("Your Qianfan SK:")
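Since the embedding step depends on both variables, it can help to confirm they are set before continuing. The following is a minimal sketch only; the helper name missing_credentials and the REQUIRED_VARS tuple are hypothetical and not part of LangChain or the Qianfan SDK.

```python
import os
from typing import Iterable, List, Mapping

# Names taken from the cell above; the tuple itself is illustrative.
REQUIRED_VARS = ("QIANFAN_AK", "QIANFAN_SK")


def missing_credentials(
    names: Iterable[str] = REQUIRED_VARS,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the variable names that are unset or empty in `env`."""
    return [name for name in names if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing Qianfan credentials:", ", ".join(missing))
```

Passing a plain dict as env makes the check easy to try without touching the real environment.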

Second, split the documents and get embeddings.

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("../../../state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

from langchain_community.embeddings import QianfanEmbeddingsEndpoint

embeddings = QianfanEmbeddingsEndpoint()

API Reference: TextLoader | CharacterTextSplitter | QianfanEmbeddingsEndpoint
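To picture what the chunk_size=1000 setting in the splitting step does, here is a simplified, pure-Python sketch that splits text on blank lines and greedily merges the pieces into chunks of at most chunk_size characters. It is an illustration under those assumptions, not the CharacterTextSplitter implementation; the function name split_into_chunks is hypothetical.

```python
from typing import List


def split_into_chunks(
    text: str, chunk_size: int = 1000, separator: str = "\n\n"
) -> List[str]:
    """Greedily merge separator-delimited pieces into chunks <= chunk_size chars.

    A single piece longer than chunk_size is kept whole rather than cut.
    """
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if not current or len(candidate) <= chunk_size:
            # The piece fits, or it starts a new chunk on its own.
            current = candidate
        else:
            # Adding the piece would overflow: close the chunk, start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc"
print(split_into_chunks(sample, chunk_size=10))
```

With no overlap between chunks, each paragraph ends up in exactly one chunk, which mirrors the chunk_overlap=0 choice above.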

Then, connect to an accessible Baidu Cloud ElasticSearch instance and index the documents.

# Create a bes instance and index docs.
from langchain_community.vectorstores import BESVectorStore

bes = BESVectorStore.from_documents(
    documents=docs,
    embedding=embeddings,
    bes_url="your bes cluster url",
    index_name="your vector index",
)
bes.client.indices.refresh(index="your vector index")

API Reference: BESVectorStore

Finally, query and retrieve data.

query = "What did the president say about Ketanji Brown Jackson"
docs = bes.similarity_search(query)
print(docs[0].page_content)
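Conceptually, a similarity search embeds the query and returns the stored vectors closest to it. The sketch below shows that idea as a brute-force cosine-similarity ranking in plain Python. It is an illustration only: the service runs the search on the cluster with its own index types and distance methods, and the names cosine_similarity and top_k are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]], k: int = 4
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k most similar document vectors."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional vectors standing in for real embeddings.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k([1.0, 0.0], doc_vecs, k=2))
```

A brute-force scan like this compares the query against every document, which is why vector databases rely on dedicated indexes for larger collections.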

If you run into any problems during use, please feel free to contact liuboyao@baidu.com or chenweixu01@baidu.com, and we will do our best to support you.

Related

Vector store conceptual guide
Vector store how-to guides