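Alibaba Cloud OpenSearch
Alibaba Cloud OpenSearch is a one-stop platform for developing intelligent search services.
OpenSearch is built on the large-scale distributed search engine developed by Alibaba. It serves more than 500 business cases within Alibaba Group and thousands of Alibaba Cloud customers. OpenSearch helps develop search services for a range of scenarios, including e-commerce, O2O, multimedia, the content industry, communities and forums, and enterprise big data queries.
OpenSearch helps you develop high-quality, low-maintenance, high-performance intelligent search services that give your users an efficient and accurate search experience.
OpenSearch provides a vector search feature. In specific scenarios, especially test question search and image search, you can combine vector search with multimodal search to improve the accuracy of search results.
This notebook shows how to use functionality related to the Alibaba Cloud OpenSearch Vector Search Edition.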
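Setting up
Purchase an instance and configure it
Purchase OpenSearch Vector Search Edition from Alibaba Cloud and configure the instance according to the help documentation.
To run the examples, you need an OpenSearch Vector Search Edition instance up and running.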
Alibaba Cloud OpenSearch Vector Store class
The AlibabaCloudOpenSearch class supports the following functions:
add_texts
add_documents
from_texts
from_documents
similarity_search
asimilarity_search
similarity_search_by_vector
asimilarity_search_by_vector
similarity_search_with_relevance_scores
delete_doc_by_texts
Read the help documentation to quickly get familiar with and configure an OpenSearch Vector Search Edition instance.
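If you encounter any problems during use, please feel free to contact xingshaomin.xsm@alibaba-inc.com, and we will do our best to provide you with assistance and support.
After the instance is up and running, follow these steps: split the documents, get the embeddings, connect to the Alibaba Cloud OpenSearch instance, index the documents, and perform vector retrieval.
We first need to install the following Python packages.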
%pip install --upgrade --quiet langchain-community alibabacloud_ha3engine_vector
We want to use OpenAIEmbeddings, so we have to get an OpenAI API key.
import getpass
import os
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
Example
from langchain_community.vectorstores import (
    AlibabaCloudOpenSearch,
    AlibabaCloudOpenSearchSettings,
)
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
Split the documents and get the embeddings.
from langchain_community.document_loaders import TextLoader
loader = TextLoader("../../../state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
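To make the effect of chunk_size concrete, here is a minimal sketch of separator-based chunking with no overlap. It is a simplified illustration, not the actual CharacterTextSplitter implementation; the function name split_text and the choice to keep oversized pieces whole are assumptions made for this example.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, separator: str = "\n\n") -> List[str]:
    """Greedily merge separator-delimited pieces into chunks of at most
    chunk_size characters (hypothetical helper, no overlap)."""
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + separator + piece
        if not current or len(candidate) <= chunk_size:
            # The piece fits, or it starts a new chunk (kept whole even if oversized).
            current = candidate
        else:
            # The piece does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_text("aaaa\n\nbbbb\n\ncccc", chunk_size=10):
        print(repr(chunk))
```

A smaller chunk_size yields more, shorter chunks, which means more embedding calls and more indexed rows.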
Create the OpenSearch settings.
settings = AlibabaCloudOpenSearchSettings(
    endpoint="The endpoint of the OpenSearch instance. You can find it in the Alibaba Cloud OpenSearch console.",
    instance_id="The ID of the OpenSearch instance. You can find it in the Alibaba Cloud OpenSearch console.",
    protocol="Communication protocol between the SDK and the server; the default is http.",
    username="The username specified when purchasing the instance.",
    password="The password specified when purchasing the instance.",
    namespace="The instance data is partitioned based on the namespace field. If namespace is enabled, you need to specify the namespace field name during initialization. Otherwise, queries cannot be executed correctly.",
    table_name="The table name specified during instance configuration.",
    embedding_field_separator="Delimiter used when writing vector field data; the default is a comma.",
    output_fields="The list of fields returned when invoking OpenSearch; by default it is the list of values in field_name_mapping.",
    field_name_mapping={
        "id": "id",  # The id field name mapping of the index document.
        "document": "document",  # The text field name mapping of the index document.
        "embedding": "embedding",  # The embedding field name mapping of the index document.
        "name_of_the_metadata_specified_during_search": "opensearch_metadata_field_name,=",
        # The metadata field name mapping of the index document. You can specify several. Each value contains the mapped field name and an operator; the operator is used when executing a metadata filter query.
        # Currently supported logical operators are: > (greater than), < (less than), = (equal to), <= (less than or equal to), >= (greater than or equal to), != (not equal to).
        # Refer to this link: https://help.aliyun.com/zh/open-search/vector-search-edition/filter-expression
    },
)
# for example
# settings = AlibabaCloudOpenSearchSettings(
#     endpoint='ha-cn-5yd3fhdm102.public.ha.aliyuncs.com',
#     instance_id='ha-cn-5yd3fhdm102',
#     username='instance user name',
#     password='instance password',
#     table_name='test_table',
#     field_name_mapping={
#         "id": "id",
#         "document": "document",
#         "embedding": "embedding",
#         "string_field": "string_field,=",
#         "int_field": "int_field,=",
#         "float_field": "float_field,=",
#         "double_field": "double_field,="
#     },
# )
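Each metadata entry in field_name_mapping packs two things into one string: the OpenSearch field name and the comparison operator, separated by a comma. The sketch below shows how such a mapping could be combined with a metadata filter dictionary to form a filter expression. The helper build_filter, the quoting of string values, and the use of AND to join clauses are assumptions for illustration; consult the filter-expression documentation linked above for the syntax your instance accepts.

```python
from numbers import Number
from typing import Any, Dict


def build_filter(field_name_mapping: Dict[str, str], search_filter: Dict[str, Any]) -> str:
    """Turn {"metadata_key": value} into a filter string using "field,operator"
    mapping values (hypothetical helper, not part of the library)."""
    clauses = []
    for key, value in search_filter.items():
        mapped = field_name_mapping.get(key)
        if mapped is None:
            raise KeyError(f"no field_name_mapping entry for metadata key {key!r}")
        parts = [part.strip() for part in mapped.split(",")]
        if len(parts) != 2:
            raise ValueError(f"mapping for {key!r} must look like 'field,operator'")
        field, operator = parts
        if isinstance(value, Number):
            clauses.append(f"{field} {operator} {value}")
        else:
            # Assumption: string values are double-quoted in the expression.
            clauses.append(f'{field} {operator} "{value}"')
    return " AND ".join(clauses)


if __name__ == "__main__":
    mapping = {"string_field": "string_field,=", "int_field": "int_field,>="}
    print(build_filter(mapping, {"string_field": "value1", "int_field": 1}))
```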
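Create an OpenSearch access instance from the settings.
# Create an opensearch instance and index docs.
# from_texts expects plain strings, so pass the text of each split document.
opensearch = AlibabaCloudOpenSearch.from_texts(
    texts=[doc.page_content for doc in docs], embedding=embeddings, config=settings
)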
or
# Create an opensearch instance.
opensearch = AlibabaCloudOpenSearch(embedding=embeddings, config=settings)
Add texts and build the index.
metadatas = [
    {"string_field": "value1", "int_field": 1, "float_field": 1.0, "double_field": 2.0},
    {"string_field": "value2", "int_field": 2, "float_field": 3.0, "double_field": 4.0},
    {"string_field": "value3", "int_field": 3, "float_field": 5.0, "double_field": 6.0},
]
# The keys of each metadata dict must match field_name_mapping in settings.
# add_texts expects plain strings, one per metadata entry.
texts = [doc.page_content for doc in docs[: len(metadatas)]]
opensearch.add_texts(texts=texts, ids=[], metadatas=metadatas)
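Because metadata keys must match field_name_mapping, it can help to check them before writing to the instance. The following is a minimal sketch of such a check; the helper name unmapped_metadata_keys and the decision to treat id, document, and embedding as reserved keys are assumptions for this example, not library behavior.

```python
from typing import Any, Dict, List

# Keys that map the core document fields rather than metadata (assumption).
RESERVED_KEYS = {"id", "document", "embedding"}


def unmapped_metadata_keys(
    metadatas: List[Dict[str, Any]], field_name_mapping: Dict[str, str]
) -> List[str]:
    """Return the sorted metadata keys that have no entry in field_name_mapping."""
    allowed = set(field_name_mapping) - RESERVED_KEYS
    unknown = set()
    for metadata in metadatas:
        unknown.update(key for key in metadata if key not in allowed)
    return sorted(unknown)


if __name__ == "__main__":
    mapping = {
        "id": "id",
        "document": "document",
        "embedding": "embedding",
        "string_field": "string_field,=",
        "int_field": "int_field,=",
    }
    rows = [{"string_field": "value1", "int_field": 1, "float_field": 1.0}]
    print(unmapped_metadata_keys(rows, mapping))
```

An empty result means every metadata key has a mapping; any key it returns should be added to field_name_mapping or dropped from the metadata before indexing.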
Query and retrieve data.
query = "What did the president say about Ketanji Brown Jackson"
docs = opensearch.similarity_search(query)
print(docs[0].page_content)
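Conceptually, a similarity search embeds the query and returns the stored texts whose vectors are closest to it. The sketch below illustrates that ranking step in plain Python with cosine similarity over a tiny in-memory list. It is not how OpenSearch executes the query; the metric an instance actually uses depends on its index configuration, and the names cosine_similarity and top_k are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vector: Sequence[float],
    indexed: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the k texts whose vectors are most similar to query_vector."""
    scored = [(cosine_similarity(query_vector, vector), text) for text, vector in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


if __name__ == "__main__":
    store = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
    print(top_k([1.0, 0.0], store, k=2))
```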
Query and retrieve data with metadata.
query = "What did the president say about Ketanji Brown Jackson"
metadata = {
    "string_field": "value1",
    "int_field": 1,
    "float_field": 1.0,
    "double_field": 2.0,
}
docs = opensearch.similarity_search(query, filter=metadata)
print(docs[0].page_content)