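ApertureDB

ApertureDB is a database that stores, indexes, and manages multi-modal data such as text, images, videos, bounding boxes, and embeddings, together with their associated metadata.

This notebook explains how to use the embeddings functionality of ApertureDB.

Install the ApertureDB Python SDK

This installs the Python SDK used to write client code for ApertureDB.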

%pip install --upgrade --quiet aperturedb
Note: you may need to restart the kernel to use updated packages.

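Run an ApertureDB instance

To continue, make sure you have an ApertureDB instance up and running and that your environment is configured to use it.
There are several ways to do this, for example: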

docker run --publish 55555:55555 aperturedata/aperturedb-standalone
adb config create local --active --no-interactive
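
Before moving on, it can help to confirm that something is actually listening on the published port. The following is a minimal sketch, assuming the instance is reachable at localhost on port 55555 as in the docker command above. `port_is_open` is a hypothetical helper, not part of the ApertureDB SDK, and it only checks TCP reachability; it does not verify that the server is ApertureDB or that the configured credentials work.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For the setup above, `port_is_open("localhost", 55555)` would be the call to try; both values are assumptions taken from the docker command.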

Download some web documents

Here we do a mini-crawl of a single web page.

# For loading documents from web
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://docs.aperturedata.io")
docs = loader.load()
USER_AGENT environment variable not set, consider setting it to identify your requests.
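
The warning above is emitted because no USER_AGENT environment variable is set. One way to avoid it, sketched here, is to set the variable before the loader is imported, since the check appears to run at import time. The value shown is a placeholder and an assumption; any string that identifies your client should do.

```python
import os

# Placeholder value (an assumption); use a string that identifies your own client.
os.environ.setdefault("USER_AGENT", "aperturedb-notebook-example/0.1")

# Import the loader only after the variable is set:
# from langchain_community.document_loaders import WebBaseLoader
```

`setdefault` leaves an existing USER_AGENT value untouched, so the snippet is safe to run in an environment that already defines one.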

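Select an embeddings model

We want to use OllamaEmbeddings, so we need to import the necessary modules.

Ollama can be set up as a Docker container, as described in its documentation, for example: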

# Run server
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Tell server to load a specific model
docker exec ollama ollama run llama2

from langchain_community.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings()

Split documents into segments

We want to split the single document into multiple segments.

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
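
To give a feel for what splitting produces, here is a deliberately simplified sketch of fixed-size chunking with overlap in plain Python. It is not the algorithm used by RecursiveCharacterTextSplitter, which prefers to break on separators such as paragraph and line breaks. `chunk_text` and the sizes used are hypothetical and chosen only for illustration.

```python
def chunk_text(text: str, chunk_size: int = 20, overlap: int = 5) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

The overlap means the last character of each chunk is repeated at the start of the next one, which helps keep sentences that straddle a boundary retrievable from either side.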

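Create a vector store from documents and embeddings

This code creates a vector store in the ApertureDB instance. Within the instance, the vector store is represented as a "descriptor set". By default, the descriptor set is named langchain. The following code generates an embedding for each document and stores it in ApertureDB as a descriptor. This takes a few seconds while the embeddings are generated.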

from langchain_community.vectorstores import ApertureDB

vector_db = ApertureDB.from_documents(documents, embeddings)
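
Conceptually, a vector store answers the question "which stored embeddings are closest to this query embedding?". The toy sketch below shows that idea with brute-force cosine similarity in plain Python. It is not how ApertureDB indexes or searches descriptors, the vectors are made up, and `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], stored: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the keys of the k stored vectors most similar to the query."""
    ranked = sorted(
        stored,
        key=lambda name: cosine_similarity(query, stored[name]),
        reverse=True,
    )
    return ranked[:k]


# Made-up 3-dimensional "embeddings"; real ones have far more dimensions.
stored = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
nearest = top_k([1.0, 0.05, 0.0], stored, k=2)
```

In the notebook itself, the corresponding lookup would go through the vector store, for example with the standard LangChain `similarity_search` method on `vector_db`.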

Select a large language model

Again, we use the Ollama server we set up earlier for local processing.

from langchain_community.llms import Ollama

llm = Ollama(model="llama2")

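Build a RAG chain

Now we have all the components we need to create a retrieval-augmented generation (RAG) chain. This chain does the following:

  1. Generates an embedding descriptor for the user query
  2. Finds text segments similar to the user query using the vector store
  3. Passes the user query and the context documents to the LLM using a prompt template
  4. Returns the LLM's answer

# Create prompt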
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}""")


# Create a chain that passes documents to an LLM
from langchain.chains.combine_documents import create_stuff_documents_chain

document_chain = create_stuff_documents_chain(llm, prompt)


# Treat the vectorstore as a document retriever
retriever = vector_db.as_retriever()


# Create a RAG chain that connects the retriever to the LLM
from langchain.chains import create_retrieval_chain

retrieval_chain = create_retrieval_chain(retriever, document_chain)
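
The "stuff" step in the chain can be pictured as plain string formatting: the retrieved segments are concatenated and substituted into the {context} slot of the prompt, and the question goes into {input}. The sketch below mimics that idea without LangChain. `build_prompt` is a hypothetical helper, the blank-line separator is an assumption, and the segments are placeholder text rather than real retrieval results.

```python
TEMPLATE = """Answer the following question based only on the provided context:

<context>
{context}
</context>

Question: {input}"""


def build_prompt(question: str, segments: list[str], separator: str = "\n\n") -> str:
    """Join the retrieved segments and fill them into the prompt template."""
    context = separator.join(segments)
    return TEMPLATE.format(context=context, input=question)


prompt_text = build_prompt(
    "How can ApertureDB store images?",
    ["Segment one (placeholder text).", "Segment two (placeholder text)."],
)
```

Because every retrieved segment is placed in a single prompt, the number and size of the segments are bounded by the context window of the model in use.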

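Run the RAG chain

Finally, we pass a question to the chain and get an answer. Because the LLM has to generate the answer from the query and the context documents, this may take a few seconds.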

user_query = "How can ApertureDB store images?"
response = retrieval_chain.invoke({"input": user_query})
print(response["answer"])
Based on the provided context, ApertureDB can store images in several ways:

1. Multimodal data management: ApertureDB offers a unified interface to manage multimodal data such as images, videos, documents, embeddings, and associated metadata including annotations. This means that images can be stored along with other types of data in a single database instance.
2. Image storage: ApertureDB provides image storage capabilities through its integration with the public cloud providers or on-premise installations. This allows customers to host their own ApertureDB instances and store images on their preferred cloud provider or on-premise infrastructure.
3. Vector database: ApertureDB also offers a vector database that enables efficient similarity search and classification of images based on their semantic meaning. This can be useful for applications where image search and classification are important, such as in computer vision or machine learning workflows.

Overall, ApertureDB provides flexible and scalable storage options for images, allowing customers to choose the deployment model that best suits their needs.
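
Besides "answer", the dictionary returned by a chain built with create_retrieval_chain also carries the original "input" and the retrieved documents under "context". The sketch below lists which sources contributed to an answer. `describe_sources` is a hypothetical helper, the response is a stand-in built from placeholder values so the code runs without a server, and the "source" metadata key is an assumption about what the loader records.

```python
from types import SimpleNamespace


def describe_sources(response: dict, preview_chars: int = 40) -> list[str]:
    """Return one line per retrieved document: its source and a short text preview."""
    lines = []
    for doc in response.get("context", []):
        source = doc.metadata.get("source", "unknown")
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"{source}: {preview}")
    return lines


# Stand-in for a real chain response, with placeholder content.
fake_response = {
    "input": "How can ApertureDB store images?",
    "context": [
        SimpleNamespace(
            page_content="Placeholder segment text.",
            metadata={"source": "https://docs.aperturedata.io"},
        )
    ],
    "answer": "Placeholder answer.",
}
source_lines = describe_sources(fake_response)
```

With the real chain, passing `response` from the invocation above instead of `fake_response` would show which segments the answer was grounded in.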