
MariaDB

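LangChain's MariaDB integration (langchain-mariadb) provides vector capabilities for MariaDB 11.7.1 and later and is released under the MIT license. You can use the provided implementation as is or customize it for specific needs. Key features include: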

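  • Built-in vector similarity search
  • Support for cosine and Euclidean distance metrics
  • Robust metadata filtering options
  • Performance optimization through connection pooling
  • Configurable table and column settings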
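
To make the two distance metrics concrete, here is a minimal pure-Python sketch of cosine and Euclidean distance between two embedding vectors. It only illustrates the math: the function names are hypothetical and not part of langchain-mariadb, and in practice the distances are computed inside the database rather than in Python.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)
```

Cosine distance ignores vector magnitude and compares direction only, while Euclidean distance is sensitive to both.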

Setup

Launch a MariaDB Docker container with:

!docker run --name mariadb-container -e MARIADB_ROOT_PASSWORD=langchain -e MARIADB_DATABASE=langchain -p 3306:3306 -d mariadb:11.7

Installing the package

The package uses SQLAlchemy but works best with the MariaDB connector, which requires C/C++ components:

# Debian, Ubuntu
!sudo apt install libmariadb3 libmariadb-dev

# CentOS, RHEL, Rocky Linux
!sudo yum install MariaDB-shared MariaDB-devel

# Install Python connector
!pip install -U mariadb

Then install the langchain-mariadb package:

pip install -U langchain-mariadb

The vector store works together with an embedding model; langchain-openai is used here as an example.

pip install langchain-openai
export OPENAI_API_KEY=...

Initialization

from langchain_core.documents import Document
from langchain_mariadb import MariaDBStore
from langchain_openai import OpenAIEmbeddings

# Connection string; replace myuser/mypassword with your own credentials
url = "mariadb+mariadbconnector://myuser:mypassword@localhost/langchain"

# Initialize the vector store
vectorstore = MariaDBStore(
    embeddings=OpenAIEmbeddings(),
    embedding_length=1536,  # dimension of the embedding vectors
    datasource=url,
    collection_name="my_docs",
)
API Reference: Document | OpenAIEmbeddings
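
Passwords that contain characters such as @ or / need to be percent-encoded before they go into the connection string. The following sketch builds the URL with the standard library only. The helper name build_mariadb_url is hypothetical, and the root/langchain credentials are an assumption taken from the Docker command in the setup section; substitute whichever account you actually connect with.

```python
from urllib.parse import quote


def build_mariadb_url(
    user: str,
    password: str,
    host: str = "localhost",
    port: int = 3306,
    database: str = "langchain",
) -> str:
    """Build a SQLAlchemy-style URL for the MariaDB connector driver."""
    # Percent-encode credentials so characters like '@' or '/' stay intact.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return (
        "mariadb+mariadbconnector://"
        f"{safe_user}:{safe_password}@{host}:{port}/{database}"
    )


# Credentials assumed from the Docker command in the setup section.
url = build_mariadb_url("root", "langchain")
```

The resulting string can be passed as the datasource argument in place of a hand-written URL.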

Managing the vector store

Adding data

You can add data as documents with metadata:

docs = [
    Document(
        page_content="there are cats in the pond",
        metadata={"id": 1, "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="ducks are also found in the pond",
        metadata={"id": 2, "location": "pond", "topic": "animals"},
    ),
    # More documents...
]
vectorstore.add_documents(docs)

Or as plain text with optional metadata:

texts = [
    "a sculpture exhibit is also at the museum",
    "a new coffee shop opened on Main Street",
]
metadatas = [
    {"id": 6, "location": "museum", "topic": "art"},
    {"id": 7, "location": "Main Street", "topic": "food"},
]

vectorstore.add_texts(texts=texts, metadatas=metadatas)
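
Because add_texts pairs each text with the metadata dict at the same index, the two lists have to stay aligned. A small sketch of one way to keep them in step, assuming your source data is a list of dicts with a hypothetical "text" key (split_records is an illustrative helper, not part of the library):

```python
from typing import Any


def split_records(
    records: list[dict[str, Any]], text_key: str = "text"
) -> tuple[list[str], list[dict[str, Any]]]:
    """Split records into parallel texts/metadatas lists, preserving order."""
    texts: list[str] = []
    metadatas: list[dict[str, Any]] = []
    for record in records:
        texts.append(record[text_key])
        # Everything except the text itself is kept as metadata.
        metadatas.append({k: v for k, v in record.items() if k != text_key})
    return texts, metadatas


records = [
    {"text": "a sculpture exhibit is also at the museum", "id": 6, "topic": "art"},
    {"text": "a new coffee shop opened on Main Street", "id": 7, "topic": "food"},
]
texts, metadatas = split_records(records)
# With a live store: vectorstore.add_texts(texts=texts, metadatas=metadatas)
```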

Querying the vector store

# Basic similarity search
results = vectorstore.similarity_search("Hello", k=2)

# Search with metadata filtering
results = vectorstore.similarity_search("Hello", filter={"category": "greeting"})

Filter options

The system supports various filtering operations on metadata:

  • Equality: $eq
  • Inequality: $ne
  • Comparisons: $lt, $lte, $gt, $gte
  • List operations: $in, $nin
  • Text matching: $like, $nlike
  • Logical operations: $and, $or, $not

Example:

# Search with simple filter
results = vectorstore.similarity_search(
    "kitty", k=10, filter={"id": {"$in": [1, 5, 2, 9]}}
)

# Search with multiple conditions (AND)
results = vectorstore.similarity_search(
    "ducks",
    k=10,
    filter={"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}},
)
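
To clarify what the operators mean, the sketch below evaluates a filter against a single metadata dict in plain Python. It is an illustration of the semantics only, not how the store translates filters to SQL. The matches function is hypothetical, the shapes used for $and/$or (a list of sub-filters) and $not (a single sub-filter) are assumptions, and $like/$nlike are left out because their matching rules depend on the database.

```python
import operator
from typing import Any, Callable

# Comparison operators applied to a single metadata field.
_FIELD_OPS: dict[str, Callable[[Any, Any], bool]] = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if metadata satisfies the filter (top-level keys are ANDed)."""
    for key, condition in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in condition)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in condition)
        elif key == "$not":
            ok = not matches(metadata, condition)
        elif isinstance(condition, dict):
            # {"field": {"$op": operand, ...}}: every operator must hold.
            ok = key in metadata and all(
                _FIELD_OPS[op](metadata[key], operand)
                for op, operand in condition.items()
            )
        else:
            # {"field": value} is shorthand for equality.
            ok = metadata.get(key) == condition
        if not ok:
            return False
    return True


duck = {"id": 2, "location": "pond", "topic": "animals"}
cafe = {"id": 7, "location": "Main Street", "topic": "food"}

# Two field conditions at the top level are combined with AND.
both = {"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}}
# An explicit OR over two sub-filters.
either = {"$or": [{"topic": "food"}, {"id": {"$lt": 3}}]}
```

Here duck satisfies both filters, while cafe satisfies only the OR filter because its id and location fall outside the listed values.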

Usage for retrieval-augmented generation

TODO: document example
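
Until an official example is documented, the general retrieve-then-prompt pattern can be sketched as follows. This is an assumption about how the pieces might be wired together, not the integration's documented usage: format_context and build_prompt are hypothetical helpers, and the Doc dataclass merely stands in for LangChain's Document so the snippet runs without a database.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document (same two attributes)."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def format_context(docs: Iterable[Any]) -> str:
    """Join retrieved passages into a numbered context block."""
    return "\n".join(
        f"[{i}] {doc.page_content}" for i, doc in enumerate(docs, start=1)
    )


def build_prompt(question: str, docs: Iterable[Any]) -> str:
    """Combine retrieved context and the user question into one prompt string."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{format_context(docs)}\n\n"
        f"Question: {question}"
    )


# With a live store you would retrieve first, for example:
#   docs = vectorstore.similarity_search("What lives in the pond?", k=2)
docs = [
    Doc("there are cats in the pond"),
    Doc("ducks are also found in the pond"),
]
prompt = build_prompt("What lives in the pond?", docs)
```

The resulting prompt string can then be passed to whichever chat model you use.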

API reference

For more details, see the langchain-mariadb repository.