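MariaDB
LangChain's MariaDB integration (langchain-mariadb) provides vector capabilities for working with MariaDB version 11.7.1 and above. The integration is distributed under the MIT license. You can use the provided implementations as-is or customize them for specific needs. Key features include:
- Built-in vector similarity search
- Support for cosine and Euclidean distance metrics
- Robust metadata filtering options
- Performance optimization through connection pooling
- Configurable table and column settings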
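To make the two distance metrics in the feature list concrete, here is a minimal pure-Python sketch of how each is defined. It is only an illustration of the math, not how MariaDB or langchain-mariadb computes distances internally; the function names `cosine_distance` and `euclidean_distance` are hypothetical.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
same_direction = [2.0, 0.0]  # same direction as query, twice the length

# Cosine distance ignores vector length; Euclidean distance does not.
print(cosine_distance(query, same_direction))
print(euclidean_distance(query, same_direction))
```

Cosine distance is often preferred when only the direction of an embedding matters, while Euclidean distance is also sensitive to magnitude.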
Setup
Launch a MariaDB Docker container with:
!docker run --name mariadb-container -e MARIADB_ROOT_PASSWORD=langchain -e MARIADB_DATABASE=langchain -p 3306:3306 -d mariadb:11.7
Installing the package
The package uses SQLAlchemy but works best with the MariaDB connector, which requires C/C++ components:
# Debian, Ubuntu
!sudo apt install libmariadb3 libmariadb-dev
# CentOS, RHEL, Rocky Linux
!sudo yum install MariaDB-shared MariaDB-devel
# Install Python connector
!pip install -U mariadb
Then install the langchain-mariadb package:
pip install -U langchain-mariadb
The vector store works together with an embedding model; langchain-openai is used here as an example.
pip install langchain-openai
export OPENAI_API_KEY=...
Initialization
from langchain_core.documents import Document
from langchain_mariadb import MariaDBStore
from langchain_openai import OpenAIEmbeddings
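# Connection string; replace myuser/mypassword with your own credentials
# (the Docker command above sets the root password to "langchain")
url = "mariadb+mariadbconnector://myuser:mypassword@localhost/langchain"
# Initialize vector store
vectorstore = MariaDBStore(
    embeddings=OpenAIEmbeddings(),
    embedding_length=1536,  # should match the embedding model's output dimension
    datasource=url,
    collection_name="my_docs",
)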
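If the password contains characters such as `@` or `/`, the connection string needs them percent-encoded, or the URL will be parsed incorrectly. The following is a minimal sketch of one way to assemble the URL safely; `build_mariadb_url` is a hypothetical helper, not part of langchain-mariadb, and the explicit port is an assumption based on the Docker command above.

```python
from urllib.parse import quote


def build_mariadb_url(
    user: str, password: str, host: str, database: str, port: int = 3306
) -> str:
    """Assemble a SQLAlchemy-style URL, percent-encoding the credentials."""
    # safe="" forces "/" and "@" to be encoded as well.
    safe_user = quote(user, safe="")
    safe_password = quote(password, safe="")
    return (
        "mariadb+mariadbconnector://"
        f"{safe_user}:{safe_password}@{host}:{port}/{database}"
    )


# Hypothetical credentials chosen to show the escaping.
url = build_mariadb_url("myuser", "p@ss/word", "localhost", "langchain")
print(url)
```

The resulting string can be passed as `datasource` in the same way as the hand-written URL.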
Manage vector store
Adding data
You can add data as documents with metadata:
docs = [
    Document(
        page_content="there are cats in the pond",
        metadata={"id": 1, "location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="ducks are also found in the pond",
        metadata={"id": 2, "location": "pond", "topic": "animals"},
    ),
    # More documents...
]
vectorstore.add_documents(docs)
Or as plain text with optional metadata:
texts = [
    "a sculpture exhibit is also at the museum",
    "a new coffee shop opened on Main Street",
]
metadatas = [
    {"id": 6, "location": "museum", "topic": "art"},
    {"id": 7, "location": "Main Street", "topic": "food"},
]
vectorstore.add_texts(texts=texts, metadatas=metadatas)
Query vector store
# Basic similarity search
results = vectorstore.similarity_search("Hello", k=2)
# Search with metadata filtering
results = vectorstore.similarity_search("Hello", filter={"category": "greeting"})
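Conceptually, a similarity search with `k=2` embeds the query and returns the two stored items whose vectors are closest to it. The sketch below shows that idea as a brute-force scan over an in-memory dictionary with made-up two-dimensional vectors. It is an illustration only: the real store runs the search inside MariaDB, and `top_k` and `toy_store` are hypothetical names.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def top_k(
    query_vec: Sequence[float], store: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k (text, distance) pairs closest to query_vec, nearest first."""
    # math.dist computes the Euclidean distance between two points.
    scored = ((text, math.dist(query_vec, vec)) for text, vec in store.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Toy "embeddings"; real ones would come from the embedding model.
toy_store = {
    "cats in the pond": [0.9, 0.1],
    "ducks in the pond": [0.8, 0.3],
    "coffee shop on Main Street": [0.1, 0.9],
}

results = top_k([1.0, 0.0], toy_store, k=2)
print([text for text, _ in results])
```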
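Filter options
The system supports various filtering operations on metadata:
- Equality: $eq
- Inequality: $ne
- Comparisons: $lt, $lte, $gt, $gte
- List operations: $in, $nin
- Text matching: $like, $nlike
- Logical operations: $and, $or, $not
Example: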
# Search with simple filter
results = vectorstore.similarity_search(
    "kitty", k=10, filter={"id": {"$in": [1, 5, 2, 9]}}
)
# Search with multiple conditions (AND)
results = vectorstore.similarity_search(
    "ducks",
    k=10,
    filter={"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}},
)
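To clarify what the filter operators mean, here is a small pure-Python sketch that evaluates a filter dictionary against a single metadata dictionary. It is not the library's implementation, which translates filters into SQL. The `matches` helper is hypothetical, and several behaviors are assumptions made for illustration: `$like` follows SQL LIKE wildcards (`%` and `_`), `$not` takes a single nested filter, and a field missing from the metadata never matches.

```python
import operator
import re
from typing import Any, Dict


def _like(value: Any, pattern: str) -> bool:
    """Emulate SQL LIKE: % matches any run of characters, _ matches one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, str(value), flags=re.DOTALL) is not None


_COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
    "$like": _like,
    "$nlike": lambda value, pattern: not _like(value, pattern),
}


def matches(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if metadata satisfies every clause in flt (implicit AND)."""
    for key, condition in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in condition)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in condition)
        elif key == "$not":
            ok = not matches(metadata, condition)
        elif isinstance(condition, dict):
            # Field with operator clauses, e.g. {"id": {"$in": [1, 2]}}
            ok = key in metadata and all(
                _COMPARATORS[op](metadata[key], operand)
                for op, operand in condition.items()
            )
        else:
            # A bare value is treated as equality, e.g. {"topic": "animals"}
            ok = key in metadata and metadata[key] == condition
        if not ok:
            return False
    return True


records = [
    {"id": 1, "location": "pond", "topic": "animals"},
    {"id": 2, "location": "pond", "topic": "animals"},
    {"id": 6, "location": "museum", "topic": "art"},
]
flt = {"id": {"$in": [1, 5, 2, 9]}, "location": {"$in": ["pond", "market"]}}
print([r["id"] for r in records if matches(r, flt)])
```

Listing several fields in one filter dictionary, as in the second search above, combines them with AND; `$or` and `$not` are needed only for other combinations.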
Usage for retrieval-augmented generation
TODO: document example
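In the absence of an official example, the following is a minimal sketch of the retrieval half of retrieval-augmented generation: fetch the passages most relevant to a question and splice them into a prompt for an LLM. It is an assumption-laden illustration, not the documented usage. `build_rag_prompt` and `fake_search` are hypothetical names, and the prompt wording is arbitrary. The search step is passed in as a callable so the sketch runs without a database.

```python
from typing import Callable, List


def build_rag_prompt(
    question: str, search: Callable[[str, int], List[str]], k: int = 2
) -> str:
    """Retrieve k passages for the question and splice them into a prompt."""
    passages = search(question, k)
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


def fake_search(query: str, k: int) -> List[str]:
    """Stand-in for a real vector search; returns canned passages."""
    corpus = [
        "there are cats in the pond",
        "ducks are also found in the pond",
    ]
    return corpus[:k]


prompt = build_rag_prompt("What animals are in the pond?", fake_search)
print(prompt)
```

With the vector store from the sections above, the `search` argument could instead be a function that calls `vectorstore.similarity_search(query, k=k)` and returns the `page_content` of each resulting document. The finished prompt would then be sent to a chat model.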
API reference
For more details, see the repository here.