
Chroma

This notebook covers how to get started with the Chroma vector store.

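Chroma is an AI-native, open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0. Full documentation is available from the Chroma project, and the API reference for the LangChain integration is linked at the end of this page.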

Setup

To access Chroma vector stores, you need to install the langchain-chroma integration package.

pip install -qU "langchain-chroma>=0.1.2"

Credentials

You can use the Chroma vector store without any credentials; installing the package above is all you need.

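If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:

# import getpass
# import os
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"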

Initialization

Basic Initialization

Here is a basic initialization, including the use of a directory to save the data locally.

pip install -qU langchain-openai
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
from langchain_chroma import Chroma

vector_store = Chroma(
    collection_name="example_collection",
    embedding_function=embeddings,
    persist_directory="./chroma_langchain_db",  # Where to save data locally, remove if not necessary
)
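The embedding_function passed above is what turns text into vectors before Chroma stores or searches them; LangChain's Embeddings interface exposes this as embed_documents (many texts) and embed_query (one text). To make that contract concrete without an API key, here is a toy sketch. HashingEmbeddings is hypothetical and exists only for illustration: it hashes words into buckets and normalizes to unit length, has none of the semantic quality of text-embedding-3-large, and does not subclass LangChain's Embeddings base class.

```python
import hashlib
import math


class HashingEmbeddings:
    """Hypothetical toy embedding model, for illustration only.

    Mirrors the two methods of LangChain's Embeddings interface:
    embed_documents(list[str]) -> list[list[float]] and embed_query(str) -> list[float].
    """

    def __init__(self, size: int = 64) -> None:
        self.size = size

    def _embed(self, text: str) -> list[float]:
        vec = [0.0] * self.size
        for word in text.lower().split():
            # md5 gives a stable bucket across runs (built-in hash() is salted per process).
            digest = hashlib.md5(word.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:4], "little") % self.size] += 1.0
        norm = math.sqrt(sum(x * x for x in vec))
        # Normalize to unit length; empty text stays the zero vector.
        return [x / norm for x in vec] if norm else vec

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        return self._embed(text)


emb = HashingEmbeddings(size=64)
doc_vectors = emb.embed_documents(["green eggs and ham", "stock market news"])
query_vector = emb.embed_query("green eggs and ham")
print(len(doc_vectors), len(query_vector))  # 2 64
```

To plug a custom model into Chroma for real, you would normally subclass langchain_core.embeddings.Embeddings and implement these same two methods.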

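Initialization from client

You can also initialize from a Chroma client, which is particularly useful if you want easier access to the underlying database.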

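import chromadb

persistent_client = chromadb.PersistentClient()
collection = persistent_client.get_or_create_collection("collection_name")
# These raw documents are embedded by the collection's own (default) embedding
# function, not by `embeddings`; the resulting vectors can differ in dimension
# from the ones the LangChain wrapper below would produce.
collection.add(ids=["1", "2", "3"], documents=["a", "b", "c"])

vector_store_from_client = Chroma(
    client=persistent_client,
    collection_name="collection_name",
    embedding_function=embeddings,
)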

Manage vector store

Once you have created your vector store, you can interact with it by adding and deleting different items.

Add items to vector store

We can add items to our vector store by using the add_documents function.

from uuid import uuid4

from langchain_core.documents import Document

# Document.id is typed as a string, so the example IDs are written as strings.
document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
    id="1",
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
    id="2",
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
    id="3",
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
    id="4",
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
    id="5",
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
    id="6",
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
    id="7",
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
    id="8",
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
    id="9",
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
    id="10",
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)
['f22ed484-6db3-4b76-adb1-18a777426cd6',
'e0d5bab4-6453-4511-9a37-023d9d288faa',
'877d76b8-3580-4d9e-a13f-eed0fa3d134a',
'26eaccab-81ce-4c0a-8e76-bf542647df18',
'bcaa8239-7986-4050-bf40-e14fb7dab997',
'cdc44b38-a83f-4e49-b249-7765b334e09d',
'a7a35354-2687-4bc2-8242-3849a4d18d34',
'8780caf1-d946-4f27-a707-67d037e9e1d8',
'dec6af2a-7326-408f-893d-7d7d717dfda9',
'3b18e210-bb59-47a0-8e17-c8e51176ea5e']
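The call returns the IDs it stored, in input order. Because an explicit ids list was passed, those UUIDs are used rather than the id set on each Document. The sketch below illustrates that precedence in plain Python, assuming the rule is: explicit ids first, then each document's own id, then a fresh UUID. Doc and resolve_ids are hypothetical stand-ins written for this example, not library code.

```python
from dataclasses import dataclass, field
from typing import Optional
from uuid import uuid4


@dataclass
class Doc:
    """Hypothetical minimal stand-in for langchain_core's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)
    id: Optional[str] = None


def resolve_ids(docs: list[Doc], ids: Optional[list[str]] = None) -> list[str]:
    """Illustrative ID precedence: explicit ids > Doc.id > fresh UUID4."""
    if ids is not None:
        if len(ids) != len(docs):
            raise ValueError("ids and docs must have the same length")
        return list(ids)
    return [d.id if d.id is not None else str(uuid4()) for d in docs]


docs = [Doc("pancakes", {"source": "tweet"}, id="1"), Doc("forecast", {"source": "news"})]
explicit = [str(uuid4()) for _ in docs]
print(resolve_ids(docs, explicit) == explicit)  # True: explicit ids win
print(resolve_ids(docs)[0])  # 1 (falls back to the document's own id)
```

Keeping the returned IDs (as the uuids list does here) is what makes the later update and delete calls possible.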

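Update items in vector store

Now that we have added documents to our vector store, we can update existing documents by ID, using update_document for a single document or update_documents for several at once.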

updated_document_1 = Document(
    page_content="I had chocolate chip pancakes and fried eggs for breakfast this morning.",
    metadata={"source": "tweet"},
    id="1",
)

updated_document_2 = Document(
    page_content="The weather forecast for tomorrow is sunny and warm, with a high of 82 degrees.",
    metadata={"source": "news"},
    id="2",
)

# Update a single document by its ID
vector_store.update_document(document_id=uuids[0], document=updated_document_1)
# You can also update multiple documents at once
vector_store.update_documents(
    ids=uuids[:2], documents=[updated_document_1, updated_document_2]
)
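An update keeps the entry's ID and replaces its text, metadata, and embedding; the LangChain wrapper re-embeds the new text with the store's embedding function before sending it to Chroma. The toy sketch below models that replace-by-ID behavior with a plain dict. The store layout, fake_embed, and update_documents here are hypothetical; this toy raises on unknown IDs to stay strict, which makes no claim about how Chroma itself handles them.

```python
def fake_embed(text: str) -> list[float]:
    """Hypothetical placeholder embedding: character count and word count."""
    return [float(len(text)), float(len(text.split()))]


# id -> (text, metadata, vector): a toy model of one collection.
store = {
    "a": ("scrambled eggs", {"source": "tweet"}, fake_embed("scrambled eggs")),
    "b": ("cloudy, 62 degrees", {"source": "news"}, fake_embed("cloudy, 62 degrees")),
}


def update_documents(ids: list[str], texts: list[str], metadatas: list[dict]) -> None:
    if not (len(ids) == len(texts) == len(metadatas)):
        raise ValueError("ids, texts and metadatas must have the same length")
    for doc_id, text, meta in zip(ids, texts, metadatas):
        if doc_id not in store:
            raise KeyError(f"unknown id: {doc_id}")
        # Same key, new content: the entry is re-embedded, not duplicated.
        store[doc_id] = (text, meta, fake_embed(text))


update_documents(["a"], ["fried eggs"], [{"source": "tweet"}])
print(len(store), store["a"][0])  # 2 fried eggs
```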

Delete items from vector store

We can also delete items from our vector store as follows:

vector_store.delete(ids=[uuids[-1]])

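Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.

Query directly

A simple similarity search can be performed as follows: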

results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
    filter={"source": "tweet"},
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]
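Conceptually, k=2 with filter={"source": "tweet"} means: consider only entries whose metadata matches the filter, rank them by distance to the query embedding, and keep the two closest. The brute-force sketch below shows that idea with hand-picked 2-D vectors. The corpus and search helper are hypothetical; Chroma itself uses an approximate HNSW index rather than a linear scan, and its where filters support more operators than plain equality.

```python
def sq_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance: smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical toy corpus: (text, metadata, 2-D vector)
corpus = [
    ("LangChain project", {"source": "tweet"}, [0.9, 0.1]),
    ("LangGraph framework", {"source": "tweet"}, [0.8, 0.3]),
    ("pancakes", {"source": "tweet"}, [0.1, 0.9]),
    ("LangChain release notes", {"source": "website"}, [0.95, 0.05]),
]


def search(query_vec: list[float], k: int, where: dict) -> list[tuple[str, dict]]:
    # 1) metadata filter (equality only), 2) rank by distance, 3) keep top k
    candidates = [
        (sq_l2(query_vec, vec), text, meta)
        for text, meta, vec in corpus
        if all(meta.get(key) == value for key, value in where.items())
    ]
    candidates.sort(key=lambda c: c[0])
    return [(text, meta) for _, text, meta in candidates[:k]]


for text, meta in search([1.0, 0.0], k=2, where={"source": "tweet"}):
    print(f"* {text} [{meta}]")
# * LangChain project [{'source': 'tweet'}]
# * LangGraph framework [{'source': 'tweet'}]
```

Note that the "website" entry is actually the closest vector, but the filter removes it before ranking.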

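Similarity search with score

If you want to execute a similarity search and receive the corresponding scores, you can run the following. The score returned here is a distance, so lower values indicate closer matches.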

results = vector_store.similarity_search_with_score(
    "Will it be hot tomorrow?", k=1, filter={"source": "news"}
)
for res, score in results:
    print(f"* [SIM={score:3f}] {res.page_content} [{res.metadata}]")
* [SIM=1.726390] The stock market is down 500 points today due to fears of a recession. [{'source': 'news'}]
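To read such a distance, here is a small sketch under two assumptions: the collection uses Chroma's default l2 space (squared Euclidean distance), and the embeddings are unit-length (OpenAI documents its embeddings as normalized to length 1). Under those assumptions the distance and cosine similarity are related by d = 2 − 2·cos θ, so d ranges from 0 (identical direction) to 4 (opposite direction). The helper functions below are written for this example only.

```python
import math


def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def sq_l2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


a = normalize([3.0, 1.0, 2.0])
b = normalize([1.0, 4.0, 0.5])
# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b); the two printed numbers agree.
print(round(sq_l2(a, b), 6), round(2 - 2 * cosine(a, b), 6))

# Inverting the identity for the score printed above (same assumptions as stated).
distance = 1.726390
print(round(1 - distance / 2, 3))  # 0.137
```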

Search by vector

You can also search by vector:

results = vector_store.similarity_search_by_vector(
    embedding=embeddings.embed_query("I love green eggs and ham!"), k=1
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
* I had chocolate chip pancakes and fried eggs for breakfast this morning. [{'source': 'tweet'}]

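Other search methods

There are a variety of other search methods not covered in this notebook, such as maximal marginal relevance (MMR) search. For a full list of the search abilities available for the Chroma vector store, check out the API reference.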

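Query by turning into retriever

You can also transform the vector store into a retriever for easier usage in your chains. For more information on the different search types and kwargs you can pass, see the API reference.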

retriever = vector_store.as_retriever(
    search_type="mmr", search_kwargs={"k": 1, "fetch_k": 5}
)
retriever.invoke("Stealing from the bank is a crime", filter={"source": "news"})
[Document(metadata={'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]
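With search_type="mmr", the retriever first fetches fetch_k candidates by similarity and then picks k of them using maximal marginal relevance, which trades relevance to the query against redundancy with documents already chosen. The greedy sketch below assumes cosine similarity and a weight named lambda_mult (LangChain's MMR search uses a parameter with this name, defaulting to 0.5), but it is an independent illustration over toy vectors, not LangChain's code.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query: list[float], candidates: list[list[float]], k: int, lambda_mult: float = 0.5) -> list[int]:
    """Greedy MMR: return indices of k candidates balancing relevance and diversity."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            relevance = cosine(query, candidates[i])
            # Penalize similarity to the closest document already selected.
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical fetched candidates: 0 and 1 are near-duplicates, 2 points elsewhere.
query = [1.0, 1.0]
candidates = [[1.0, 0.8], [1.0, 0.82], [0.7, 1.0]]
top2 = sorted(range(len(candidates)), key=lambda i: -cosine(query, candidates[i]))[:2]
print(top2)  # [1, 0]: plain similarity keeps both near-duplicates
print(mmr(query, candidates, k=2))  # [1, 2]: MMR swaps in the more diverse candidate
```

A larger fetch_k gives MMR more candidates to diversify from; lambda_mult closer to 1 behaves more like plain similarity search.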

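Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the RAG tutorials and how-to guides in the LangChain documentation.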
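At its simplest, RAG retrieves documents (for example with retriever.invoke(question)) and places their text into the prompt sent to a chat model. The sketch below shows only that "stuffing" step in plain Python; build_prompt and its template wording are hypothetical, and in a real chain the retrieved items would be Document objects and the prompt would be sent to a model.

```python
def build_prompt(question: str, retrieved_texts: list[str]) -> str:
    """Hypothetical helper: number each retrieved chunk and stuff it into one prompt."""
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(retrieved_texts, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# In a real chain these strings would be the page_content of retrieved documents.
retrieved = ["Robbers broke into the city bank and stole $1 million in cash."]
print(build_prompt("What happened at the city bank?", retrieved))
```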

API reference

For detailed documentation of all Chroma vector store features and configurations, head to the API reference: https://python.langchain.com/api_reference/chroma/vectorstores/langchain_chroma.vectorstores.Chroma.html