Faiss

Facebook AI Similarity Search (FAISS) is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also includes supporting code for evaluation and parameter tuning.

See The FAISS Library paper.

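You can find the FAISS documentation at this page.

This notebook shows how to use functionality related to the FAISS vector database. It covers features specific to this integration. After going through it, it may be useful to explore the relevant use-case pages to learn how to use this vector store as part of a larger chain.

Setup

The integration lives in the langchain-community package. We also need to install the faiss package itself. We can install both with the following command:

pip install -qU langchain-community faiss-cpu

Note that you can install faiss-gpu instead if you want to use the GPU-enabled version.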

If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the code below:

# import getpass
# import os
# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass()

Initialization

Select an embedding model:

pip install -qU langchain-openai

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

index = faiss.IndexFlatL2(len(embeddings.embed_query("hello world")))

vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)

API Reference: InMemoryDocstore | FAISS
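For intuition about the flat index created above: an exact L2 index stores vectors of one fixed dimension and answers a query by comparing it against every stored vector, which is why the dimension is read off a sample embedding. The following is a minimal pure-Python sketch of that idea, not Faiss's implementation. The class ToyFlatL2 and its methods are hypothetical names, the 3-dimensional vectors stand in for real embeddings, and the use of squared distances mirrors how Faiss documents its L2 results.

```python
class ToyFlatL2:
    """Toy stand-in for an exact (brute-force) L2 index. Hypothetical, for illustration only."""

    def __init__(self, d):
        self.d = d          # fixed dimensionality, like the argument to faiss.IndexFlatL2
        self.vectors = []   # stored vectors; the list position acts as the vector's index

    def _check(self, vector):
        if len(vector) != self.d:
            raise ValueError(f"expected {self.d} dimensions, got {len(vector)}")

    def add(self, vector):
        self._check(vector)
        self.vectors.append(list(vector))

    def search(self, query, k):
        """Return the k nearest (squared_distance, position) pairs, closest first."""
        self._check(query)
        scored = [
            (sum((a - b) ** 2 for a, b in zip(vec, query)), pos)
            for pos, vec in enumerate(self.vectors)
        ]
        return sorted(scored)[:k]


index = ToyFlatL2(d=3)
index.add([0.0, 0.0, 0.0])
index.add([1.0, 0.0, 0.0])
index.add([0.0, 2.0, 0.0])

hits = index.search([1.0, 0.5, 0.0], k=2)
print(hits)  # [(0.25, 1), (1.25, 0)]
```

Adding a vector of the wrong length raises an error in this sketch, which is the reason the real index has to be sized to match the embedding model.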

Manage vector store

Add items to vector store

from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)

API Reference: Document

['22f5ce99-cd6f-4e0c-8dab-664128307c72',
 'dc3f061b-5f88-4fa1-a966-413550c51891',
 'd33d890b-baad-47f7-b7c1-175f5f7b4e59',
 '6e6c01d2-6020-4a7b-95da-ef43d43f01b5',
 'e677223d-ad75-4c1a-bef6-b5912bd1de03',
 '47e2a168-6462-4ed2-b1d9-d9edfd7391d6',
 '1e4d66d6-e155-4891-9212-f7be97f36c6a',
 'c0663096-e1a5-4665-b245-1c2e6c4fb653',
 '8297474a-7f7c-4006-9865-398c1781b1bc',
 '44e4be03-0a8d-4316-b3c4-f35f4bb2b532']

Delete items from vector store

vector_store.delete(ids=[uuids[-1]])

True
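The vector store was constructed with a docstore and an index_to_docstore_id mapping, so every add or delete has to keep vector positions, ids, and documents in sync. The sketch below models that bookkeeping in plain Python. ToyStore is a hypothetical class, not the LangChain implementation, and renumbering positions after a delete is an assumption made here to keep positions contiguous.

```python
from uuid import uuid4


class ToyStore:
    """Hypothetical bookkeeping model: vector position -> id -> document."""

    def __init__(self):
        self.docstore = {}              # id -> document text
        self.index_to_docstore_id = {}  # vector position -> id

    def add(self, texts, ids=None):
        ids = ids or [str(uuid4()) for _ in texts]
        if len(ids) != len(texts):
            raise ValueError("ids and texts must have the same length")
        if len(set(ids)) != len(ids) or any(i in self.docstore for i in ids):
            raise ValueError("ids must be unique")
        start = len(self.index_to_docstore_id)
        for offset, (doc_id, text) in enumerate(zip(ids, texts)):
            self.docstore[doc_id] = text
            self.index_to_docstore_id[start + offset] = doc_id
        return ids

    def delete(self, ids):
        doomed = set(ids)
        missing = doomed - set(self.docstore)
        if missing:
            raise ValueError(f"unknown ids: {sorted(missing)}")
        for doc_id in doomed:
            del self.docstore[doc_id]
        # Renumber the surviving ids so positions stay contiguous (0..n-1).
        remaining = [
            doc_id
            for _, doc_id in sorted(self.index_to_docstore_id.items())
            if doc_id not in doomed
        ]
        self.index_to_docstore_id = dict(enumerate(remaining))
        return True


store = ToyStore()
ids = store.add(["pancakes", "weather", "deleted soon"], ids=["a", "b", "c"])
store.delete(ids=["b"])
print(store.index_to_docstore_id)  # {0: 'a', 1: 'c'}
```

Supplying your own ids, as the example above does with uuid4, is what makes a later delete by id possible.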

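Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.

Query directly

Similarity search

A simple similarity search with metadata filtering can be performed as follows: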

results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
    filter={"source": "tweet"},
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]

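Some MongoDB query and projection operators are supported for more advanced metadata filtering. The operators currently supported are:

$eq (equals)
$neq (not equals)
$gt (greater than)
$lt (less than)
$gte (greater than or equal)
$lte (less than or equal)
$in (membership in list)
$nin (not in list)
$and (all conditions must match)
$or (any condition must match)
$not (negation of condition)

The same similarity search can be performed with advanced metadata filtering as follows: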

results = vector_store.similarity_search(
    "LangChain provides abstractions to make working with LLMs easy",
    k=2,
    filter={"source": {"$eq": "tweet"}},
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")

* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]
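To make the operator list concrete, here is a small pure-Python evaluator that applies such a filter to a metadata dict. It is a sketch of the semantics described above, not the library's code. The function name matches is hypothetical, and the shapes assumed for the logical operators ($and and $or take a list of sub-filters, $not takes a single sub-filter) are assumptions that should be checked against the API reference.

```python
import operator

# Field-level operators from the list above, mapped to Python comparisons.
COMPARISONS = {
    "$eq": operator.eq,
    "$neq": operator.ne,
    "$gt": operator.gt,
    "$lt": operator.lt,
    "$gte": operator.ge,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata, filter_):
    """Return True if a metadata dict satisfies a filter dict (toy semantics)."""
    for key, condition in filter_.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in condition)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in condition)
        elif key == "$not":
            ok = not matches(metadata, condition)
        elif isinstance(condition, dict):
            # Field with operators, e.g. {"likes": {"$gt": 20}}.
            # In this sketch a missing field fails every comparison.
            ok = key in metadata and all(
                COMPARISONS[op](metadata[key], operand)
                for op, operand in condition.items()
            )
        else:
            # A plain value means equality, e.g. {"source": "tweet"}.
            ok = metadata.get(key) == condition
        if not ok:
            return False
    return True


docs = [
    {"source": "tweet", "likes": 12},
    {"source": "news", "likes": 3},
    {"source": "website", "likes": 40},
]
flt = {"$and": [{"source": {"$in": ["tweet", "website"]}}, {"likes": {"$gt": 20}}]}
print([d for d in docs if matches(d, flt)])  # [{'source': 'website', 'likes': 40}]
```

Under these semantics {"source": "tweet"} and {"source": {"$eq": "tweet"}} select the same documents, which is consistent with the two searches above returning identical results.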

Similarity search with score

You can also retrieve a score alongside each result:

results = vector_store.similarity_search_with_score(
    "Will it be hot tomorrow?", k=1, filter={"source": "news"}
)
for res, score in results:
    print(f"* [SIM={score:3f}] {res.page_content} [{res.metadata}]")

* [SIM=0.893688] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]
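Two notes on reading that output, offered as a hedged sketch. First, with an L2 index like the one created earlier, the score is best read as a distance, so smaller values mean closer matches despite the SIM label (this assumes the default distance settings; the API reference is the authority here). Second, the format spec 3f sets a minimum width of 3 with the default six decimals, while .3f is what rounds to three decimals. The names and distances in the scored list are hypothetical.

```python
score = 0.893688  # value copied from the output above

print(f"[SIM={score:3f}]")    # minimum width 3, default precision 6 -> [SIM=0.893688]
print(f"[DIST={score:.3f}]")  # precision 3 -> [DIST=0.894]

# With distances, sort ascending to put the best match first.
scored = [("doc_b", 1.20), ("doc_a", 0.89), ("doc_c", 1.05)]  # hypothetical (name, distance) pairs
best_first = sorted(scored, key=lambda pair: pair[1])
print([name for name, _ in best_first])  # ['doc_a', 'doc_c', 'doc_b']
```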

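Other search methods

There are a variety of other ways to search a FAISS vector store. For a complete list of those methods, please refer to the API reference.

Query by turning into retriever

You can also transform the vector store into a retriever for easier use in your chains.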

retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})
retriever.invoke("Stealing from the bank is a crime", filter={"source": "news"})

[Document(metadata={'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]
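The retriever above uses search_type="mmr", which stands for maximal marginal relevance: results are chosen one at a time, rewarding similarity to the query and penalizing similarity to results already chosen. The following pure-Python sketch shows the selection rule on toy 2-dimensional vectors. It is an illustration under stated assumptions, not LangChain's implementation: the helper names are hypothetical, cosine similarity is assumed as the similarity measure, and lambda_mult (with 0.5 as its default) is an assumed parameter name for the trade-off weight.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query, candidates, k, lambda_mult=0.5):
    """Pick k candidate positions, trading off relevance against redundancy."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def mmr_score(i):
            relevance = cosine(query, candidates[i])
            # Highest similarity to anything already picked (0.0 if nothing is picked yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.1],   # 0: most relevant
    [1.0, 0.2],   # 1: nearly as relevant, but almost a duplicate of 0
    [0.5, -1.0],  # 2: less relevant, but points in a different direction
]
print(mmr(query, candidates, k=2))                   # [0, 2]
print(mmr(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]
```

With lambda_mult=1.0 the redundancy penalty vanishes and the selection reduces to plain relevance ranking, which is why the near-duplicate comes back in the second call.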

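Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the RAG sections of the LangChain documentation.

Saving and loading

You can also save and load a FAISS index. This is useful so that you don't have to recreate it every time you use it.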

vector_store.save_local("faiss_index")

new_vector_store = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)

docs = new_vector_store.similarity_search("qux")

docs[0]

Document(metadata={'source': 'tweet'}, page_content='Building an exciting new project with LangChain - come check it out!')
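A word on allow_dangerous_deserialization=True. As the flag name suggests, loading involves deserializing pickled data from disk, and unpickling can call arbitrary callables chosen by whoever wrote the file. The harmless standard-library sketch below shows the mechanism. Payload is a hypothetical class, and the exact files that load_local reads are not shown here, so consult the API reference for specifics.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle asks the loader to call int("42")."""

    def __reduce__(self):
        # (callable, args): pickle.loads will call the callable with these args.
        # A malicious file could name something far less benign than int.
        return (int, ("42",))


blob = pickle.dumps(Payload())
restored = pickle.loads(blob)

print(type(restored).__name__, restored)  # int 42
```

The loaded object is whatever the named callable returned, not a Payload. For that reason, only enable the flag for index folders you created yourself or obtained from a source you trust.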

Merging

You can also merge two FAISS vector stores.

db1 = FAISS.from_texts(["foo"], embeddings)
db2 = FAISS.from_texts(["bar"], embeddings)

db1.docstore._dict

{'b752e805-350e-4cf5-ba54-0883d46a3a44': Document(page_content='foo')}

db2.docstore._dict

{'08192d92-746d-4cd1-b681-bdfba411f459': Document(page_content='bar')}

db1.merge_from(db2)

db1.docstore._dict

{'b752e805-350e-4cf5-ba54-0883d46a3a44': Document(page_content='foo'),
 '08192d92-746d-4cd1-b681-bdfba411f459': Document(page_content='bar')}
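The merge above leaves db1 holding the documents of both stores. One way to picture what a merge has to do is to append the second store's vectors and shift its positions by the size of the first. The sketch below models this with plain dicts. The layout (vectors, docstore, index_to_docstore_id keys), the merge_stores helper, and the ids are all hypothetical, and this is not the library's implementation.

```python
def merge_stores(target, source):
    """Append source's entries to target, shifting positions by target's size."""
    overlap = set(target["docstore"]) & set(source["docstore"])
    if overlap:
        raise ValueError(f"duplicate ids: {sorted(overlap)}")
    offset = len(target["vectors"])
    target["vectors"].extend(source["vectors"])
    target["docstore"].update(source["docstore"])
    for position, doc_id in source["index_to_docstore_id"].items():
        target["index_to_docstore_id"][offset + position] = doc_id
    return target


db1 = {
    "vectors": [[1.0, 0.0]],
    "docstore": {"id-foo": "foo"},
    "index_to_docstore_id": {0: "id-foo"},
}
db2 = {
    "vectors": [[0.0, 1.0]],
    "docstore": {"id-bar": "bar"},
    "index_to_docstore_id": {0: "id-bar"},
}

merge_stores(db1, db2)
print(db1["index_to_docstore_id"])  # {0: 'id-foo', 1: 'id-bar'}
print(db1["docstore"])              # {'id-foo': 'foo', 'id-bar': 'bar'}
```

Both stores must share the same vector dimension for a merge to make sense, which in practice means using the same embedding model for both.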

API reference

For detailed documentation of all FAISS vector store features and configurations, see the API reference: https://python.langchain.com/api_reference/community/vectorstores/langchain_community.vectorstores.faiss.FAISS.html

Related

Vector store conceptual guide
Vector store how-to guides
