Retriever

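A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store: a retriever does not need to be able to store documents, only to return (or retrieve) them. Retrievers can be created from vector stores, but the interface is also broad enough to cover services such as Wikipedia search and Amazon Kendra.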

Retrievers accept a string query as input and return a list of Documents as output.
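The input/output contract described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not LangChain's actual classes: `Document`, `Retriever`, and `KeywordRetriever` are hypothetical stand-ins defined here, and the keyword matching is just a placeholder for a real search backend.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Document:
    """Hypothetical stand-in for a retrieved document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class Retriever(Protocol):
    """Anything that maps a string query to a list of documents."""

    def invoke(self, query: str) -> list[Document]: ...


class KeywordRetriever:
    """Toy retriever: returns the documents that contain every query word."""

    def __init__(self, docs: list[Document]) -> None:
        self.docs = docs

    def invoke(self, query: str) -> list[Document]:
        words = query.lower().split()
        return [
            d for d in self.docs
            if all(w in d.page_content.lower() for w in words)
        ]


docs = [
    Document("Retrievers return documents for a query.", {"source": "a"}),
    Document("Vector stores index embeddings.", {"source": "b"}),
]
retriever = KeywordRetriever(docs)
results = retriever.invoke("documents query")
```

Note that `KeywordRetriever` only reads from the list it was given; nothing in the contract requires a retriever to own or store the documents it returns.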

For details on how to use retrievers, see the relevant how-to guides.

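Note that all vector stores can be cast to retrievers. For the available vector stores, see the vector store integration docs. This page lists custom retrievers, implemented by subclassing BaseRetriever.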
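To illustrate what casting a vector store to a retriever means, here is a small self-contained sketch. Everything in it is an assumption made for illustration: `ToyVectorStore`, `ToyRetriever`, the bag-of-words `embed` function, and the `k` parameter are hypothetical stand-ins rather than LangChain's implementation, and the results are plain strings instead of Documents to keep the example short.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Stores texts with their vectors and supports similarity search."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self._items.append((text, embed(text)))

    def similarity_search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(
            self._items, key=lambda item: cosine(q, item[1]), reverse=True
        )
        return [text for text, _ in ranked[:k]]

    def as_retriever(self, k: int = 2) -> "ToyRetriever":
        """Wrap this store in the narrower query-only interface."""
        return ToyRetriever(self, k)


class ToyRetriever:
    """Query-only view of a store: exposes retrieval, not storage."""

    def __init__(self, store: ToyVectorStore, k: int) -> None:
        self._store = store
        self._k = k

    def invoke(self, query: str) -> list[str]:
        return self._store.similarity_search(query, k=self._k)


store = ToyVectorStore()
store.add_texts([
    "amazon kendra is a search service",
    "wikipedia hosts encyclopedia articles",
    "milvus is a vector database",
])
retriever = store.as_retriever(k=1)
top = retriever.invoke("vector database")
```

The wrapper adds no search logic of its own; it only narrows the store's surface to "string query in, results out", which is why any vector store can be used wherever a retriever is expected.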

Bring-your-own documents

The retrievers below allow you to index and search a custom corpus of documents.

Retriever	Self-hosted	Cloud offering	Package
AmazonKnowledgeBasesRetriever	❌	✅	langchain_aws
AzureAISearchRetriever	❌	✅	langchain_community
ElasticsearchRetriever	✅	✅	langchain_elasticsearch
MilvusCollectionHybridSearchRetriever	✅	❌	langchain_milvus
VertexAISearchRetriever	❌	✅	langchain_google_community

External index

The retrievers below search over an external index (for example, one constructed from Internet data or similar).

Retriever	Source	Package
ArxivRetriever	Scholarly articles on arxiv.org	langchain_community
TavilySearchAPIRetriever	Internet search	langchain_community
WikipediaRetriever	Wikipedia articles	langchain_community

All retrievers

Name	Description
Activeloop Deep Memory	Activeloop Deep Memory is a suite of tools that enables you to optimi...
Amazon Kendra	Amazon Kendra is an intelligent search service provided by Amazon Web...
Arcee	Arcee helps with the development of the SLMs—small, specialized, secu...
Arxiv	arXiv is an open-access archive for 2 million scholarly articles in t...
AskNews	AskNews infuses any LLM with the latest global news (or historical ne...
Azure AI Search	Azure AI Search (formerly known as Azure Cognitive Search) is a Micro...
Bedrock (Knowledge Bases)	This guide will help you get started with the AWS Knowledge Bases...
BM25	BM25 (Wikipedia) also known as the Okapi BM25, is a ranking function ...
Box	This will help you get started with the Box retriever. For detail...
BREEBS (Open Knowledge)	BREEBS is an open collaborative knowledge platform.
Chaindesk	Chaindesk platform brings data from anywhere (Datasources: Text, PDF, ...
ChatGPT plugin	OpenAI plugins connect ChatGPT to third-party applications. These plu...
Cognee	This will help you get started with the Cognee retriever. For det...
Cohere reranker	Cohere is a Canadian startup that provides natural language processin...
Cohere RAG	Cohere is a Canadian startup that provides natural language processin...
Contextual AI Reranker	Contextual AI's Instruction-Following Reranker is the world's first r...
Dappier	Dappier connects any LLM or your Agentic AI to real-time, rights-clea...
DocArray	DocArray is a versatile, open-source tool for managing your multi-mod...
Dria	Dria is a hub of public RAG models for developers to both contribute ...
ElasticSearch BM25	Elasticsearch is a distributed, RESTful search and analytics engine. ...
Elasticsearch	Elasticsearch is a distributed, RESTful search and analytics engine. ...
Embedchain	Embedchain is a RAG framework to create data pipelines. It loads, ind...
FlashRank reranker	FlashRank is the Ultra-lite & Super-fast Python library to add re-ran...
Fleet AI Context	Fleet AI Context is a dataset of high-quality embeddings of the top 1...
Galaxia	Galaxia is a GraphRAG solution, which automates document processing, kn...
Google Drive	This notebook covers how to retrieve documents from Google Drive.
Google Vertex AI Search	Google Vertex AI Search (formerly known as Enterprise Search on Gener...
Graph RAG	Graph traversal over any Vector Store using document metadata.
IBM watsonx.ai	WatsonxRerank is a wrapper for IBM watsonx.ai foundation models.
JaguarDB Vector Database	[JaguarDB Vector Database](http://www.jaguardb.com/windex.html)
Kay.ai	Kay Data API built for RAG 🕵️ We are curating the world's largest da...
Kinetica Vectorstore based Retriever	Kinetica is a database with integrated support for vector similarity ...
kNN	In statistics, the k-nearest neighbours algorithm (k-NN) is a non-par...
LinkupSearchRetriever	Linkup provides an API to connect LLMs to the web and the Linkup Prem...
LLMLingua Document Compressor	LLMLingua utilizes a compact, well-trained language model (e.g., GPT2...
LOTR (Merger Retriever)	Lord of the Retrievers (LOTR), also known as MergerRetriever, takes a...
Metal	Metal is a managed service for ML Embeddings.
Milvus Hybrid Search	Milvus is an open-source vector database built to power embedding sim...
NanoPQ (Product Quantization)	Product Quantization algorithm (k-NN) in brief is a quantization algo...
needle	Needle Retriever
Nimble	NimbleSearchRetriever enables developers to build RAG applications an...
Outline	Outline is an open-source collaborative knowledge base platform desig...
Permit	Permit is an access control platform that provides fine-grained, real...
Pinecone Hybrid Search	Pinecone is a vector database with broad functionality.
Pinecone Rerank	This notebook shows how to use PineconeRerank for two-stage vector re...
PubMed	PubMed® by The National Center for Biotechnology Information, Nationa...
Qdrant Sparse Vector	Qdrant is an open-source, high-performance vector search engine/datab...
RAGatouille	RAGatouille makes it as simple as can be to use ColBERT!
RePhraseQuery	RePhraseQuery is a simple retriever that applies an LLM between the u...
Rememberizer	Rememberizer is a knowledge enhancement service for AI applications c...
SEC filing	SEC filing is a financial statement or other formal document submitte...
Self-querying retrievers
SVM	Support vector machines (SVMs) are a set of supervised learning metho...
TavilySearchAPI	Tavily's Search API is a search engine built specifically for AI agen...
TF-IDF	TF-IDF means term-frequency times inverse document-frequency.
NeuralDB	NeuralDB is a CPU-friendly and fine-tunable retrieval engine develope...
ValyuContext	Valyu allows AI applications and agents to search the internet and pr...
Vectorize	This notebook shows how to use the LangChain Vectorize retriever.
Vespa	Vespa is a fully featured search engine and vector database. It suppo...
Wikipedia	Overview
You.com	you.com API is a suite of tools designed to help developers ground th...
Zep Cloud	Retriever Example for Zep Cloud
Zep Open Source	Retriever Example for Zep
Zilliz Cloud Pipeline	Zilliz Cloud Pipelines transform your unstructured data to a searchab...
Zotero	This will help you get started with the Zotero retriever. For det...