Retriever

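A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Retrievers can be created from vector stores, but the interface is also broad enough to cover sources such as Wikipedia search and Amazon Kendra.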

A retriever accepts a string query as input and returns a list of Documents as output.
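The contract described above (string query in, list of documents out, no storage required) can be sketched without any library. This is only an illustration: `Doc`, `Retriever`, `SearchBackedRetriever`, and `fake_search` are hypothetical names invented for this sketch and are not LangChain classes.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class Doc:
    """Hypothetical stand-in for a document: text plus metadata."""

    page_content: str
    metadata: dict = field(default_factory=dict)


class Retriever(Protocol):
    """Anything that maps a string query to a list of documents."""

    def invoke(self, query: str) -> list[Doc]: ...


class SearchBackedRetriever:
    """Stores nothing itself: builds results on demand from a search function."""

    def __init__(self, search_fn: Callable[[str], list[str]]):
        self.search_fn = search_fn

    def invoke(self, query: str) -> list[Doc]:
        # Wrap each raw hit in a document and record which query produced it.
        return [Doc(text, {"query": query}) for text in self.search_fn(query)]


def fake_search(query: str) -> list[str]:
    # Placeholder for an external search service (assumption for the sketch).
    return [f"Result {i} for {query!r}" for i in range(1, 3)]


retriever: Retriever = SearchBackedRetriever(fake_search)
docs = retriever.invoke("what is a retriever?")
```

The retriever here holds no corpus at all, which is the sense in which the interface is more general than a vector store.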

For details on how to use retrievers, see the relevant how-to guides.

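Note that all vector stores can be cast to retrievers. For the available vector stores, see the vector store integration docs. This page lists custom retrievers, implemented by subclassing BaseRetriever.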

Bring-your-own documents

The retrievers below let you index and search a custom corpus of documents.

| Retriever | Self-host | Cloud offering | Package |
| --- | --- | --- | --- |
| AmazonKnowledgeBasesRetriever | | | langchain_aws |
| AzureAISearchRetriever | | | langchain_community |
| ElasticsearchRetriever | | | langchain_elasticsearch |
| MilvusCollectionHybridSearchRetriever | | | langchain_milvus |
| VertexAISearchRetriever | | | langchain_google_community |

External index

The retrievers below search over an external index (for example, one built from Internet data or similar).

| Retriever | Source | Package |
| --- | --- | --- |
| ArxivRetriever | Scholarly articles on arxiv.org | langchain_community |
| TavilySearchAPIRetriever | Internet search | langchain_community |
| WikipediaRetriever | Wikipedia articles | langchain_community |

All retrievers

| Name | Description |
| --- | --- |
| Activeloop Deep Memory | Activeloop Deep Memory is a suite of tools that enables you to optimi... |
| Amazon Kendra | Amazon Kendra is an intelligent search service provided by Amazon Web... |
| Arcee | Arcee helps with the development of the SLMs: small, specialized, secu... |
| Arxiv | arXiv is an open-access archive for 2 million scholarly articles in t... |
| AskNews | AskNews infuses any LLM with the latest global news (or historical ne... |
| Azure AI Search | Azure AI Search (formerly known as Azure Cognitive Search) is a Micro... |
| Bedrock (Knowledge Bases) | This guide will help you get started with the AWS Knowledge Bases... |
| BM25 | BM25 (Wikipedia), also known as Okapi BM25, is a ranking function ... |
| Box | This will help you get started with the Box retriever. For detail... |
| BREEBS (Open Knowledge) | BREEBS is an open collaborative knowledge platform. |
| Chaindesk | Chaindesk platform brings data from anywhere (Datasources: Text, PDF, ... |
| ChatGPT plugin | OpenAI plugins connect ChatGPT to third-party applications. These plu... |
| Cognee | This will help you get started with the Cognee retriever. For det... |
| Cohere reranker | Cohere is a Canadian startup that provides natural language processin... |
| Cohere RAG | Cohere is a Canadian startup that provides natural language processin... |
| Contextual AI Reranker | Contextual AI's Instruction-Following Reranker is the world's first r... |
| Dappier | Dappier connects any LLM or your Agentic AI to real-time, rights-clea... |
| DocArray | DocArray is a versatile, open-source tool for managing your multi-mod... |
| Dria | Dria is a hub of public RAG models for developers to both contribute ... |
| ElasticSearch BM25 | Elasticsearch is a distributed, RESTful search and analytics engine. ... |
| Elasticsearch | Elasticsearch is a distributed, RESTful search and analytics engine. ... |
| Embedchain | Embedchain is a RAG framework to create data pipelines. It loads, ind... |
| FlashRank reranker | FlashRank is the Ultra-lite & Super-fast Python library to add re-ran... |
| Fleet AI Context | Fleet AI Context is a dataset of high-quality embeddings of the top 1... |
| Galaxia | Galaxia is a GraphRAG solution, which automates document processing, kn... |
| Google Drive | This notebook covers how to retrieve documents from Google Drive. |
| Google Vertex AI Search | Google Vertex AI Search (formerly known as Enterprise Search on Gener... |
| Graph RAG | Graph traversal over any Vector Store using document metadata. |
| IBM watsonx.ai | WatsonxRerank is a wrapper for IBM watsonx.ai foundation models. |
| JaguarDB Vector Database | JaguarDB Vector Database (http://www.jaguardb.com/windex.html) |
| Kay.ai | Kai Data API built for RAG 🕵️ We are curating the world's largest da... |
| Kinetica Vectorstore based Retriever | Kinetica is a database with integrated support for vector similarity ... |
| kNN | In statistics, the k-nearest neighbours algorithm (k-NN) is a non-par... |
| LinkupSearchRetriever | Linkup provides an API to connect LLMs to the web and the Linkup Prem... |
| LLMLingua Document Compressor | LLMLingua utilizes a compact, well-trained language model (e.g., GPT2... |
| LOTR (Merger Retriever) | Lord of the Retrievers (LOTR), also known as MergerRetriever, takes a... |
| Metal | Metal is a managed service for ML Embeddings. |
| Milvus Hybrid Search | Milvus is an open-source vector database built to power embedding sim... |
| NanoPQ (Product Quantization) | Product Quantization algorithm (k-NN) in brief is a quantization algo... |
| needle | Needle Retriever |
| Nimble | NimbleSearchRetriever enables developers to build RAG applications an... |
| Outline | Outline is an open-source collaborative knowledge base platform desig... |
| Permit | Permit is an access control platform that provides fine-grained, real... |
| Pinecone Hybrid Search | Pinecone is a vector database with broad functionality. |
| Pinecone Rerank | This notebook shows how to use PineconeRerank for two-stage vector re... |
| PubMed | PubMed® by The National Center for Biotechnology Information, Nationa... |
| Qdrant Sparse Vector | Qdrant is an open-source, high-performance vector search engine/datab... |
| RAGatouille | RAGatouille makes it as simple as can be to use ColBERT! |
| RePhraseQuery | RePhraseQuery is a simple retriever that applies an LLM between the u... |
| Rememberizer | Rememberizer is a knowledge enhancement service for AI applications c... |
| SEC filing | SEC filing is a financial statement or other formal document submitte... |
| Self-querying retrievers | |
| SVM | Support vector machines (SVMs) are a set of supervised learning metho... |
| TavilySearchAPI | Tavily's Search API is a search engine built specifically for AI agen... |
| TF-IDF | TF-IDF means term-frequency times inverse document-frequency. |
| NeuralDB | NeuralDB is a CPU-friendly and fine-tunable retrieval engine develope... |
| ValyuContext | Valyu allows AI applications and agents to search the internet and pr... |
| Vectorize | This notebook shows how to use the LangChain Vectorize retriever. |
| Vespa | Vespa is a fully featured search engine and vector database. It suppo... |
| Wikipedia | Overview |
| You.com | you.com API is a suite of tools designed to help developers ground th... |
| Zep Cloud | Retriever Example for Zep Cloud |
| Zep Open Source | Retriever Example for Zep |
| Zilliz Cloud Pipeline | Zilliz Cloud Pipelines transform your unstructured data to a searchab... |
| Zotero | This will help you get started with the Zotero retriever. For det... |