Retrievers

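A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Retrievers can be created from vector stores, but the interface is also broad enough to cover sources such as Wikipedia search and Amazon Kendra.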

A retriever accepts a string query as input and returns a list of documents as output.
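The input/output contract described above can be sketched without any framework. This is a minimal illustration using only the Python standard library. `Document`, `Retriever`, and `KeywordRetriever` are hypothetical stand-ins written for this example, not LangChain's own classes, and the keyword-overlap scoring is an arbitrary choice.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Document:
    """Hypothetical stand-in for a document: text plus free-form metadata."""

    page_content: str
    metadata: dict = field(default_factory=dict)


class Retriever(Protocol):
    """The contract: a string query in, a list of documents out."""

    def invoke(self, query: str) -> list[Document]: ...


class KeywordRetriever:
    """Toy retriever that ranks documents by how many query words they contain."""

    def __init__(self, docs: list[Document], k: int = 2) -> None:
        self.docs = docs
        self.k = k

    def invoke(self, query: str) -> list[Document]:
        words = set(query.lower().split())
        scored = [
            (len(words & set(doc.page_content.lower().split())), doc)
            for doc in self.docs
        ]
        # Keep only documents sharing at least one word with the query,
        # best match first (the sort is stable, so ties keep insertion order).
        hits = [pair for pair in scored if pair[0] > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in hits[: self.k]]


retriever: Retriever = KeywordRetriever(
    [
        Document("retrievers return documents for a query"),
        Document("vector stores index embeddings"),
        Document("a query can be any unstructured string"),
    ]
)
results = retriever.invoke("unstructured query")
```

Note that `KeywordRetriever` holds its documents in a plain list. Nothing in the contract requires a storage layer, only that `invoke` returns documents.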

For specifics on how to use retrievers, see the relevant how-to guides.

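Note that every vector store can be converted into a retriever. Refer to the vector store integration docs for the available vector stores. This page lists custom retrievers, which are implemented by subclassing BaseRetriever.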

Bring-your-own documents

The following retrievers let you index and search a custom corpus of documents.

| Retriever | Self-host | Cloud offering | Package |
|---|---|---|---|
| AmazonKnowledgeBasesRetriever | | | langchain_aws |
| AzureAISearchRetriever | | | langchain_community |
| ElasticsearchRetriever | | | langchain_elasticsearch |
| MilvusCollectionHybridSearchRetriever | | | langchain_milvus |
| VertexAISearchRetriever | | | langchain_google_community |

External index

The following retrievers search over an external index (for example, one constructed from Internet data).

| Retriever | Source | Package |
|---|---|---|
| ArxivRetriever | Scholarly articles on arxiv.org | langchain_community |
| TavilySearchAPIRetriever | Internet search | langchain_community |
| WikipediaRetriever | Wikipedia articles | langchain_community |

All retrievers

| Name | Description |
|---|---|
| Activeloop Deep Memory | Activeloop Deep Memory is a suite of tools that enables you to optimi... |
| Amazon Kendra | Amazon Kendra is an intelligent search service provided by Amazon Web... |
| Arcee | Arcee helps with the development of the SLMs—small, specialized, secu... |
| Arxiv | arXiv is an open-access archive for 2 million scholarly articles in t... |
| AskNews | AskNews infuses any LLM with the latest global news (or historical ne... |
| Azure AI Search | Azure AI Search (formerly known as Azure Cognitive Search) is a Micro... |
| Bedrock (Knowledge Bases) | This guide will help you get started with the AWS Knowledge Bases... |
| BM25 | BM25 (Wikipedia), also known as Okapi BM25, is a ranking function ... |
| Box | This will help you get started with the Box retriever. For detail... |
| BREEBS (Open Knowledge) | BREEBS is an open collaborative knowledge platform. |
| Chaindesk | Chaindesk platform brings data from anywhere (Datasources: Text, PDF, ... |
| ChatGPT plugin | OpenAI plugins connect ChatGPT to third-party applications. These plu... |
| Cognee | This will help you get started with the Cognee retriever. For det... |
| Cohere reranker | Cohere is a Canadian startup that provides natural language processin... |
| Cohere RAG | Cohere is a Canadian startup that provides natural language processin... |
| Contextual AI Reranker | Contextual AI's Instruction-Following Reranker is the world's first r... |
| Dappier | Dappier connects any LLM or your Agentic AI to real-time, rights-clea... |
| DocArray | DocArray is a versatile, open-source tool for managing your multi-mod... |
| Dria | Dria is a hub of public RAG models for developers to both contribute ... |
| ElasticSearch BM25 | Elasticsearch is a distributed, RESTful search and analytics engine. ... |
| Elasticsearch | Elasticsearch is a distributed, RESTful search and analytics engine. ... |
| Embedchain | Embedchain is a RAG framework to create data pipelines. It loads, ind... |
| FlashRank reranker | FlashRank is the Ultra-lite & Super-fast Python library to add re-ran... |
| Fleet AI Context | Fleet AI Context is a dataset of high-quality embeddings of the top 1... |
| Galaxia | Galaxia is a GraphRAG solution, which automates document processing, kn... |
| Google Drive | This notebook covers how to retrieve documents from Google Drive. |
| Google Vertex AI Search | Google Vertex AI Search (formerly known as Enterprise Search on Gener... |
| Graph RAG | Graph traversal over any Vector Store using document metadata. |
| IBM watsonx.ai | WatsonxRerank is a wrapper for IBM watsonx.ai foundation models. |
| JaguarDB Vector Database | [JaguarDB Vector Database](http://www.jaguardb.com/windex.html) |
| Kay.ai | Kay Data API built for RAG 🕵️ We are curating the world's largest da... |
| Kinetica Vectorstore based Retriever | Kinetica is a database with integrated support for vector similarity ... |
| kNN | In statistics, the k-nearest neighbours algorithm (k-NN) is a non-par... |
| LinkupSearchRetriever | Linkup provides an API to connect LLMs to the web and the Linkup Prem... |
| LLMLingua Document Compressor | LLMLingua utilizes a compact, well-trained language model (e.g., GPT2... |
| LOTR (Merger Retriever) | Lord of the Retrievers (LOTR), also known as MergerRetriever, takes a... |
| Metal | Metal is a managed service for ML Embeddings. |
| Milvus Hybrid Search | Milvus is an open-source vector database built to power embedding sim... |
| NanoPQ (Product Quantization) | Product Quantization algorithm (k-NN) in brief is a quantization algo... |
| needle | Needle Retriever |
| Nimble | NimbleSearchRetriever enables developers to build RAG applications an... |
| Outline | Outline is an open-source collaborative knowledge base platform desig... |
| Permit | Permit is an access control platform that provides fine-grained, real... |
| Pinecone Hybrid Search | Pinecone is a vector database with broad functionality. |
| Pinecone Rerank | This notebook shows how to use PineconeRerank for two-stage vector re... |
| PubMed | PubMed® by The National Center for Biotechnology Information, Nationa... |
| Qdrant Sparse Vector | Qdrant is an open-source, high-performance vector search engine/datab... |
| RAGatouille | RAGatouille makes it as simple as can be to use ColBERT! |
| RePhraseQuery | RePhraseQuery is a simple retriever that applies an LLM between the u... |
| Rememberizer | Rememberizer is a knowledge enhancement service for AI applications c... |
| SEC filing | SEC filing is a financial statement or other formal document submitte... |
| Self-querying retrievers | |
| SVM | Support vector machines (SVMs) are a set of supervised learning metho... |
| TavilySearchAPI | Tavily's Search API is a search engine built specifically for AI agen... |
| TF-IDF | TF-IDF means term-frequency times inverse document-frequency. |
| NeuralDB | NeuralDB is a CPU-friendly and fine-tunable retrieval engine develope... |
| ValyuContext | Valyu allows AI applications and agents to search the internet and pr... |
| Vectorize | This notebook shows how to use the LangChain Vectorize retriever. |
| Vespa | Vespa is a fully featured search engine and vector database. It suppo... |
| Wikipedia | Overview |
| You.com | you.com API is a suite of tools designed to help developers ground th... |
| Zep Cloud | Retriever Example for Zep Cloud |
| Zep Open Source | Retriever Example for Zep |
| Zilliz Cloud Pipeline | Zilliz Cloud Pipelines transform your unstructured data to a searchab... |
| Zotero | This will help you get started with the Zotero retriever. For det... |