
Voyage AI

Voyage AI provides cutting-edge embedding/vectorization models.

Let's load the Voyage AI embedding class. (Install the package with pip install langchain-voyageai.)

from langchain_voyageai import VoyageAIEmbeddings
API Reference: VoyageAIEmbeddings

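Voyage AI uses API keys to monitor usage and manage permissions. To obtain a key, create an account on the Voyage AI homepage. Then create a VoyageAIEmbeddings model with your API key. You can use any of the following models (source):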

  • voyage-3-large
  • voyage-3
  • voyage-3-lite
  • voyage-large-2
  • voyage-code-2
  • voyage-2
  • voyage-law-2
  • voyage-large-2-instruct
  • voyage-finance-2
  • voyage-multilingual-2
embeddings = VoyageAIEmbeddings(
    voyage_api_key="[ Your Voyage API key ]", model="voyage-law-2"
)
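
Hard-coding a key is convenient in a notebook, but a common alternative is to read it from the environment. The following is a minimal sketch, assuming the key is stored in an environment variable named VOYAGE_API_KEY; the helper load_voyage_api_key is hypothetical and not part of langchain-voyageai.

```python
import os


def load_voyage_api_key(env_var: str = "VOYAGE_API_KEY") -> str:
    """Return the API key stored in env_var (hypothetical helper).

    The variable name is an assumption made for this sketch; use whatever
    name your deployment actually sets.
    """
    # Strip surrounding whitespace, which often sneaks in via copy and paste.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the embeddings client."
        )
    return key
```

The returned string could then be passed as voyage_api_key when constructing VoyageAIEmbeddings, which keeps the secret out of the notebook itself.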

Prepare the documents and use embed_documents to get their embeddings.

documents = [
    "Caching embeddings enables the storage or temporary caching of embeddings, eliminating the necessity to recompute them each time.",
    "An LLMChain is a chain that composes basic LLM functionality. It consists of a PromptTemplate and a language model (either an LLM or chat model). It formats the prompt template using the input key values provided (and also memory key values, if available), passes the formatted string to LLM and returns the LLM output.",
    "A Runnable represents a generic unit of work that can be invoked, batched, streamed, and/or transformed.",
]
documents_embds = embeddings.embed_documents(documents)
documents_embds[0][:5]
[0.0562174916267395,
0.018221192061901093,
0.0025736060924828053,
-0.009720131754875183,
0.04108370840549469]

Similarly, use embed_query to embed the query.

query = "What's an LLMChain?"
query_embd = embeddings.embed_query(query)
query_embd[:5]
[-0.0052348352037370205,
-0.040072452276945114,
0.0033957737032324076,
0.01763271726667881,
-0.019235141575336456]

A minimalist retrieval system

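The key property of embeddings is that the cosine similarity between two embeddings captures the semantic relatedness of the corresponding original passages. This allows us to use the embeddings for semantic retrieval/search.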
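
To make the cosine-similarity idea concrete, here is a minimal pure-Python sketch. The three-dimensional vectors are made up for illustration and are not real Voyage AI embeddings, which have many more dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors, invented for this example.
query_vec = [1.0, 0.0, 1.0]
related_vec = [2.0, 0.0, 2.0]    # same direction as the query, different length
unrelated_vec = [0.0, 1.0, 0.0]  # orthogonal to the query

print(cosine_similarity(query_vec, related_vec))    # close to 1: same direction
print(cosine_similarity(query_vec, unrelated_vec))  # 0: no overlap
```

Because the score depends only on direction, scaling a vector does not change its similarity to the query.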

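We can find the closest embeddings among the document embeddings by cosine similarity, and retrieve the corresponding documents, using the KNNRetriever class.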

from langchain_community.retrievers import KNNRetriever

retriever = KNNRetriever.from_texts(documents, embeddings)

# retrieve the most relevant documents
result = retriever.invoke(query)
top1_retrieved_doc = result[0].page_content # return the top1 retrieved result

print(top1_retrieved_doc)
API Reference: KNNRetriever
An LLMChain is a chain that composes basic LLM functionality. It consists of a PromptTemplate and a language model (either an LLM or chat model). It formats the prompt template using the input key values provided (and also memory key values, if available), passes the formatted string to LLM and returns the LLM output.
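
Conceptually, a nearest-neighbor retriever ranks the document embeddings by their similarity to the query embedding and returns the best matches. The sketch below illustrates that idea in pure Python; it is not KNNRetriever's actual implementation, the function top_k_by_cosine is hypothetical, and the vectors are invented stand-ins for real embeddings.

```python
import math
from typing import List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def top_k_by_cosine(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 1,
) -> List[Tuple[int, float]]:
    """Return (document index, cosine similarity) for the k most similar documents."""
    q = _normalize(query_vec)
    scored = []
    for index, doc_vec in enumerate(doc_vecs):
        d = _normalize(doc_vec)
        scored.append((index, sum(x * y for x, y in zip(q, d))))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up labels and vectors standing in for the three documents and the query.
toy_docs = ["caching embeddings", "what an LLMChain is", "what a Runnable is"]
toy_doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.2], [0.0, 0.2, 0.9]]
toy_query_vec = [0.2, 1.0, 0.1]

best_index, best_score = top_k_by_cosine(toy_query_vec, toy_doc_vecs, k=1)[0]
print(toy_docs[best_index])
```

With real data, the vectors returned by embed_documents and embed_query would take the place of the toy vectors.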