Migrating from StuffDocumentsChain
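StuffDocumentsChain combines documents by concatenating them into a single context window. It is a simple and effective strategy for combining documents for question answering, summarization, and other purposes.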
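To make the idea of "stuffing" concrete, here is a minimal sketch in plain Python that does not use LangChain at all. The `Doc` class and the `stuff` helper are hypothetical names introduced only for illustration, and the blank-line separator is an assumption about a sensible default rather than a statement about any particular chain's configuration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (illustration only)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def stuff(docs, prompt_template, separator="\n\n"):
    """Join every document's text into one string and place it in the prompt."""
    context = separator.join(d.page_content for d in docs)
    return prompt_template.format(context=context)


docs = [Doc("Apples are red"), Doc("Blueberries are blue")]
stuffed_prompt = stuff(docs, "Summarize this content: {context}")
print(stuffed_prompt)
```

The resulting string is what a model would receive in a single call, which is why this strategy only works while the combined documents fit in the model's context window.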
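create_stuff_documents_chain is the recommended alternative. It functions the same as StuffDocumentsChain, with better support for streaming and batch functionality. Because it is a simple combination of LCEL primitives, it is also easier to extend and incorporate into other LangChain applications.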
Below we will go through both StuffDocumentsChain and create_stuff_documents_chain on a simple example for illustrative purposes.
Let's first load a chat model:
pip install -qU "langchain[openai]"
import getpass
import os
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")
from langchain.chat_models import init_chat_model
llm = init_chat_model("gpt-4o-mini", model_provider="openai")
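Example
Let's go through an example where we analyze a set of documents. We first generate some simple documents for illustrative purposes: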
from langchain_core.documents import Document
documents = [
    Document(page_content="Apples are red", metadata={"title": "apple_book"}),
    Document(page_content="Blueberries are blue", metadata={"title": "blueberry_book"}),
    Document(page_content="Bananas are yelow", metadata={"title": "banana_book"}),
]
API Reference: Document
Legacy
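Below we show an implementation with StuffDocumentsChain. We define the prompt template for a summarization task and instantiate an LLMChain object for it. We also define how documents are formatted into the prompt and keep the keys consistent across the various prompts.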
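The key consistency mentioned here comes down to one requirement: the variable name the documents are stuffed into must appear as a placeholder in the summarization prompt. The sketch below checks this by hand with the standard library. `template_variables` is a hypothetical helper written for illustration; LangChain performs its own validation, which may differ from this.

```python
from string import Formatter


def template_variables(template: str) -> set:
    """Collect the placeholder names used in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


document_variable_name = "context"
summary_template = "Summarize this content: {context}"
document_template = "{page_content}"

# The stuffed documents are injected under `document_variable_name`,
# so the summarization prompt must expose a placeholder with that name.
prompt_vars = template_variables(summary_template)
keys_consistent = document_variable_name in prompt_vars

# A mismatched name would leave the prompt with no slot for the documents.
mismatch = "docs" in prompt_vars
print(keys_consistent, mismatch)
```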
from langchain.chains import LLMChain, StuffDocumentsChain
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
# This controls how each document will be formatted. Specifically,
# it will be passed to `format_document` - see that function for more
# details.
document_prompt = PromptTemplate(
    input_variables=["page_content"], template="{page_content}"
)
document_variable_name = "context"
# The prompt here should take as an input variable the
# `document_variable_name`
prompt = ChatPromptTemplate.from_template("Summarize this content: {context}")
llm_chain = LLMChain(llm=llm, prompt=prompt)
chain = StuffDocumentsChain(
    llm_chain=llm_chain,
    document_prompt=document_prompt,
    document_variable_name=document_variable_name,
)
We can now invoke our chain:
result = chain.invoke(documents)
result["output_text"]
'This content describes the colors of different fruits: apples are red, blueberries are blue, and bananas are yellow.'
for chunk in chain.stream(documents):
    print(chunk)
{'input_documents': [Document(metadata={'title': 'apple_book'}, page_content='Apples are red'), Document(metadata={'title': 'blueberry_book'}, page_content='Blueberries are blue'), Document(metadata={'title': 'banana_book'}, page_content='Bananas are yelow')], 'output_text': 'This content describes the colors of different fruits: apples are red, blueberries are blue, and bananas are yellow.'}
LCEL
Below we show an implementation using create_stuff_documents_chain:
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template("Summarize this content: {context}")
chain = create_stuff_documents_chain(llm, prompt)
API Reference: create_stuff_documents_chain | ChatPromptTemplate
Invoking the chain, we obtain a result similar to the one before:
result = chain.invoke({"context": documents})
result
'This content describes the colors of different fruits: apples are red, blueberries are blue, and bananas are yellow.'
Note that this implementation supports streaming of output tokens:
for chunk in chain.stream({"context": documents}):
    print(chunk, end=" | ")
| This | content | describes | the | colors | of | different | fruits | : | apples | are | red | , | blue | berries | are | blue | , | and | bananas | are | yellow | . | |
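Next steps
Check out the LCEL conceptual docs for more background information.
See these how-to guides for more on question-answering tasks with RAG.
See this tutorial for more LLM-based summarization strategies.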