
Build a Retrieval Augmented Generation (RAG) App: Part 2

In many Q&A applications we want to allow the user to have a back-and-forth conversation, meaning the application needs some sort of "memory" of past questions and answers, and some logic for incorporating those into its current thinking.

This is the second part of a multi-part tutorial:

  • Part 1 introduces RAG and walks through a minimal implementation.
  • Part 2 (this guide) extends the implementation to accommodate conversation-style interactions and multi-step retrieval processes.

Here we focus on adding logic for incorporating historical messages. This involves the management of chat history.

We will cover two approaches:

  1. Chains, in which we execute at most one retrieval step;
  2. Agents, in which we give an LLM discretion to execute multiple retrieval steps.
Note

The approaches presented here leverage the tool-calling capabilities of modern chat models. See this page for a list of models that support tool calling.

For the external knowledge source, we will use the same LLM Powered Autonomous Agents blog post by Lilian Weng from Part 1 of the RAG tutorial.

Setup

Components

We will need to select three components from LangChain's suite of integrations.

pip install -qU "langchain[openai]"
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")
pip install -qU langchain-openai
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
pip install -qU langchain-core
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore(embeddings)

Dependencies

Additionally, we will use the following packages:

%%capture --no-stderr
%pip install --upgrade --quiet langgraph langchain-community beautifulsoup4

LangSmith

Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with LangSmith.

Note that LangSmith is not needed, but it is helpful. If you do want to use LangSmith, after you sign up at the link above, make sure to set your environment variables to start logging traces:

os.environ["LANGSMITH_TRACING"] = "true"
if not os.environ.get("LANGSMITH_API_KEY"):
    os.environ["LANGSMITH_API_KEY"] = getpass.getpass()

Chains

Let's first revisit the vector store we built in Part 1, which indexes the LLM Powered Autonomous Agents blog post by Lilian Weng.

import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from typing_extensions import List, TypedDict

# Load and chunk contents of the blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)
# Index chunks
_ = vector_store.add_documents(documents=all_splits)

In Part 1 of the RAG tutorial, we represented the user input, retrieved context, and generated answer as separate keys in the state. Conversational experiences can be naturally represented using a sequence of messages. In addition to messages from the user and assistant, retrieved documents and other artifacts can be incorporated into a message sequence via tool messages. This motivates us to represent the state of our RAG application using a sequence of messages. Specifically, we will have

  1. User input as a HumanMessage;
  2. Vector store query as an AIMessage with tool calls;
  3. Retrieved documents as a ToolMessage;
  4. Final response as an AIMessage.
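
The four-message turn above can be sketched with plain dictionaries. This is a hypothetical, framework-free illustration; the real HumanMessage, AIMessage, and ToolMessage classes carry more structure than the `type`, `content`, and `tool_calls` fields shown here.

```python
# One conversational turn, modeled as the four messages described above.
turn = [
    {"type": "human", "content": "What is Task Decomposition?"},
    {"type": "ai", "content": "",
     "tool_calls": [{"name": "retrieve", "args": {"query": "Task Decomposition"}}]},
    {"type": "tool", "content": "Source: ...\nContent: ..."},
    {"type": "ai", "content": "Task decomposition breaks complex tasks into smaller steps.",
     "tool_calls": []},
]


def final_answer(messages):
    """Return the content of the last AI message that carries no tool calls."""
    for message in reversed(messages):
        if message["type"] == "ai" and not message.get("tool_calls"):
            return message["content"]
    return None


print(final_answer(turn))  # the final AIMessage content
```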

This state model is versatile enough that LangGraph offers a built-in version of it for convenience:

from langgraph.graph import MessagesState, StateGraph

graph_builder = StateGraph(MessagesState)
API Reference: StateGraph

Leveraging tool-calling to interact with a retrieval step has another benefit: the query for the retrieval is generated by our model. This is especially important in a conversational setting, where user queries may require contextualization based on the chat history. For instance, consider the following exchange:

Human: "What is Task Decomposition?"

AI: "Task decomposition involves breaking down complex tasks into smaller and simpler steps to make them more manageable for an agent or model."

Human: "What are common ways of doing it?"

In this scenario, the model could generate a query such as "common approaches to task decomposition". Tool-calling facilitates this naturally. As in the query analysis section of the RAG tutorial, this allows the model to rewrite user queries into more effective search queries. It also supports direct responses that do not involve a retrieval step (for example, in response to a generic greeting from the user).

Let's turn our retrieval step into a tool:

from langchain_core.tools import tool


@tool(response_format="content_and_artifact")
def retrieve(query: str):
    """Retrieve information related to a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\n" f"Content: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs
API Reference: tool

See this guide for more detail on creating tools.

Our graph will consist of three nodes:

  1. A node that fields the user input, either generating a query for the retriever or responding directly;
  2. A node for the retriever tool that executes the retrieval step;
  3. A node that generates the final response using the retrieved context.

We build them below. Note that we leverage another pre-built LangGraph component, ToolNode, which executes the tool and adds the result as a ToolMessage to the state.

from langchain_core.messages import SystemMessage
from langgraph.prebuilt import ToolNode


# Step 1: Generate an AIMessage that may include a tool-call to be sent.
def query_or_respond(state: MessagesState):
    """Generate tool call for retrieval or respond."""
    llm_with_tools = llm.bind_tools([retrieve])
    response = llm_with_tools.invoke(state["messages"])
    # MessagesState appends messages to state instead of overwriting
    return {"messages": [response]}


# Step 2: Execute the retrieval.
tools = ToolNode([retrieve])


# Step 3: Generate a response using the retrieved content.
def generate(state: MessagesState):
    """Generate answer."""
    # Get generated ToolMessages
    recent_tool_messages = []
    for message in reversed(state["messages"]):
        if message.type == "tool":
            recent_tool_messages.append(message)
        else:
            break
    tool_messages = recent_tool_messages[::-1]

    # Format into prompt
    docs_content = "\n\n".join(doc.content for doc in tool_messages)
    system_message_content = (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer "
        "the question. If you don't know the answer, say that you "
        "don't know. Use three sentences maximum and keep the "
        "answer concise."
        "\n\n"
        f"{docs_content}"
    )
    conversation_messages = [
        message
        for message in state["messages"]
        if message.type in ("human", "system")
        or (message.type == "ai" and not message.tool_calls)
    ]
    prompt = [SystemMessage(system_message_content)] + conversation_messages

    # Run
    response = llm.invoke(prompt)
    return {"messages": [response]}
API Reference: SystemMessage | ToolNode

Finally, we compile our application into a single graph object. In this case, we are just connecting the steps into a sequence. We also allow the first query_or_respond step to "short-circuit" and respond directly to the user if it does not generate a tool call. This allows our application to support conversational experiences, such as responding to generic greetings that may not require a retrieval step.

from langgraph.graph import END
from langgraph.prebuilt import ToolNode, tools_condition

graph_builder.add_node(query_or_respond)
graph_builder.add_node(tools)
graph_builder.add_node(generate)

graph_builder.set_entry_point("query_or_respond")
graph_builder.add_conditional_edges(
    "query_or_respond",
    tools_condition,
    {END: END, "tools": "tools"},
)
graph_builder.add_edge("tools", "generate")
graph_builder.add_edge("generate", END)

graph = graph_builder.compile()
API Reference: ToolNode | tools_condition
from IPython.display import Image, display

display(Image(graph.get_graph().draw_mermaid_png()))

Let's test our application.

Note that it responds appropriately to messages that do not require an additional retrieval step:

input_message = "Hello"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()
================================ Human Message =================================

Hello
================================== Ai Message ==================================

Hello! How can I assist you today?

When executing a search, we can stream the steps to observe the query generation, retrieval, and answer generation:

input_message = "What is Task Decomposition?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()
================================ Human Message =================================

What is Task Decomposition?
================================== Ai Message ==================================
Tool Calls:
retrieve (call_dLjB3rkMoxZZxwUGXi33UBeh)
Call ID: call_dLjB3rkMoxZZxwUGXi33UBeh
Args:
query: Task Decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
================================== Ai Message ==================================

Task Decomposition is the process of breaking down a complicated task into smaller, manageable steps. It often involves techniques like Chain of Thought (CoT), which encourages models to think step by step, enhancing performance on complex tasks. This approach allows for a clearer understanding of the task and aids in structuring the problem-solving process.

Check out the LangSmith trace here.

Stateful management of chat history

Note

This section of the tutorial previously used the RunnableWithMessageHistory abstraction. You can access that version of the documentation in the v0.2 docs.

As of the v0.3 release of LangChain, we recommend that LangChain users take advantage of LangGraph persistence to incorporate memory into new LangChain applications.

If your code already relies on RunnableWithMessageHistory or BaseChatMessageHistory, you do not need to make any changes. We do not plan on deprecating this functionality in the near future, as it works for simple chat applications, and any code that uses RunnableWithMessageHistory will continue to work as expected.

See How to migrate to LangGraph memory for more details.

In production, the Q&A application will usually persist the chat history into a database, and be able to read and update it appropriately.

LangGraph implements a built-in persistence layer, making it ideal for chat applications that support multiple conversational turns.

To manage multiple conversational turns and threads, all we have to do is specify a checkpointer when compiling the application. Because the nodes in our graph append messages to the state, we will retain a consistent chat history across invocations.

LangGraph comes with a simple in-memory checkpointer, which we use below. See its documentation for more detail, including how to use different persistence backends (e.g., SQLite or Postgres).

For a detailed walkthrough of how to manage message history, head to the How to add message history (memory) guide.
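
Conceptually, a checkpointer maps a thread ID to accumulated state, so each invocation on a thread picks up where the previous one left off. The sketch below is a hypothetical, framework-free illustration; LangGraph's real checkpointers persist full graph state (not just messages) and support durable backends.

```python
# A minimal key-value checkpointer: thread ID -> saved message list.
class InMemoryCheckpointer:
    def __init__(self):
        self._threads = {}

    def load(self, thread_id):
        """Return the saved messages for a thread (empty list if the thread is new)."""
        return list(self._threads.get(thread_id, []))

    def save(self, thread_id, messages):
        """Persist the full message list for a thread."""
        self._threads[thread_id] = list(messages)


checkpointer = InMemoryCheckpointer()

# First turn on thread "abc123": start from empty history, append the exchange.
history = checkpointer.load("abc123")
history += [
    {"type": "human", "content": "What is Task Decomposition?"},
    {"type": "ai", "content": "Breaking a task into smaller steps."},
]
checkpointer.save("abc123", history)

# A later turn on the same thread sees the earlier exchange...
print(len(checkpointer.load("abc123")))  # 2
# ...while a different thread starts fresh.
print(len(checkpointer.load("def234")))  # 0
```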

from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()
graph = graph_builder.compile(checkpointer=memory)

# Specify an ID for the thread
config = {"configurable": {"thread_id": "abc123"}}
API Reference: MemorySaver

We can now invoke it as before:

input_message = "What is Task Decomposition?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    step["messages"][-1].pretty_print()
================================ Human Message =================================

What is Task Decomposition?
================================== Ai Message ==================================
Tool Calls:
retrieve (call_JZb6GLD812bW2mQsJ5EJQDnN)
Call ID: call_JZb6GLD812bW2mQsJ5EJQDnN
Args:
query: Task Decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.
================================== Ai Message ==================================

Task Decomposition is a technique used to break down complicated tasks into smaller, manageable steps. It involves using methods like Chain of Thought (CoT) prompting, which encourages the model to think step by step, enhancing performance on complex tasks. This process helps to clarify the model's reasoning and makes it easier to tackle difficult problems.
input_message = "Can you look up some common ways of doing it?"

for step in graph.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    step["messages"][-1].pretty_print()
================================ Human Message =================================

Can you look up some common ways of doing it?
================================== Ai Message ==================================
Tool Calls:
retrieve (call_kjRI4Y5cJOiB73yvd7dmb6ux)
Call ID: call_kjRI4Y5cJOiB73yvd7dmb6ux
Args:
query: common methods of task decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.
================================== Ai Message ==================================

Common ways of performing Task Decomposition include: (1) using Large Language Models (LLMs) with simple prompts like "Steps for XYZ" or "What are the subgoals for achieving XYZ?", (2) employing task-specific instructions such as "Write a story outline" for specific tasks, and (3) incorporating human inputs to guide the decomposition process.

Note that the query generated by the model in the second question incorporates the conversational context.

The LangSmith trace is particularly informative here, as we can see exactly what messages are visible to our chat model at each step.

Agents

Agents leverage the reasoning capabilities of LLMs to make decisions during execution. Using agents allows you to offload additional discretion over the retrieval process. Although their behavior is less predictable than the chain above, they are able to execute multiple retrieval steps in service of a query, or to iterate on a single search.

Below we assemble a minimal RAG agent. Using LangGraph's pre-built ReAct agent constructor, we can do this in one line of code.

Tip

Check out LangGraph's Agentic RAG tutorial for more advanced formulations.

from langgraph.prebuilt import create_react_agent

agent_executor = create_react_agent(llm, [retrieve], checkpointer=memory)
API Reference: create_react_agent

Let's inspect the graph:

display(Image(agent_executor.get_graph().draw_mermaid_png()))

The key difference from our earlier implementation is that instead of a final generation step that ends the run, here the tool invocation loops back to the original LLM call. The model can then either answer the question using the retrieved context, or generate another tool call to obtain more information.
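
The loop just described can be sketched without any framework: keep invoking the model, executing whatever tool calls it emits, until it responds without one. The stub model and tool below are hypothetical stand-ins; create_react_agent wires a real chat model and real tools into essentially this control flow.

```python
# Stub model: request one retrieval, then answer once a tool result is present.
def fake_model(messages):
    if not any(m["type"] == "tool" for m in messages):
        return {"type": "ai", "content": "",
                "tool_calls": [{"name": "retrieve", "args": {"query": "Task Decomposition"}}]}
    return {"type": "ai", "content": "Task decomposition splits tasks into steps.",
            "tool_calls": []}


def fake_retrieve(query):
    return f"Context for: {query}"


def agent_loop(model, tools, messages, max_steps=5):
    """Invoke the model repeatedly, executing tool calls, until it answers directly."""
    for _ in range(max_steps):
        response = model(messages)
        messages.append(response)
        if not response["tool_calls"]:
            return response["content"]  # no tool call: this is the final answer
        for call in response["tool_calls"]:
            result = tools[call["name"]](**call["args"])
            messages.append({"type": "tool", "content": result})
    raise RuntimeError("agent did not finish within max_steps")


answer = agent_loop(
    fake_model,
    {"retrieve": fake_retrieve},
    [{"type": "human", "content": "What is Task Decomposition?"}],
)
print(answer)
```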

Let's test this out. We construct a question that would typically require an iterative sequence of retrieval steps to answer:

config = {"configurable": {"thread_id": "def234"}}

input_message = (
    "What is the standard method for Task Decomposition?\n\n"
    "Once you get the answer, look up common extensions of that method."
)

for event in agent_executor.stream(
    {"messages": [{"role": "user", "content": input_message}]},
    stream_mode="values",
    config=config,
):
    event["messages"][-1].pretty_print()
================================ Human Message =================================

What is the standard method for Task Decomposition?

Once you get the answer, look up common extensions of that method.
================================== Ai Message ==================================
Tool Calls:
retrieve (call_Y3YaIzL71B83Cjqa8d2G0O8N)
Call ID: call_Y3YaIzL71B83Cjqa8d2G0O8N
Args:
query: standard method for Task Decomposition
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.
================================== Ai Message ==================================
Tool Calls:
retrieve (call_2JntP1x4XQMWwgVpYurE12ff)
Call ID: call_2JntP1x4XQMWwgVpYurE12ff
Args:
query: common extensions of Task Decomposition methods
================================= Tool Message =================================
Name: retrieve

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Source: {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.
================================== Ai Message ==================================

The standard method for task decomposition involves using techniques such as Chain of Thought (CoT), where a model is instructed to "think step by step" to break down complex tasks into smaller, more manageable components. This approach enhances model performance by allowing for more thorough reasoning and planning. Task decomposition can be accomplished through various means, including:

1. Simple prompting (e.g., asking for steps to achieve a goal).
2. Task-specific instructions (e.g., asking for a story outline).
3. Human inputs to guide the decomposition process.

### Common Extensions of Task Decomposition Methods:

1. **Tree of Thoughts**: This extension builds on CoT by not only decomposing the problem into thought steps but also generating multiple thoughts at each step, creating a tree structure. The search process can employ breadth-first search (BFS) or depth-first search (DFS), with each state evaluated by a classifier or through majority voting.

These extensions aim to enhance reasoning capabilities and improve the effectiveness of task decomposition in various contexts.

Note that the agent:

  1. Generates a query to search for a standard method of task decomposition;
  2. On receiving the answer, generates a second query to search for common extensions of it;
  3. Having received all necessary context, answers the question.

We can see the full sequence of steps, along with latency and other metadata, in the LangSmith trace.

Next steps

We've covered the steps to build a basic conversational Q&A application:

  • We used chains to build a predictable application that generates at most one query per user input;
  • We used agents to build an application that can iterate on a sequence of queries.

To explore different types of retrievers and retrieval strategies, visit the retrievers section of the guides.

For a detailed walkthrough of LangChain's conversation memory abstractions, visit the How to add message history (memory) guide.

To learn more about agents, head to the conceptual guide and the LangGraph agent architectures page.