
How to add memory to chatbots

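A key feature of chatbots is their ability to use the content of previous conversation turns as context. This state management can take several forms, including:

  • Simply stuffing previous messages into a chat model prompt.
  • The above, but trimming old messages to reduce the amount of distracting information the model has to deal with.
  • More complex modifications, such as synthesizing summaries for long-running conversations.

We'll go into more detail on a few techniques below!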

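Note

This how-to guide previously built a chatbot using RunnableWithMessageHistory. You can access that version of the guide in the v0.2 docs.

As of the LangChain v0.3 release, we recommend that LangChain users take advantage of LangGraph persistence to incorporate memory into new LangChain applications.

If your code already relies on RunnableWithMessageHistory or BaseChatMessageHistory, you do not need to make any changes. We do not plan to deprecate this functionality in the near future, as it works for simple chat applications, and any code that uses RunnableWithMessageHistory will continue to work as expected.

For more details, see How to migrate to LangGraph memory.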

Setup

You'll need to install a few packages and set your OpenAI API key as an environment variable named OPENAI_API_KEY:

%pip install --upgrade --quiet langchain langchain-openai langgraph

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
OpenAI API Key: ········

Let's also set up a chat model that we'll use for the examples below.

from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")
API Reference: ChatOpenAI

Message passing

The simplest form of memory is passing chat history messages into a chain. Here's an example:

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content="You are a helpful assistant. Answer all questions to the best of your ability."
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chain = prompt | model

ai_msg = chain.invoke(
    {
        "messages": [
            HumanMessage(
                content="Translate from English to French: I love programming."
            ),
            AIMessage(content="J'adore la programmation."),
            HumanMessage(content="What did you just say?"),
        ],
    }
)
print(ai_msg.content)
I said, "I love programming" in French: "J'adore la programmation."

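We can see that by passing the previous conversation into a chain, it can use it as context to answer questions. This is the basic concept underpinning chatbot memory. The rest of this guide demonstrates convenient techniques for passing or reformatting messages.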
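To make the bookkeeping behind message passing concrete, here is a minimal, library-free sketch. It does not use LangChain: messages are plain dicts, and `fake_model` and `chat_turn` are hypothetical names introduced only for illustration. The point is that the caller owns the history list and must append both the user input and the reply on every turn.

```python
from typing import Callable, Dict, List

# A plain-dict stand-in for HumanMessage / AIMessage (an assumption for this sketch).
Message = Dict[str, str]


def fake_model(messages: List[Message]) -> Message:
    """Hypothetical stand-in for a chat model: reports how many messages it received."""
    last = messages[-1]["content"]
    return {"role": "ai", "content": f"Saw {len(messages)} message(s); last was: {last}"}


def chat_turn(
    history: List[Message],
    user_text: str,
    model: Callable[[List[Message]], Message] = fake_model,
) -> Message:
    """Append the user input, call the model on the full history, and store the reply."""
    history.append({"role": "human", "content": user_text})
    reply = model(history)
    history.append(reply)
    return reply


history: List[Message] = []
first = chat_turn(history, "Translate to French: I love programming.")
second = chat_turn(history, "What did you just say?")
print(second["content"])
```

On the second turn the stand-in model receives three messages (the first question, the first reply, and the new question), which is what lets a real chat model answer follow-up questions.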

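Automatic history management

The previous examples pass messages to the chain (and model) explicitly. This is a completely acceptable approach, but it does require external management of new messages. LangChain also provides a way to build applications that have memory using LangGraph's persistence. You can enable persistence in LangGraph applications by providing a checkpointer when compiling the graph.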

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

workflow = StateGraph(state_schema=MessagesState)


# Define the function that calls the model
def call_model(state: MessagesState):
    system_prompt = (
        "You are a helpful assistant. "
        "Answer all questions to the best of your ability."
    )
    messages = [SystemMessage(content=system_prompt)] + state["messages"]
    response = model.invoke(messages)
    return {"messages": response}


# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")

# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)
API Reference: MemorySaver | StateGraph

We'll pass the latest input to the conversation here and let LangGraph keep track of the conversation history using the checkpointer:

app.invoke(
    {"messages": [HumanMessage(content="Translate to French: I love programming.")]},
    config={"configurable": {"thread_id": "1"}},
)
{'messages': [HumanMessage(content='Translate to French: I love programming.', additional_kwargs={}, response_metadata={}, id='be5e7099-3149-4293-af49-6b36c8ccd71b'),
AIMessage(content="J'aime programmer.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 35, 'total_tokens': 39, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_e9627b5346', 'finish_reason': 'stop', 'logprobs': None}, id='run-8a753d7a-b97b-4d01-a661-626be6f41b38-0', usage_metadata={'input_tokens': 35, 'output_tokens': 4, 'total_tokens': 39})]}
app.invoke(
    {"messages": [HumanMessage(content="What did I just ask you?")]},
    config={"configurable": {"thread_id": "1"}},
)
{'messages': [HumanMessage(content='Translate to French: I love programming.', additional_kwargs={}, response_metadata={}, id='be5e7099-3149-4293-af49-6b36c8ccd71b'),
AIMessage(content="J'aime programmer.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 35, 'total_tokens': 39, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_e9627b5346', 'finish_reason': 'stop', 'logprobs': None}, id='run-8a753d7a-b97b-4d01-a661-626be6f41b38-0', usage_metadata={'input_tokens': 35, 'output_tokens': 4, 'total_tokens': 39}),
HumanMessage(content='What did I just ask you?', additional_kwargs={}, response_metadata={}, id='c667529b-7c41-4cc0-9326-0af47328b816'),
AIMessage(content='You asked me to translate "I love programming" into French.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 54, 'total_tokens': 67, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-134a7ea0-d3a4-4923-bd58-25e5a43f6a1f-0', usage_metadata={'input_tokens': 54, 'output_tokens': 13, 'total_tokens': 67})]}
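As a mental model for why the second call sees the first exchange, the toy class below keeps one message list per thread ID. This is only a sketch under our own assumptions: `ThreadStore` is a hypothetical name, messages are plain strings, and it is not how MemorySaver is implemented. It only illustrates that history is looked up by thread_id, so a different thread_id starts from an empty history.

```python
from collections import defaultdict
from typing import Dict, List


class ThreadStore:
    """Toy per-thread message store (hypothetical; not the MemorySaver API)."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[str]] = defaultdict(list)

    def append(self, thread_id: str, new_messages: List[str]) -> List[str]:
        """Add messages to one thread and return a copy of that thread's full history."""
        self._threads[thread_id].extend(new_messages)
        return list(self._threads[thread_id])

    def get(self, thread_id: str) -> List[str]:
        """Return a copy of a thread's history, or an empty list for an unknown thread."""
        return list(self._threads.get(thread_id, []))


store = ThreadStore()
store.append("1", ["human: Translate to French: I love programming.", "ai: J'aime programmer."])
thread_1 = store.append("1", ["human: What did I just ask you?"])
thread_2 = store.append("2", ["human: What did I just ask you?"])
print(len(thread_1), len(thread_2))
```

Thread "1" now holds three messages while thread "2" holds only the one just sent, which mirrors why reusing the same thread_id in the calls above is what carries the context forward.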

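Modifying chat history

Modifying stored chat messages can help your chatbot handle a variety of situations. Here are some examples:

Trimming messages

LLMs and chat models have limited context windows, and even if you're not directly hitting limits, you may want to limit the amount of distraction the model has to deal with. One solution is to trim the history messages before passing them to the model. Let's use an example history with the app we declared above: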

demo_ephemeral_chat_history = [
    HumanMessage(content="Hey there! I'm Nemo."),
    AIMessage(content="Hello!"),
    HumanMessage(content="How are you today?"),
    AIMessage(content="Fine thanks!"),
]

app.invoke(
    {
        "messages": demo_ephemeral_chat_history
        + [HumanMessage(content="What's my name?")]
    },
    config={"configurable": {"thread_id": "2"}},
)
{'messages': [HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}, id='6b4cab70-ce18-49b0-bb06-267bde44e037'),
AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}, id='ba3714f4-8876-440b-a651-efdcab2fcb4c'),
HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}, id='08d032c0-1577-4862-a3f2-5c1b90687e21'),
AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={}, id='21790e16-db05-4537-9a6b-ecad0fcec436'),
HumanMessage(content="What's my name?", additional_kwargs={}, response_metadata={}, id='c933eca3-5fd8-4651-af16-20fe2d49c216'),
AIMessage(content='Your name is Nemo.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 5, 'prompt_tokens': 63, 'total_tokens': 68, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-a0b21acc-9dbb-4fb6-a953-392020f37d88-0', usage_metadata={'input_tokens': 63, 'output_tokens': 5, 'total_tokens': 68})]}

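We can see the app remembers the preloaded name.

But let's say we have a very small context window, and we want to limit the number of messages passed to the model to only the two most recent ones. We can use the built-in trim_messages util to trim messages based on their token count before they reach our prompt. In this case we'll count each message as 1 "token" and keep only the last two messages: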

from langchain_core.messages import trim_messages
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

# Define trimmer
# count each message as 1 "token" (token_counter=len) and keep only the last two messages
trimmer = trim_messages(strategy="last", max_tokens=2, token_counter=len)

workflow = StateGraph(state_schema=MessagesState)


# Define the function that calls the model
def call_model(state: MessagesState):
    trimmed_messages = trimmer.invoke(state["messages"])
    system_prompt = (
        "You are a helpful assistant. "
        "Answer all questions to the best of your ability."
    )
    messages = [SystemMessage(content=system_prompt)] + trimmed_messages
    response = model.invoke(messages)
    return {"messages": response}


# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")

# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

Let's call this new app and check the response:

app.invoke(
    {
        "messages": demo_ephemeral_chat_history
        + [HumanMessage(content="What is my name?")]
    },
    config={"configurable": {"thread_id": "3"}},
)
{'messages': [HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}, id='6b4cab70-ce18-49b0-bb06-267bde44e037'),
AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}, id='ba3714f4-8876-440b-a651-efdcab2fcb4c'),
HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}, id='08d032c0-1577-4862-a3f2-5c1b90687e21'),
AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={}, id='21790e16-db05-4537-9a6b-ecad0fcec436'),
HumanMessage(content='What is my name?', additional_kwargs={}, response_metadata={}, id='a22ab7c5-8617-4821-b3e9-a9e7dca1ff78'),
AIMessage(content="I'm sorry, but I don't have access to personal information about you unless you share it with me. How can I assist you today?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 27, 'prompt_tokens': 39, 'total_tokens': 66, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-f7b32d72-9f57-4705-be7e-43bf1c3d293b-0', usage_metadata={'input_tokens': 39, 'output_tokens': 27, 'total_tokens': 66})]}

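We can see that trim_messages was called and only the two most recent messages are passed to the model. In this case, this means that the model forgot the name we gave it.

Check out our how-to guide on trimming messages for more.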
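To see why the name is lost, it can help to trace the "keep the last messages that fit the budget" idea by hand. The sketch below is a simplified, library-free approximation of that strategy. `keep_last` and `count_words` are hypothetical helpers written for this illustration, messages are plain strings, and the real trim_messages utility supports additional options that are not modeled here.

```python
from typing import Callable, List


def keep_last(
    messages: List[str],
    max_tokens: int,
    token_counter: Callable[[List[str]], int] = len,
) -> List[str]:
    """Keep the longest suffix of `messages` whose token count fits within `max_tokens`."""
    kept: List[str] = []
    for message in reversed(messages):
        candidate = [message] + kept
        if token_counter(candidate) > max_tokens:
            break
        kept = candidate
    return kept


def count_words(messages: List[str]) -> int:
    """Crude token estimate for illustration only: whitespace-separated words."""
    return sum(len(m.split()) for m in messages)


history = [
    "Hey there! I'm Nemo.",
    "Hello!",
    "How are you today?",
    "Fine thanks!",
    "What is my name?",
]

# With token_counter=len, every message costs 1 "token", as in the example above.
by_message = keep_last(history, max_tokens=2)
# With a word-based counter, the budget depends on message length instead.
by_words = keep_last(history, max_tokens=5, token_counter=count_words)
print(by_message)
print(by_words)
```

With a budget of two messages, only "Fine thanks!" and "What is my name?" survive, so the message that introduced the name never reaches the model. Swapping in a different counter changes what fits, which is why the choice of token_counter matters in practice.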

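Summary memory

We can use this same pattern in other ways too. For example, we could use an additional LLM call to generate a summary of the conversation before calling our app. Let's recreate our chat history: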

demo_ephemeral_chat_history = [
    HumanMessage(content="Hey there! I'm Nemo."),
    AIMessage(content="Hello!"),
    HumanMessage(content="How are you today?"),
    AIMessage(content="Fine thanks!"),
]

Now, let's update the model-calling function to distill previous interactions into a summary:

from langchain_core.messages import HumanMessage, RemoveMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

workflow = StateGraph(state_schema=MessagesState)


# Define the function that calls the model
def call_model(state: MessagesState):
    system_prompt = (
        "You are a helpful assistant. "
        "Answer all questions to the best of your ability. "
        "The provided chat history includes a summary of the earlier conversation."
    )
    system_message = SystemMessage(content=system_prompt)
    message_history = state["messages"][:-1]  # exclude the most recent user input
    # Summarize the messages if the chat history reaches a certain size
    if len(message_history) >= 4:
        last_human_message = state["messages"][-1]
        # Invoke the model to generate conversation summary
        summary_prompt = (
            "Distill the above chat messages into a single summary message. "
            "Include as many specific details as you can."
        )
        summary_message = model.invoke(
            message_history + [HumanMessage(content=summary_prompt)]
        )

        # Delete messages that we no longer want to show up
        delete_messages = [RemoveMessage(id=m.id) for m in state["messages"]]
        # Re-add user message
        human_message = HumanMessage(content=last_human_message.content)
        # Call the model with summary & response
        response = model.invoke([system_message, summary_message, human_message])
        message_updates = [summary_message, human_message, response] + delete_messages
    else:
        message_updates = model.invoke([system_message] + state["messages"])

    return {"messages": message_updates}


# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")

# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

Let's see if it remembers the name we gave it:

app.invoke(
    {
        "messages": demo_ephemeral_chat_history
        + [HumanMessage("What did I say my name was?")]
    },
    config={"configurable": {"thread_id": "4"}},
)
{'messages': [AIMessage(content="Nemo greeted me, and I responded positively, indicating that I'm doing well.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 60, 'total_tokens': 76, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-ee42f98d-907d-4bad-8f16-af2db789701d-0', usage_metadata={'input_tokens': 60, 'output_tokens': 16, 'total_tokens': 76}),
HumanMessage(content='What did I say my name was?', additional_kwargs={}, response_metadata={}, id='788555ea-5b1f-4c29-a2f2-a92f15d147be'),
AIMessage(content='You mentioned that your name is Nemo.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 67, 'total_tokens': 75, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_1bb46167f9', 'finish_reason': 'stop', 'logprobs': None}, id='run-099a43bd-a284-4969-bb6f-0be486614cd8-0', usage_metadata={'input_tokens': 67, 'output_tokens': 8, 'total_tokens': 75})]}

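Note that invoking the app again will keep accumulating the history until it reaches the specified number of messages (four in our case). At that point, we will generate another summary from the initial summary plus the new messages, and so on.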
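The accumulate-then-collapse cycle can be traced without calling a model. The following is a rough sketch under our own assumptions: `fake_summarize`, `fake_reply`, and `run_turn` are hypothetical stand-ins, messages are plain strings, and the threshold of 4 mirrors the example above. It only models how the stored history grows and shrinks, not the quality of any summary.

```python
from typing import Callable, List


def fake_summarize(messages: List[str]) -> str:
    """Hypothetical stand-in for the summarization call: joins the messages."""
    return "Summary of: " + " | ".join(messages)


def fake_reply(messages: List[str]) -> str:
    """Hypothetical stand-in for the chat model's answer."""
    return f"reply based on {len(messages)} message(s)"


def run_turn(
    state: List[str],
    user_text: str,
    threshold: int = 4,
    summarize: Callable[[List[str]], str] = fake_summarize,
    reply: Callable[[List[str]], str] = fake_reply,
) -> List[str]:
    """Return the new stored history after one user turn."""
    history = list(state)  # everything stored before the new user input
    if len(history) >= threshold:
        # Collapse the stored history into one summary, then answer from it.
        summary = summarize(history)
        response = reply([summary, user_text])
        return [summary, user_text, response]
    # Below the threshold: just append the new exchange.
    response = reply(history + [user_text])
    return history + [user_text, response]


state: List[str] = []
sizes: List[int] = []
for text in ["a", "b", "c", "d", "e"]:
    state = run_turn(state, text)
    sizes.append(len(state))
print(sizes)
```

The stored history grows to four messages, collapses to three (summary, user input, reply) on the next turn, grows again, and collapses again. The second summary is built from the first one plus the newer messages, which is the "summary of a summary" behavior described above.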