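How to get a RAG application to add citations

This guide reviews methods for getting a model to cite which parts of the source documents it referenced when generating its response.

We will cover five methods:

1. Using tool-calling to cite document IDs;
2. Using tool-calling to cite document IDs and provide text snippets;
3. Direct prompting;
4. Retrieval post-processing (i.e., compressing the retrieved context to make it more relevant);
5. Generation post-processing (i.e., issuing a second LLM call to annotate the generated answer with citations).

We generally suggest using the first item in the list that works for your use case. That is, if your model supports tool-calling, try methods 1 or 2; otherwise, or if those fail, work down the list.

Let's first create a simple RAG chain. To start, we'll retrieve content from Wikipedia using the WikipediaRetriever. We will use the same LangGraph implementation as the RAG tutorial.

Setup

First, we need to install some dependencies: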

%pip install -qU langchain-community wikipedia

Let's first select an LLM:

Select chat model:

pip install -qU "langchain[openai]"

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")

We can now load a retriever and construct our prompt:

from langchain_community.retrievers import WikipediaRetriever
from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
    "You're a helpful AI assistant. Given a user question "
    "and some Wikipedia article snippets, answer the user "
    "question. If none of the articles answer the question, "
    "just say you don't know."
    "\n\nHere are the Wikipedia articles: "
    "{context}"
)

retriever = WikipediaRetriever(top_k_results=6, doc_content_chars_max=2000)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{question}"),
    ]
)
prompt.pretty_print()

API Reference: WikipediaRetriever | ChatPromptTemplate

================================ System Message ================================

You're a helpful AI assistant. Given a user question and some Wikipedia article snippets, answer the user question. If none of the articles answer the question, just say you don't know.

Here are the Wikipedia articles: {context}

================================ Human Message =================================

{question}

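Now that we have a model, a retriever and a prompt, let's chain them all together. In keeping with this how-to guide on adding citations to a RAG application, we'll have our chain return both the answer and the retrieved documents. This uses the same LangGraph implementation as the RAG tutorial.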

from langchain_core.documents import Document
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict


# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


# Define application steps
def retrieve(state: State):
    retrieved_docs = retriever.invoke(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

API Reference: Document | StateGraph

from IPython.display import Image, display

display(Image(graph.get_graph().draw_mermaid_png()))

result = graph.invoke({"question": "How fast are cheetahs?"})

sources = [doc.metadata["source"] for doc in result["context"]]
print(f"Sources: {sources}\n\n")
print(f'Answer: {result["answer"]}')

Sources: ['https://en.wikipedia.org/wiki/Cheetah', 'https://en.wikipedia.org/wiki/Southeast_African_cheetah', 'https://en.wikipedia.org/wiki/Footspeed', 'https://en.wikipedia.org/wiki/Fastest_animals', 'https://en.wikipedia.org/wiki/Pursuit_predation', 'https://en.wikipedia.org/wiki/Gepard-class_fast_attack_craft']

Answer: Cheetahs are capable of running at speeds between 93 to 104 km/h (58 to 65 mph).

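See the LangSmith trace.

Tool-calling

If your LLM of choice supports tool-calling, you can use it to have the model specify which of the provided documents it is referencing when generating its answer. LangChain tool-calling models implement a .with_structured_output method that forces the generation to adhere to a desired schema (see the documentation for details).

Cite documents

To cite documents using an identifier, we format the identifiers into the prompt, then use .with_structured_output to coerce the LLM to reference these identifiers in its output.

First we define a schema for the output. .with_structured_output supports multiple formats, including JSON schema and Pydantic. Here we will use Pydantic: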

from pydantic import BaseModel, Field


class CitedAnswer(BaseModel):
    """Answer the user question based only on the given sources, and cite the sources used."""

    answer: str = Field(
        ...,
        description="The answer to the user question, which is based only on the given sources.",
    )
    citations: List[int] = Field(
        ...,
        description="The integer IDs of the SPECIFIC sources which justify the answer.",
    )

Let's see what the model output looks like when we pass in our schema and a user input:

structured_llm = llm.with_structured_output(CitedAnswer)

example_q = """What is Brian's height?

Source: 1
Information: Suzy is 6'2"

Source: 2
Information: Jeremiah is blonde

Source: 3
Information: Brian is 3 inches shorter than Suzy"""
result = structured_llm.invoke(example_q)

result

CitedAnswer(answer='Brian is 5\'11".', citations=[1, 3])

Or as a dict:

result.model_dump()

{'answer': 'Brian is 5\'11".', 'citations': [1, 3]}

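Now we structure the source identifiers into the prompt to replicate this with our chain. We will make three changes:

Update the prompt to include source identifiers;
Use structured_llm (i.e., llm.with_structured_output(CitedAnswer));
Return the Pydantic object in the output.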

def format_docs_with_id(docs: List[Document]) -> str:
    formatted = [
        f"Source ID: {i}\nArticle Title: {doc.metadata['title']}\nArticle Snippet: {doc.page_content}"
        for i, doc in enumerate(docs)
    ]
    return "\n\n" + "\n\n".join(formatted)


class State(TypedDict):
    question: str
    context: List[Document]
    answer: CitedAnswer


def generate(state: State):
    formatted_docs = format_docs_with_id(state["context"])
    messages = prompt.invoke({"question": state["question"], "context": formatted_docs})
    structured_llm = llm.with_structured_output(CitedAnswer)
    response = structured_llm.invoke(messages)
    return {"answer": response}


graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

result = graph.invoke({"question": "How fast are cheetahs?"})

result["answer"]

CitedAnswer(answer='Cheetahs are capable of running at speeds between 93 to 104 km/h (58 to 65 mph).', citations=[0, 3])

We can inspect the document at index 0, which the model cited:

print(result["context"][0])

page_content='The cheetah (Acinonyx jubatus) is a large cat and the fastest land animal. It has a tawny to creamy white or pale buff fur that is marked with evenly spaced, solid black spots. The head is small and rounded, with a short snout and black tear-like facial streaks. It reaches 67–94 cm (26–37 in) at the shoulder, and the head-and-body length is between 1.1 and 1.5 m (3 ft 7 in and 4 ft 11 in). Adults weigh between 21 and 72 kg (46 and 159 lb). The cheetah is capable of running at 93 to 104 km/h (58 to 65 mph); it has evolved specialized adaptations for speed, including a light build, long thin legs and a long tail.
The cheetah was first described in the late 18th century. Four subspecies are recognised today that are native to Africa and central Iran. An African subspecies was introduced to India in 2022. It is now distributed mainly in small, fragmented populations in northwestern, eastern and southern Africa and central Iran. It lives in a variety of habitats such as savannahs in the Serengeti, arid mountain ranges in the Sahara, and hilly desert terrain.
The cheetah lives in three main social groups: females and their cubs, male "coalitions", and solitary males. While females lead a nomadic life searching for prey in large home ranges, males are more sedentary and instead establish much smaller territories in areas with plentiful prey and access to females. The cheetah is active during the day, with peaks during dawn and dusk. It feeds on small- to medium-sized prey, mostly weighing under 40 kg (88 lb), and prefers medium-sized ungulates such as impala, springbok and Thomson's gazelles. The cheetah typically stalks its prey within 60–100 m (200–330 ft) before charging towards it, trips it during the chase and bites its throat to suffocate it to death. It breeds throughout the year. After a gestation of nearly three months, females give birth to a litter of three or four cubs. Cheetah cubs are highly vulnerable to predation by other large carnivores. They are weaned a' metadata={'title': 'Cheetah', 'summary': 'The cheetah (Acinonyx jubatus) is a large cat and the fastest land animal. It has a tawny to creamy white or pale buff fur that is marked with evenly spaced, solid black spots. The head is small and rounded, with a short snout and black tear-like facial streaks. It reaches 67–94 cm (26–37 in) at the shoulder, and the head-and-body length is between 1.1 and 1.5 m (3 ft 7 in and 4 ft 11 in). Adults weigh between 21 and 72 kg (46 and 159 lb). The cheetah is capable of running at 93 to 104 km/h (58 to 65 mph); it has evolved specialized adaptations for speed, including a light build, long thin legs and a long tail.\nThe cheetah was first described in the late 18th century. Four subspecies are recognised today that are native to Africa and central Iran. An African subspecies was introduced to India in 2022. It is now distributed mainly in small, fragmented populations in northwestern, eastern and southern Africa and central Iran. 
It lives in a variety of habitats such as savannahs in the Serengeti, arid mountain ranges in the Sahara, and hilly desert terrain.\nThe cheetah lives in three main social groups: females and their cubs, male "coalitions", and solitary males. While females lead a nomadic life searching for prey in large home ranges, males are more sedentary and instead establish much smaller territories in areas with plentiful prey and access to females. The cheetah is active during the day, with peaks during dawn and dusk. It feeds on small- to medium-sized prey, mostly weighing under 40 kg (88 lb), and prefers medium-sized ungulates such as impala, springbok and Thomson\'s gazelles. The cheetah typically stalks its prey within 60–100 m (200–330 ft) before charging towards it, trips it during the chase and bites its throat to suffocate it to death. It breeds throughout the year. After a gestation of nearly three months, females give birth to a litter of three or four cubs. Cheetah cubs are highly vulnerable to predation by other large carnivores. They are weaned at around four months and are independent by around 20 months of age.\nThe cheetah is threatened by habitat loss, conflict with humans, poaching and high susceptibility to diseases. The global cheetah population was estimated in 2021 at 6,517; it is listed as Vulnerable on the IUCN Red List. It has been widely depicted in art, literature, advertising, and animation. It was tamed in ancient Egypt and trained for hunting ungulates in the Arabian Peninsula and India. It has been kept in zoos since the early 19th century.', 'source': 'https://en.wikipedia.org/wiki/Cheetah'}

LangSmith trace: https://smith.langchain.com/public/6f34d136-451d-4625-90c8-2d8decebc21a/r
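Because the cited IDs are just positions in the retrieved list, it can be useful to map them back to source URLs and to ignore any ID the model invented. The following is a minimal sketch of that idea, not part of the chain above: resolve_citations is a hypothetical helper, and the metadata list is a hand-written stand-in for the "source" metadata of the first four retrieved documents.

```python
from typing import Dict, List


def resolve_citations(
    citation_ids: List[int], docs_metadata: List[Dict[str, str]]
) -> List[str]:
    """Return the source URLs for the cited IDs, skipping out-of-range IDs."""
    sources: List[str] = []
    for cid in citation_ids:
        # IDs were assigned with enumerate(), so valid values are 0 .. len - 1.
        if 0 <= cid < len(docs_metadata):
            url = docs_metadata[cid]["source"]
            if url not in sources:  # keep first-seen order, drop repeats
                sources.append(url)
    return sources


# Stand-in metadata (assumption): mirrors the first four sources printed earlier.
metadata = [
    {"source": "https://en.wikipedia.org/wiki/Cheetah"},
    {"source": "https://en.wikipedia.org/wiki/Southeast_African_cheetah"},
    {"source": "https://en.wikipedia.org/wiki/Footspeed"},
    {"source": "https://en.wikipedia.org/wiki/Fastest_animals"},
]

# ID 7 does not exist in this list and is silently dropped.
print(resolve_citations([0, 3, 7], metadata))
```

In the chain itself, the same helper could presumably be fed result["answer"].citations and the metadata of each document in result["context"].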

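Cite snippets

To return text spans (perhaps in addition to source identifiers), we can use the same approach. The only difference is that we build a more complex output schema, here using Pydantic, that includes a "quote" field alongside the source identifier.

Aside: Note that if we break up our documents into many small documents of only a sentence or two, instead of a few long documents, citing documents becomes roughly equivalent to citing snippets. It may also be easier for the model, because the model only needs to return an identifier for each snippet instead of the actual text. Both approaches are worth trying and evaluating.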

class Citation(BaseModel):
    source_id: int = Field(
        ...,
        description="The integer ID of a SPECIFIC source which justifies the answer.",
    )
    quote: str = Field(
        ...,
        description="The VERBATIM quote from the specified source that justifies the answer.",
    )


class QuotedAnswer(BaseModel):
    """Answer the user question based only on the given sources, and cite the sources used."""

    answer: str = Field(
        ...,
        description="The answer to the user question, which is based only on the given sources.",
    )
    citations: List[Citation] = Field(
        ..., description="Citations from the given sources that justify the answer."
    )

class State(TypedDict):
    question: str
    context: List[Document]
    answer: QuotedAnswer


def generate(state: State):
    formatted_docs = format_docs_with_id(state["context"])
    messages = prompt.invoke({"question": state["question"], "context": formatted_docs})
    structured_llm = llm.with_structured_output(QuotedAnswer)
    response = structured_llm.invoke(messages)
    return {"answer": response}


graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

Here we see that the model has extracted a relevant snippet of text from source 0:

result = graph.invoke({"question": "How fast are cheetahs?"})

result["answer"]

QuotedAnswer(answer='Cheetahs are capable of running at speeds of 93 to 104 km/h (58 to 65 mph).', citations=[Citation(source_id=0, quote='The cheetah is capable of running at 93 to 104 km/h (58 to 65 mph); it has evolved specialized adaptations for speed.')])

LangSmith trace: https://smith.langchain.com/public/e16dc72f-4261-4f25-a9a7-906238737283/r
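The schema asks for a VERBATIM quote, but nothing enforces that, so a cheap post-check can flag quotes that do not actually occur in the cited source. Below is a minimal sketch under stated assumptions: normalize and is_verbatim are hypothetical helpers, matching is done on whitespace-normalized text only, and the source string is a shortened excerpt of the Cheetah article shown earlier.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace runs so line breaks do not cause false mismatches."""
    return re.sub(r"\s+", " ", text).strip()


def is_verbatim(quote: str, source_text: str) -> bool:
    """True if the quote occurs in the source, ignoring whitespace differences."""
    return normalize(quote) in normalize(source_text)


# Shortened excerpt of the cited article (assumption: used only for illustration).
source = (
    "The cheetah is capable of running at 93 to 104 km/h (58 to 65 mph); "
    "it has evolved specialized adaptations for speed, including a light build, "
    "long thin legs and a long tail."
)
exact = "The cheetah is capable of running at 93 to 104 km/h (58 to 65 mph)"
trimmed = "it has evolved specialized adaptations for speed."

print(is_verbatim(exact, source))    # True
print(is_verbatim(trimmed, source))  # False: the source continues "speed, including ..."
```

The second case mirrors a common failure mode: the model shortens a sentence and closes it with a period, so the quote is faithful in meaning but not strictly verbatim.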

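Direct prompting

Some models don't support function-calling. We can achieve similar results with direct prompting. Let's try instructing a model to generate structured XML as its output: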

xml_system = """You're a helpful AI assistant. Given a user question and some Wikipedia article snippets, \
answer the user question and provide citations. If none of the articles answer the question, just say you don't know.

Remember, you must return both an answer and citations. A citation consists of a VERBATIM quote that \
justifies the answer and the ID of the quote article. Return a citation for every quote across all articles \
that justify the answer. Use the following format for your final output:

<cited_answer>
    <answer></answer>
    <citations>
        <citation><source_id></source_id><quote></quote></citation>
        <citation><source_id></source_id><quote></quote></citation>
        ...
    </citations>
</cited_answer>

Here are the Wikipedia articles:{context}"""
xml_prompt = ChatPromptTemplate.from_messages(
    [("system", xml_system), ("human", "{question}")]
)

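We now make similar small updates to our chain:

We update the formatting function to wrap the retrieved context in XML tags;
We do not use .with_structured_output (e.g., because the model does not provide it);
We use XMLOutputParser to parse the answer into a dict.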

from langchain_core.output_parsers import XMLOutputParser


def format_docs_xml(docs: List[Document]) -> str:
    formatted = []
    for i, doc in enumerate(docs):
        doc_str = f"""\
    <source id=\"{i}\">
        <title>{doc.metadata['title']}</title>
        <article_snippet>{doc.page_content}</article_snippet>
    </source>"""
        formatted.append(doc_str)
    return "\n\n<sources>" + "\n".join(formatted) + "</sources>"


class State(TypedDict):
    question: str
    context: List[Document]
    answer: dict


def generate(state: State):
    formatted_docs = format_docs_xml(state["context"])
    messages = xml_prompt.invoke(
        {"question": state["question"], "context": formatted_docs}
    )
    response = llm.invoke(messages)
    parsed_response = XMLOutputParser().invoke(response)
    return {"answer": parsed_response}


graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

API Reference: XMLOutputParser

Note that citations are again structured into the answer:

result = graph.invoke({"question": "How fast are cheetahs?"})

result["answer"]

{'cited_answer': [{'answer': 'Cheetahs can run at speeds of 93 to 104 km/h (58 to 65 mph).'},
  {'citations': [{'citation': [{'source_id': '0'},
      {'quote': 'The cheetah is capable of running at 93 to 104 km/h (58 to 65 mph);'}]},
    {'citation': [{'source_id': '3'},
      {'quote': 'The fastest land animal is the cheetah.'}]}]}]}

LangSmith trace: https://smith.langchain.com/public/0c45f847-c640-4b9a-a5fa-63559e413527/r
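The parsed XML arrives as nested lists of single-key dicts, which is awkward to consume downstream. The sketch below flattens it into a plain answer plus a list of citations. It assumes the output has exactly the shape shown above; merge_pairs and flatten_cited_answer are hypothetical helper names, and the quotes in the sample are abbreviated stand-ins.

```python
from typing import Any, Dict, List


def merge_pairs(items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge a list of single-key dicts into one dict."""
    merged: Dict[str, Any] = {}
    for item in items:
        merged.update(item)
    return merged


def flatten_cited_answer(parsed: Dict[str, Any]) -> Dict[str, Any]:
    """Turn the nested parser output into {'answer': str, 'citations': [...]}."""
    top = merge_pairs(parsed["cited_answer"])
    citations = []
    for entry in top.get("citations") or []:
        fields = merge_pairs(entry["citation"])
        # The parser yields strings; int() raises if the model returned a
        # non-numeric ID, which is left unhandled in this sketch.
        citations.append(
            {"source_id": int(fields["source_id"]), "quote": fields["quote"]}
        )
    return {"answer": top["answer"], "citations": citations}


# Sample in the same shape as the parsed output (quotes shortened for brevity).
parsed = {
    "cited_answer": [
        {"answer": "Cheetahs can run at speeds of 93 to 104 km/h (58 to 65 mph)."},
        {
            "citations": [
                {"citation": [{"source_id": "0"}, {"quote": "first quote"}]},
                {"citation": [{"source_id": "3"}, {"quote": "second quote"}]},
            ]
        },
    ]
}

flat = flatten_cited_answer(parsed)
print(flat["citations"])
```

Since direct prompting gives no schema guarantee, a production version would likely need to handle missing tags and malformed IDs rather than assume this shape.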

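Retrieval post-processing

Another approach is to post-process our retrieved documents to compress the content, so that the source content is already minimal enough that we don't need the model to cite specific sources or spans. For example, we could break up each document into chunks of a sentence or two, embed those chunks and keep only the most relevant ones. LangChain has some built-in components for this. Here we'll use a RecursiveCharacterTextSplitter, which creates chunks of a specified size by splitting on separator substrings, and an EmbeddingsFilter, which keeps only the texts whose embeddings are most relevant.

This approach effectively updates our retrieve step to compress the documents. Let's first select an embedding model:

Select embedding model: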

pip install -qU langchain-openai

import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

We can now rewrite the retrieve step:

from langchain.retrievers.document_compressors import EmbeddingsFilter
from langchain_core.runnables import RunnableParallel
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=0,
    separators=["\n\n", "\n", ".", " "],
    keep_separator=False,
)
compressor = EmbeddingsFilter(embeddings=embeddings, k=10)


class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


def retrieve(state: State):
    retrieved_docs = retriever.invoke(state["question"])
    split_docs = splitter.split_documents(retrieved_docs)
    stateful_docs = compressor.compress_documents(split_docs, state["question"])
    return {"context": stateful_docs}

API Reference: EmbeddingsFilter | RunnableParallel | RecursiveCharacterTextSplitter

Let's test this out:

retrieval_result = retrieve({"question": "How fast are cheetahs?"})

for doc in retrieval_result["context"]:
    print(f"{doc.page_content}\n\n")

Adults weigh between 21 and 72 kg (46 and 159 lb). The cheetah is capable of running at 93 to 104 km/h (58 to 65 mph); it has evolved specialized adaptations for speed, including a light build, long thin legs and a long tail

The cheetah (Acinonyx jubatus) is a large cat and the fastest land animal. It has a tawny to creamy white or pale buff fur that is marked with evenly spaced, solid black spots. The head is small and rounded, with a short snout and black tear-like facial streaks. It reaches 67–94 cm (26–37 in) at the shoulder, and the head-and-body length is between 1.1 and 1.5 m (3 ft 7 in and 4 ft 11 in)

2 mph), or 171 body lengths per second. The cheetah, the fastest land mammal, scores at only 16 body lengths per second

It feeds on small- to medium-sized prey, mostly weighing under 40 kg (88 lb), and prefers medium-sized ungulates such as impala, springbok and Thomson's gazelles. The cheetah typically stalks its prey within 60–100 m (200–330 ft) before charging towards it, trips it during the chase and bites its throat to suffocate it to death. It breeds throughout the year

The cheetah was first described in the late 18th century. Four subspecies are recognised today that are native to Africa and central Iran. An African subspecies was introduced to India in 2022. It is now distributed mainly in small, fragmented populations in northwestern, eastern and southern Africa and central Iran

The cheetah lives in three main social groups: females and their cubs, male "coalitions", and solitary males. While females lead a nomadic life searching for prey in large home ranges, males are more sedentary and instead establish much smaller territories in areas with plentiful prey and access to females. The cheetah is active during the day, with peaks during dawn and dusk

The Southeast African cheetah (Acinonyx jubatus jubatus) is the nominate cheetah subspecies native to East and Southern Africa. The Southern African cheetah lives mainly in the lowland areas and deserts of the Kalahari, the savannahs of Okavango Delta, and the grasslands of the Transvaal region in South Africa. In Namibia, cheetahs are mostly found in farmlands

Subpopulations have been called "South African cheetah" and "Namibian cheetah."

In India, four cheetahs of the subspecies are living in Kuno National Park in Madhya Pradesh after having been introduced there

Acinonyx jubatus velox proposed in 1913 by Edmund Heller on basis of a cheetah that was shot by Kermit Roosevelt in June 1909 in the Kenyan highlands.
Acinonyx rex proposed in 1927 by Reginald Innes Pocock on basis of a specimen from the Umvukwe Range in Rhodesia.
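To make the filtering step less of a black box, here is a minimal sketch of the underlying idea: score each chunk against the query by embedding similarity and keep the k best. It is an illustration only, not the EmbeddingsFilter implementation. It assumes cosine similarity as the scoring function and uses hand-written two-dimensional toy vectors in place of real embeddings; top_k_chunks is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[str]:
    """Return the text of the k chunks most similar to the query, best first."""
    scored = sorted(
        chunks, key=lambda c: cosine_similarity(query_vec, c[1]), reverse=True
    )
    return [text for text, _ in scored[:k]]


# Toy vectors (assumption): real embeddings have hundreds or thousands of dimensions.
query = [1.0, 0.0]
chunks = [
    ("chunk about diet", [0.0, 1.0]),
    ("chunk about speed", [0.9, 0.1]),
    ("chunk about habitat", [0.5, 0.5]),
]

print(top_k_chunks(query, chunks, k=2))
# ['chunk about speed', 'chunk about habitat']
```

With k=10 and 400-character chunks, as configured above, the model sees at most ten short passages rather than six 2000-character articles.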

Next, we assemble it into our chain as before:

# This step is unchanged from our original RAG implementation
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

result = graph.invoke({"question": "How fast are cheetahs?"})

print(result["answer"])

Cheetahs are capable of running at speeds between 93 to 104 km/h (58 to 65 mph). They are known as the fastest land animals.

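Note that the document content is now compressed, although the document objects retain the original content in a "summary" key in their metadata. These summaries are not passed to the model; only the compressed content is.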

result["context"][0].page_content  # passed to model

'Adults weigh between 21 and 72 kg (46 and 159 lb). The cheetah is capable of running at 93 to 104 km/h (58 to 65 mph); it has evolved specialized adaptations for speed, including a light build, long thin legs and a long tail'

result["context"][0].metadata["summary"]  # original document

'The cheetah (Acinonyx jubatus) is a large cat and the fastest land animal. It has a tawny to creamy white or pale buff fur that is marked with evenly spaced, solid black spots. The head is small and rounded, with a short snout and black tear-like facial streaks. It reaches 67–94 cm (26–37 in) at the shoulder, and the head-and-body length is between 1.1 and 1.5 m (3 ft 7 in and 4 ft 11 in). Adults weigh between 21 and 72 kg (46 and 159 lb). The cheetah is capable of running at 93 to 104 km/h (58 to 65 mph); it has evolved specialized adaptations for speed, including a light build, long thin legs and a long tail.\nThe cheetah was first described in the late 18th century. Four subspecies are recognised today that are native to Africa and central Iran. An African subspecies was introduced to India in 2022. It is now distributed mainly in small, fragmented populations in northwestern, eastern and southern Africa and central Iran. It lives in a variety of habitats such as savannahs in the Serengeti, arid mountain ranges in the Sahara, and hilly desert terrain.\nThe cheetah lives in three main social groups: females and their cubs, male "coalitions", and solitary males. While females lead a nomadic life searching for prey in large home ranges, males are more sedentary and instead establish much smaller territories in areas with plentiful prey and access to females. The cheetah is active during the day, with peaks during dawn and dusk. It feeds on small- to medium-sized prey, mostly weighing under 40 kg (88 lb), and prefers medium-sized ungulates such as impala, springbok and Thomson\'s gazelles. The cheetah typically stalks its prey within 60–100 m (200–330 ft) before charging towards it, trips it during the chase and bites its throat to suffocate it to death. It breeds throughout the year. After a gestation of nearly three months, females give birth to a litter of three or four cubs. Cheetah cubs are highly vulnerable to predation by other large carnivores. 
They are weaned at around four months and are independent by around 20 months of age.\nThe cheetah is threatened by habitat loss, conflict with humans, poaching and high susceptibility to diseases. The global cheetah population was estimated in 2021 at 6,517; it is listed as Vulnerable on the IUCN Red List. It has been widely depicted in art, literature, advertising, and animation. It was tamed in ancient Egypt and trained for hunting ungulates in the Arabian Peninsula and India. It has been kept in zoos since the early 19th century.'

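LangSmith trace: https://smith.langchain.com/public/21b0dc15-d70a-4293-9402-9c70f9178e66/r

Generation post-processing

Another approach is to post-process the model's generation. In this example we first generate just an answer, and then ask the model to annotate its own answer with citations. The downside of this approach is that it is slower and more expensive, because two model calls are needed.

Let's apply this to our initial chain. If desired, we could implement this as a third step in our application.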

class Citation(BaseModel):
    source_id: int = Field(
        ...,
        description="The integer ID of a SPECIFIC source which justifies the answer.",
    )
    quote: str = Field(
        ...,
        description="The VERBATIM quote from the specified source that justifies the answer.",
    )


class AnnotatedAnswer(BaseModel):
    """Annotate the answer to the user question with quote citations that justify the answer."""

    citations: List[Citation] = Field(
        ..., description="Citations from the given sources that justify the answer."
    )


structured_llm = llm.with_structured_output(AnnotatedAnswer)

class State(TypedDict):
    question: str
    context: List[Document]
    answer: str
    annotations: AnnotatedAnswer


def retrieve(state: State):
    retrieved_docs = retriever.invoke(state["question"])
    return {"context": retrieved_docs}


def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


def annotate(state: State):
    formatted_docs = format_docs_with_id(state["context"])
    messages = [
        ("system", system_prompt.format(context=formatted_docs)),
        ("human", state["question"]),
        ("ai", state["answer"]),
        ("human", "Annotate your answer with citations."),
    ]
    response = structured_llm.invoke(messages)
    return {"annotations": response}


graph_builder = StateGraph(State).add_sequence([retrieve, generate, annotate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

display(Image(graph.get_graph().draw_mermaid_png()))

result = graph.invoke({"question": "How fast are cheetahs?"})

print(result["answer"])

Cheetahs are capable of running at speeds between 93 to 104 km/h (58 to 65 mph).

result["annotations"]

AnnotatedAnswer(citations=[Citation(source_id=0, quote='The cheetah is capable of running at 93 to 104 km/h (58 to 65 mph)')])

LangSmith trace: https://smith.langchain.com/public/b8257417-573b-47c4-a750-74e542035f19/r
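Since the answer and its annotations come back as separate state keys, an application will usually want to combine them for display. The sketch below renders the answer followed by numbered footnotes. It is one possible presentation, not something the chain above does: CitationStub is a stand-in dataclass with the same two fields as the Citation model, and render_with_footnotes and the urls list are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CitationStub:
    """Stand-in for the Citation model above (same two fields)."""

    source_id: int
    quote: str


def render_with_footnotes(
    answer: str, citations: List[CitationStub], urls: List[str]
) -> str:
    """Append one numbered footnote per citation below the answer text."""
    lines = [answer, ""]
    for n, citation in enumerate(citations, start=1):
        # Guard against IDs that do not correspond to a retrieved document.
        if 0 <= citation.source_id < len(urls):
            url = urls[citation.source_id]
        else:
            url = "unknown source"
        lines.append(f'[{n}] "{citation.quote}" ({url})')
    return "\n".join(lines)


urls = ["https://en.wikipedia.org/wiki/Cheetah"]
citations = [
    CitationStub(
        source_id=0,
        quote="The cheetah is capable of running at 93 to 104 km/h (58 to 65 mph)",
    )
]
rendered = render_with_footnotes(
    "Cheetahs are capable of running at speeds between 93 to 104 km/h (58 to 65 mph).",
    citations,
    urls,
)
print(rendered)
```

In the graph above, the inputs would presumably come from result["answer"], result["annotations"].citations, and the "source" metadata of result["context"].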
