Memgraph

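Memgraph is an open-source graph database, tuned for dynamic analytics environments and compatible with Neo4j. To query the database, Memgraph uses Cypher, the most widely adopted, fully specified, and open query language for property graph databases.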

This notebook shows how to query Memgraph with natural language and how to construct a knowledge graph from unstructured data.

But first, make sure everything is set up.

Setting up

To follow this guide, you need Docker and Python 3.x installed.

To quickly run Memgraph Platform (Memgraph database + MAGE library + Memgraph Lab) for the first time, do the following:

On Linux/MacOS:

curl https://install.memgraph.com | sh

On Windows:

iwr https://windows.memgraph.com | iex

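Both commands run a script that downloads a Docker Compose file to your system, then builds and starts the memgraph-mage and memgraph-lab Docker services in two separate containers. Memgraph is now up and running! Read more about the installation process in the Memgraph documentation.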
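Before moving on, it can help to confirm that both services accept connections. The sketch below is not part of the Memgraph tooling; is_port_open is a hypothetical helper, and the ports are taken from this guide (7687 for the Bolt connection used later, 3000 for Memgraph Lab), assuming the default Docker Compose setup.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    # Ports as used elsewhere in this guide; adjust if your setup differs.
    for name, port in [("Memgraph (Bolt)", 7687), ("Memgraph Lab", 3000)]:
        state = "reachable" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

An open port only shows that something is listening; it does not prove that the listener is Memgraph.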

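To use LangChain, install and import all the necessary packages. We'll use the package manager pip, along with the --user flag, to ensure proper permissions. If you have Python 3.4 or later installed, pip is included by default. You can install all the required packages with the following command: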

pip install langchain langchain-openai langchain-memgraph --user

You can either run the provided code blocks in this notebook or use a separate Python file to experiment with Memgraph and LangChain.

Natural language querying

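Memgraph's integration with LangChain includes natural language querying. To use it, first do all the necessary imports. We will discuss them in the order they appear in the code.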

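First, instantiate the Memgraph graph object (the Memgraph class imported in the code below). This object holds the connection to the running Memgraph instance. Make sure all the environment variables are set properly.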

import os

from langchain_core.prompts import PromptTemplate
from langchain_memgraph.chains.graph_qa import MemgraphQAChain
from langchain_memgraph.graphs.memgraph import Memgraph
from langchain_openai import ChatOpenAI

url = os.environ.get("MEMGRAPH_URI", "bolt://localhost:7687")
username = os.environ.get("MEMGRAPH_USERNAME", "")
password = os.environ.get("MEMGRAPH_PASSWORD", "")

graph = Memgraph(url=url, username=username, password=password, refresh_schema=False)

API Reference: PromptTemplate | ChatOpenAI

The refresh_schema argument is initially set to False because there is no data in the database yet and we want to avoid unnecessary database calls.
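The connection code above silently falls back to defaults when an environment variable is missing, which can hide a typo in MEMGRAPH_URI. A minimal sketch of collecting and sanity-checking the same three settings follows; memgraph_settings is a hypothetical helper, not part of langchain-memgraph, and the check only looks for a scheme and a host.

```python
import os
from urllib.parse import urlparse


def memgraph_settings(env=None):
    """Collect connection settings with the same defaults as the code above."""
    env = os.environ if env is None else env
    url = env.get("MEMGRAPH_URI", "bolt://localhost:7687")
    parsed = urlparse(url)
    # A usable URI needs at least a scheme (e.g. bolt) and a host name.
    if not parsed.scheme or not parsed.hostname:
        raise ValueError(f"MEMGRAPH_URI looks malformed: {url!r}")
    return {
        "url": url,
        "username": env.get("MEMGRAPH_USERNAME", ""),
        "password": env.get("MEMGRAPH_PASSWORD", ""),
    }


settings = memgraph_settings()
print(settings["url"])
# With the imports from this guide, the result could be passed on as:
# graph = Memgraph(**settings, refresh_schema=False)
```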

Populating the database

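To populate the database, first make sure it is empty. The most efficient way to do that is to switch to the in-memory analytical storage mode, drop the graph, and switch back to the in-memory transactional mode. Learn more about Memgraph's storage modes in the Memgraph documentation.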

The data we'll add to the database describes video games of different genres, the platforms they are available on, and their publishers.

# Drop graph
graph.query("STORAGE MODE IN_MEMORY_ANALYTICAL")
graph.query("DROP GRAPH")
graph.query("STORAGE MODE IN_MEMORY_TRANSACTIONAL")

# Creating and executing the seeding query
query = """
    MERGE (g:Game {name: "Baldur's Gate 3"})
    WITH g, ["PlayStation 5", "Mac OS", "Windows", "Xbox Series X/S"] AS platforms,
            ["Adventure", "Role-Playing Game", "Strategy"] AS genres
    FOREACH (platform IN platforms |
        MERGE (p:Platform {name: platform})
        MERGE (g)-[:AVAILABLE_ON]->(p)
    )
    FOREACH (genre IN genres |
        MERGE (gn:Genre {name: genre})
        MERGE (g)-[:HAS_GENRE]->(gn)
    )
    MERGE (p:Publisher {name: "Larian Studios"})
    MERGE (g)-[:PUBLISHED_BY]->(p);
"""

graph.query(query)

[]

Notice that the graph object holds the query method. That method executes a query in Memgraph, and it is also used by MemgraphQAChain to query the database.
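The three statements that empty the database recur several times in this guide, so they can be wrapped in a small helper. This is a sketch, not part of the library: reset_graph is a hypothetical name, and it only assumes an object with a query(statement) method. RecordingGraph is a stand-in that lets the example run without a database.

```python
def reset_graph(graph):
    """Empty the database with the three statements used in this guide.

    `graph` is any object that exposes a `query(statement)` method.
    """
    graph.query("STORAGE MODE IN_MEMORY_ANALYTICAL")
    try:
        graph.query("DROP GRAPH")
    finally:
        # Switch back even if DROP GRAPH raised, so the instance is not
        # left in analytical mode by accident.
        graph.query("STORAGE MODE IN_MEMORY_TRANSACTIONAL")


class RecordingGraph:
    """Stand-in that records statements instead of talking to a database."""

    def __init__(self):
        self.statements = []

    def query(self, statement):
        self.statements.append(statement)
        return []


demo = RecordingGraph()
reset_graph(demo)
print(demo.statements)
```

With a live connection, calling reset_graph(graph) would send the same three statements to Memgraph.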

Refresh graph schema

Since new data has been created in Memgraph, the schema needs to be refreshed. MemgraphQAChain uses the generated schema to instruct the LLM so that it generates better Cypher queries.

graph.refresh_schema()

To familiarize yourself with the data and verify the updated graph schema, you can print it with the following statement:

print(graph.get_schema)

Node labels and properties (name and type) are:
- labels: (:Platform)
  properties:
    - name: string
- labels: (:Genre)
  properties:
    - name: string
- labels: (:Game)
  properties:
    - name: string
- labels: (:Publisher)
  properties:
    - name: string

Nodes are connected with the following relationships:
(:Game)-[:HAS_GENRE]->(:Genre)
(:Game)-[:PUBLISHED_BY]->(:Publisher)
(:Game)-[:AVAILABLE_ON]->(:Platform)
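If you want to check the schema programmatically rather than by eye, the relationship lines can be parsed into triples. The sketch below assumes the schema text has exactly the layout printed above, which may differ between library versions; relationship_triples is a hypothetical helper.

```python
import re

# Matches lines such as (:Game)-[:HAS_GENRE]->(:Genre)
RELATIONSHIP_LINE = re.compile(r"^\(:(\w+)\)-\[:(\w+)\]->\(:(\w+)\)$")


def relationship_triples(schema_text):
    """Return (source label, relationship type, target label) tuples."""
    triples = []
    for line in schema_text.splitlines():
        match = RELATIONSHIP_LINE.match(line.strip())
        if match:
            triples.append(match.groups())
    return triples


sample_schema = """Nodes are connected with the following relationships:
(:Game)-[:HAS_GENRE]->(:Genre)
(:Game)-[:PUBLISHED_BY]->(:Publisher)
(:Game)-[:AVAILABLE_ON]->(:Platform)
"""
for triple in relationship_triples(sample_schema):
    print(triple)
```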

Querying the database

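To interact with the OpenAI API, you must configure your API key as an environment variable. This ensures proper authorization for your requests. More information on obtaining an API key is available from OpenAI. To configure the API key, you can use the Python os package: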

os.environ["OPENAI_API_KEY"] = "your-key-here"

Run the above code snippet if you are running the code in a Jupyter notebook.

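Next, create MemgraphQAChain, which will be used in the question-answering process based on your graph data. The temperature parameter is set to zero to make answers predictable and consistent. You can set the verbose parameter to True to receive more detailed messages about query generation.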

chain = MemgraphQAChain.from_llm(
    ChatOpenAI(temperature=0),
    graph=graph,
    model_name="gpt-4-turbo",
    allow_dangerous_requests=True,
)

Now you can start asking questions!

response = chain.invoke("Which platforms is Baldur's Gate 3 available on?")
print(response["result"])

MATCH (:Game{name: "Baldur's Gate 3"})-[:AVAILABLE_ON]->(platform:Platform)
RETURN platform.name
Baldur's Gate 3 is available on PlayStation 5, Mac OS, Windows, and Xbox Series X/S.

response = chain.invoke("Is Baldur's Gate 3 available on Windows?")
print(response["result"])

MATCH (:Game{name: "Baldur's Gate 3"})-[:AVAILABLE_ON]->(:Platform{name: "Windows"})
RETURN "Yes"
Yes, Baldur's Gate 3 is available on Windows.

Chain modifiers

To modify the behavior of your chain and obtain more context or additional information, you can change the chain's parameters.

Return direct query results

The return_direct modifier specifies whether to return the direct results of the executed Cypher query or the processed natural language response.

# Return the result of querying the graph directly
chain = MemgraphQAChain.from_llm(
    ChatOpenAI(temperature=0),
    graph=graph,
    return_direct=True,
    allow_dangerous_requests=True,
    model_name="gpt-4-turbo",
)

response = chain.invoke("Which studio published Baldur's Gate 3?")
print(response["result"])

MATCH (g:Game {name: "Baldur's Gate 3"})-[:PUBLISHED_BY]->(p:Publisher)
RETURN p.name
[{'p.name': 'Larian Studios'}]
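With return_direct=True the result is a list of row dictionaries keyed by the returned expression, as the output above shows. A minimal sketch of pulling one column out of such rows follows; column is a hypothetical helper, and the rows literal is copied from the output above rather than fetched from a database.

```python
def column(rows, key):
    """Collect the values stored under `key` from a list of result rows."""
    return [row[key] for row in rows if key in row]


# Same shape as the direct result printed above.
rows = [{"p.name": "Larian Studios"}]
print(column(rows, "p.name"))  # ['Larian Studios']
```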

Return query intermediate steps

The return_intermediate_steps chain modifier enhances the returned response by including the intermediate steps of the query in addition to the initial query result.

# Return all the intermediate steps of query execution
chain = MemgraphQAChain.from_llm(
    ChatOpenAI(temperature=0),
    graph=graph,
    allow_dangerous_requests=True,
    return_intermediate_steps=True,
    model_name="gpt-4-turbo",
)

response = chain.invoke("Is Baldur's Gate 3 an Adventure game?")
print(f"Intermediate steps: {response['intermediate_steps']}")
print(f"Final response: {response['result']}")

MATCH (:Game {name: "Baldur's Gate 3"})-[:HAS_GENRE]->(:Genre {name: "Adventure"})
RETURN "Yes"
Intermediate steps: [{'query': 'MATCH (:Game {name: "Baldur\'s Gate 3"})-[:HAS_GENRE]->(:Genre {name: "Adventure"})\nRETURN "Yes"'}, {'context': [{'"Yes"': 'Yes'}]}]
Final response: Yes.
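The intermediate steps printed above are a list of dictionaries, one holding the generated Cypher under query and one holding the raw rows under context. The following sketch separates the two, assuming that shape; split_steps is a hypothetical helper and the steps literal is copied from the output above.

```python
def split_steps(steps):
    """Return (generated Cypher, context rows) from a list of step dicts."""
    query = None
    context = None
    for step in steps:
        if "query" in step:
            query = step["query"]
        if "context" in step:
            context = step["context"]
    return query, context


steps = [
    {"query": 'MATCH (:Game {name: "Baldur\'s Gate 3"})-[:HAS_GENRE]->(:Genre {name: "Adventure"})\nRETURN "Yes"'},
    {"context": [{'"Yes"': "Yes"}]},
]
cypher, context = split_steps(steps)
print(cypher)
print(context)
```

Logging the generated Cypher this way makes it easier to see whether a wrong answer comes from the query or from the final summarization.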

Limit the number of query results

The top_k modifier can be used when you want to restrict the maximum number of query results.

# Limit the maximum number of results returned by query
chain = MemgraphQAChain.from_llm(
    ChatOpenAI(temperature=0),
    graph=graph,
    top_k=2,
    allow_dangerous_requests=True,
    model_name="gpt-4-turbo",
)

response = chain.invoke("What genres are associated with Baldur's Gate 3?")
print(response["result"])

MATCH (:Game {name: "Baldur's Gate 3"})-[:HAS_GENRE]->(g:Genre)
RETURN g.name;
Adventure, Role-Playing Game
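The answer above lists two of the three genres stored for the game. One way to picture the effect is as a truncation of the result rows, sketched below in plain Python. That the chain keeps the first top_k rows is an assumption here, not something this guide confirms, and limit_rows is a hypothetical helper.

```python
def limit_rows(rows, top_k):
    """Keep at most `top_k` rows, in the order they were returned."""
    return rows[:top_k]


# Illustrative rows for the three genres created earlier in this guide.
rows = [
    {"g.name": "Adventure"},
    {"g.name": "Role-Playing Game"},
    {"g.name": "Strategy"},
]
print(limit_rows(rows, 2))
```

A Cypher query without ORDER BY has no guaranteed row order, so which rows survive the limit can vary between runs.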

Advanced querying

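As the complexity of your solution grows, you may encounter use cases that require careful handling. Making sure your application scales is essential to keeping the user flow smooth.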

Let's instantiate our chain once again and try some questions that users might ask.

chain = MemgraphQAChain.from_llm(
    ChatOpenAI(temperature=0),
    graph=graph,
    model_name="gpt-4-turbo",
    allow_dangerous_requests=True,
)

response = chain.invoke("Is Baldur's Gate 3 available on PS5?")
print(response["result"])

MATCH (:Game{name: "Baldur's Gate 3"})-[:AVAILABLE_ON]->(:Platform{name: "PS5"})
RETURN "Yes"
I don't know the answer.

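The generated Cypher query looks fine, but we didn't receive any information in response. This illustrates a common challenge when working with LLMs: a mismatch between how users phrase their questions and how the data is stored. Here the user says PS5, while the database stores the platform as PlayStation 5, so the query matches nothing. Prompt refinement, the process of honing the model's prompts so that it handles such distinctions, is an effective solution. With a refined prompt, the model generates more precise and relevant queries and retrieves the desired data.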

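To address this, we can adjust the initial Cypher prompt of the QA chain. This involves adding guidance for the LLM on how users may refer to specific platforms, such as PS5 in our case. We achieve this with the LangChain PromptTemplate, creating a modified initial prompt. This modified prompt is then supplied as an argument to our refined MemgraphQAChain instance.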

MEMGRAPH_GENERATION_TEMPLATE = """Your task is to directly translate natural language inquiry into precise and executable Cypher query for Memgraph database. 
You will utilize a provided database schema to understand the structure, nodes and relationships within the Memgraph database.
Instructions: 
- Use provided node and relationship labels and property names from the
schema which describes the database's structure. Upon receiving a user
question, synthesize the schema to craft a precise Cypher query that
directly corresponds to the user's intent. 
- Generate valid executable Cypher queries on top of Memgraph database. 
Any explanation, context, or additional information that is not a part 
of the Cypher query syntax should be omitted entirely. 
- Use Memgraph MAGE procedures instead of Neo4j APOC procedures. 
- Do not include any explanations or apologies in your responses. 
- Do not include any text except the generated Cypher statement.
- For queries that ask for information or functionalities outside the direct
generation of Cypher queries, use the Cypher query format to communicate
limitations or capabilities. For example: RETURN "I am designed to generate
Cypher queries based on the provided schema only."
Schema: 
{schema}

With all the above information and instructions, generate Cypher query for the
user question. 
If the user asks about PS5, Play Station 5 or PS 5, that is the platform called PlayStation 5.

The question is:
{question}"""

MEMGRAPH_GENERATION_PROMPT = PromptTemplate(
    input_variables=["schema", "question"], template=MEMGRAPH_GENERATION_TEMPLATE
)

chain = MemgraphQAChain.from_llm(
    ChatOpenAI(temperature=0),
    cypher_prompt=MEMGRAPH_GENERATION_PROMPT,
    graph=graph,
    model_name="gpt-4-turbo",
    allow_dangerous_requests=True,
)

response = chain.invoke("Is Baldur's Gate 3 available on PS5?")
print(response["result"])

MATCH (:Game{name: "Baldur's Gate 3"})-[:AVAILABLE_ON]->(:Platform{name: "PlayStation 5"})
RETURN "Yes"
Yes, Baldur's Gate 3 is available on PS5.

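Now, with the revised initial Cypher prompt that includes guidance on platform naming, we obtain accurate and relevant results that align more closely with user queries.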

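This approach allows for further improvement of your QA chain. You can integrate extra prompt refinement data into your chain with little effort, enhancing the overall user experience of your app.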
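When the list of aliases grows, the guidance sentences can be generated from a mapping instead of being written by hand. The sketch below reproduces the PS5 sentence from the template above; alias_hints and PLATFORM_ALIASES are hypothetical names, and the Xbox aliases are made up for illustration.

```python
def alias_hints(aliases, kind="platform"):
    """Turn {canonical name: [aliases]} into one guidance sentence per entry."""
    lines = []
    for canonical, names in aliases.items():
        if not names:
            continue  # nothing to say for an entry without aliases
        if len(names) == 1:
            joined = names[0]
        else:
            joined = ", ".join(names[:-1]) + " or " + names[-1]
        lines.append(
            f"If the user asks about {joined}, that is the {kind} called {canonical}."
        )
    return "\n".join(lines)


PLATFORM_ALIASES = {
    "PlayStation 5": ["PS5", "Play Station 5", "PS 5"],
    "Xbox Series X/S": ["Xbox", "Series X"],  # illustrative aliases only
}
print(alias_hints(PLATFORM_ALIASES))
```

The resulting text can be placed in the template just before the question. Keep curly braces out of alias names, since the prompt template treats them as placeholders.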

Constructing a knowledge graph

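Transforming unstructured data into structured data is not an easy task. This guide shows how LLMs can help with it and how to construct a knowledge graph in Memgraph. Once the knowledge graph is created, you can use it for your GraphRAG application.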

The steps for constructing a knowledge graph from text are:

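Extracting structured information from text: an LLM is used to extract structured graph information from the text in the form of nodes and relationships.
Storing into Memgraph: the extracted structured graph information is stored in Memgraph.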

Extracting structured information from text

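Besides all the imports in the setup section, import LLMGraphTransformer and Document, which will be used to extract structured information from text. LLMGraphTransformer is imported from langchain_experimental, which is distributed as the langchain-experimental package; install it with pip if it is not already present in your environment.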

from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer

API Reference: Document | LLMGraphTransformer

Below is an example text about Charles Darwin from which a knowledge graph will be constructed.

text = """
    Charles Robert Darwin was an English naturalist, geologist, and biologist,
    widely known for his contributions to evolutionary biology. His proposition that
    all species of life have descended from a common ancestor is now generally
    accepted and considered a fundamental scientific concept. In a joint
    publication with Alfred Russel Wallace, he introduced his scientific theory that
    this branching pattern of evolution resulted from a process he called natural
    selection, in which the struggle for existence has a similar effect to the
    artificial selection involved in selective breeding. Darwin has been
    described as one of the most influential figures in human history and was
    honoured by burial in Westminster Abbey.
"""

The next step is to initialize LLMGraphTransformer with the desired LLM and convert the document into a graph structure.

llm = ChatOpenAI(temperature=0, model_name="gpt-4-turbo")
llm_transformer = LLMGraphTransformer(llm=llm)
documents = [Document(page_content=text)]
graph_documents = llm_transformer.convert_to_graph_documents(documents)

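Under the hood, the LLM extracts important entities from the text and returns them as a list of nodes and relationships. Here is what it looks like: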

print(graph_documents)

[GraphDocument(nodes=[Node(id='Charles Robert Darwin', type='Person', properties={}), Node(id='English', type='Nationality', properties={}), Node(id='Naturalist', type='Profession', properties={}), Node(id='Geologist', type='Profession', properties={}), Node(id='Biologist', type='Profession', properties={}), Node(id='Evolutionary Biology', type='Field', properties={}), Node(id='Common Ancestor', type='Concept', properties={}), Node(id='Scientific Concept', type='Concept', properties={}), Node(id='Alfred Russel Wallace', type='Person', properties={}), Node(id='Natural Selection', type='Concept', properties={}), Node(id='Selective Breeding', type='Concept', properties={}), Node(id='Westminster Abbey', type='Location', properties={})], relationships=[Relationship(source=Node(id='Charles Robert Darwin', type='Person', properties={}), target=Node(id='English', type='Nationality', properties={}), type='NATIONALITY', properties={}), Relationship(source=Node(id='Charles Robert Darwin', type='Person', properties={}), target=Node(id='Naturalist', type='Profession', properties={}), type='PROFESSION', properties={}), Relationship(source=Node(id='Charles Robert Darwin', type='Person', properties={}), target=Node(id='Geologist', type='Profession', properties={}), type='PROFESSION', properties={}), Relationship(source=Node(id='Charles Robert Darwin', type='Person', properties={}), target=Node(id='Biologist', type='Profession', properties={}), type='PROFESSION', properties={}), Relationship(source=Node(id='Charles Robert Darwin', type='Person', properties={}), target=Node(id='Evolutionary Biology', type='Field', properties={}), type='CONTRIBUTION', properties={}), Relationship(source=Node(id='Common Ancestor', type='Concept', properties={}), target=Node(id='Scientific Concept', type='Concept', properties={}), type='BASIS', properties={}), Relationship(source=Node(id='Charles Robert Darwin', type='Person', properties={}), target=Node(id='Alfred Russel Wallace', type='Person', 
properties={}), type='COLLABORATION', properties={}), Relationship(source=Node(id='Natural Selection', type='Concept', properties={}), target=Node(id='Selective Breeding', type='Concept', properties={}), type='COMPARISON', properties={}), Relationship(source=Node(id='Charles Robert Darwin', type='Person', properties={}), target=Node(id='Westminster Abbey', type='Location', properties={}), type='BURIAL', properties={})], source=Document(metadata={}, page_content='\n    Charles Robert Darwin was an English naturalist, geologist, and biologist,\n    widely known for his contributions to evolutionary biology. His proposition that\n    all species of life have descended from a common ancestor is now generally\n    accepted and considered a fundamental scientific concept. In a joint\n    publication with Alfred Russel Wallace, he introduced his scientific theory that\n    this branching pattern of evolution resulted from a process he called natural\n    selection, in which the struggle for existence has a similar effect to the\n    artificial selection involved in selective breeding. Darwin has been\n    described as one of the most influential figures in human history and was\n    honoured by burial in Westminster Abbey.\n'))]

Storing into Memgraph

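Once the data is ready in the GraphDocument format, that is, as nodes and relationships, you can use the add_graph_documents method to import it into Memgraph. That method transforms the list of graph_documents into the appropriate Cypher queries, which are then executed in Memgraph. Once that is done, the knowledge graph is stored in Memgraph.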

# Empty the database
graph.query("STORAGE MODE IN_MEMORY_ANALYTICAL")
graph.query("DROP GRAPH")
graph.query("STORAGE MODE IN_MEMORY_TRANSACTIONAL")

# Create KG
graph.add_graph_documents(graph_documents)

Here is how the graph looks in Memgraph Lab (check localhost:3000):

[Image: memgraph-kg]

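If you tried this out and got a different graph, that is expected behavior. The graph construction process is non-deterministic, because the LLM used to generate nodes and relationships from unstructured data is non-deterministic.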
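To make the conversion from nodes and relationships to Cypher more concrete, here is a sketch of how such a translation could look. It is not the library's implementation: Node and Relationship are simplified stand-ins for the LangChain classes, node_statement and relationship_statement are hypothetical helpers, and the MERGE patterns are only one possible choice.

```python
import re
from dataclasses import dataclass

# Labels and relationship types cannot be query parameters in Cypher,
# so they are validated before being interpolated into the query text.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


@dataclass(frozen=True)
class Node:
    id: str
    type: str


@dataclass(frozen=True)
class Relationship:
    source: Node
    target: Node
    type: str


def checked(name):
    """Return `name` unchanged if it is safe to place in query text."""
    if not IDENTIFIER.match(name):
        raise ValueError(f"unsafe label or relationship type: {name!r}")
    return name


def node_statement(node):
    """Build a parameterized MERGE for one node."""
    return f"MERGE (n:{checked(node.type)} {{id: $id}})", {"id": node.id}


def relationship_statement(rel):
    """Build a parameterized MERGE for one relationship between two nodes."""
    query = (
        f"MATCH (a:{checked(rel.source.type)} {{id: $source_id}}), "
        f"(b:{checked(rel.target.type)} {{id: $target_id}}) "
        f"MERGE (a)-[:{checked(rel.type)}]->(b)"
    )
    return query, {"source_id": rel.source.id, "target_id": rel.target.id}


darwin = Node("Charles Robert Darwin", "Person")
wallace = Node("Alfred Russel Wallace", "Person")
print(node_statement(darwin))
print(relationship_statement(Relationship(darwin, wallace, "COLLABORATION")))
```

Passing ids as parameters rather than formatting them into the query avoids quoting problems with names that contain apostrophes.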

Additional options

Additionally, you can define the specific types of nodes and relationships to extract, according to your requirements.

llm_transformer_filtered = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=["Person", "Nationality", "Concept"],
    allowed_relationships=["NATIONALITY", "INVOLVED_IN", "COLLABORATES_WITH"],
)
graph_documents_filtered = llm_transformer_filtered.convert_to_graph_documents(
    documents
)

print(f"Nodes:{graph_documents_filtered[0].nodes}")
print(f"Relationships:{graph_documents_filtered[0].relationships}")

Nodes:[Node(id='Charles Robert Darwin', type='Person', properties={}), Node(id='English', type='Nationality', properties={}), Node(id='Evolutionary Biology', type='Concept', properties={}), Node(id='Natural Selection', type='Concept', properties={}), Node(id='Alfred Russel Wallace', type='Person', properties={})]
Relationships:[Relationship(source=Node(id='Charles Robert Darwin', type='Person', properties={}), target=Node(id='English', type='Nationality', properties={}), type='NATIONALITY', properties={}), Relationship(source=Node(id='Charles Robert Darwin', type='Person', properties={}), target=Node(id='Evolutionary Biology', type='Concept', properties={}), type='INVOLVED_IN', properties={}), Relationship(source=Node(id='Charles Robert Darwin', type='Person', properties={}), target=Node(id='Natural Selection', type='Concept', properties={}), type='INVOLVED_IN', properties={}), Relationship(source=Node(id='Charles Robert Darwin', type='Person', properties={}), target=Node(id='Alfred Russel Wallace', type='Person', properties={}), type='COLLABORATES_WITH', properties={})]

In this case, the graph looks like this:

[Image: memgraph-kg-2]
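A quick sanity check on a filtered extraction is to compare the types that came back against the allow-lists. The sketch below works on plain lists of type names so that it runs on its own; unexpected_types is a hypothetical helper, and the sample values are taken from the filtered output shown earlier.

```python
ALLOWED_NODES = {"Person", "Nationality", "Concept"}
ALLOWED_RELATIONSHIPS = {"NATIONALITY", "INVOLVED_IN", "COLLABORATES_WITH"}


def unexpected_types(node_types, relationship_types):
    """Return the node and relationship types that are not on the allow-lists."""
    extra_nodes = sorted(set(node_types) - ALLOWED_NODES)
    extra_relationships = sorted(set(relationship_types) - ALLOWED_RELATIONSHIPS)
    return extra_nodes, extra_relationships


# With real results, the type names could be collected like this:
# node_types = [n.type for n in graph_documents_filtered[0].nodes]
# relationship_types = [r.type for r in graph_documents_filtered[0].relationships]
node_types = ["Person", "Nationality", "Concept", "Concept", "Person"]
relationship_types = ["NATIONALITY", "INVOLVED_IN", "INVOLVED_IN", "COLLABORATES_WITH"]
print(unexpected_types(node_types, relationship_types))  # ([], [])
```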

Your graph can also have an __Entity__ label on all nodes, which will be indexed for faster retrieval.

# Drop graph
graph.query("STORAGE MODE IN_MEMORY_ANALYTICAL")
graph.query("DROP GRAPH")
graph.query("STORAGE MODE IN_MEMORY_TRANSACTIONAL")

# Store to Memgraph with Entity label
graph.add_graph_documents(graph_documents, baseEntityLabel=True)

Here is how the graph looks:

[Image: memgraph-kg-3]

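There is also an option to include the source of the information obtained in the graph. To do that, set include_source to True. The source document is then stored and linked to the nodes in the graph using the MENTIONS relationship.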

# Drop graph
graph.query("STORAGE MODE IN_MEMORY_ANALYTICAL")
graph.query("DROP GRAPH")
graph.query("STORAGE MODE IN_MEMORY_TRANSACTIONAL")

# Store to Memgraph with source included
graph.add_graph_documents(graph_documents, include_source=True)

The constructed graph looks like this:

[Image: memgraph-kg-4]

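Notice how the content of the source is stored and that the id property is generated, since the document didn't have any id. You can combine the __Entity__ label with the document source. Be aware, though, that both take up memory, especially the included source because of its long content strings.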
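If you would rather control the id than rely on a generated one, a stable id can be derived from the document content. The sketch below hashes the text; the choice of SHA-256 is illustrative and is not claimed to match what the integration generates, content_id is a hypothetical helper, and the idea that the id is read from the document metadata is an assumption.

```python
import hashlib


def content_id(page_content: str) -> str:
    """Derive a deterministic id from the document text."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


sample = "Charles Robert Darwin was an English naturalist, geologist, and biologist."
print(content_id(sample))
# Assumption: the id would be picked up from the document metadata, e.g.
# Document(page_content=sample, metadata={"id": content_id(sample)})
```

The same text always yields the same id, so re-importing an unchanged document would not create a second source node under that assumption.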

Finally, you can query the knowledge graph as explained in the previous section:

chain = MemgraphQAChain.from_llm(
    ChatOpenAI(temperature=0),
    graph=graph,
    model_name="gpt-4-turbo",
    allow_dangerous_requests=True,
)
print(chain.invoke("Who Charles Robert Darwin collaborated with?")["result"])

MATCH (:Person {id: "Charles Robert Darwin"})-[:COLLABORATION]->(collaborator)
RETURN collaborator;
Alfred Russel Wallace