Neo4J
Neo4j 是由
Neo4j, Inc开发的图数据库管理系统。
数据元素
Neo4j存储的是节点、连接它们的边,以及节点和边的属性。其开发者将其描述为一个符合 ACID 的事务型数据库,具有原生的图存储与处理能力。Neo4j以非开源的“社区版”形式提供,采用修改版的 GNU 通用公共许可证授权,而在线备份和高可用性扩展功能则采用闭源的商业许可证授权。Neo 还在闭源的商业条款下授权Neo4j使用这些扩展功能。
此笔记本展示了如何使用大语言模型(LLM)为图数据库提供自然语言接口,您可以使用
Cypher查询语言对其进行查询。
Cypher 是一种声明式的图查询语言,可在属性图中实现富有表现力且高效的数据查询。
设置
您需要有一个正在运行的 Neo4j 实例。一种选择是在 Neo4j 的 Aura 云服务中创建一个免费的 Neo4j 数据库实例。您也可以使用Neo4j Desktop 应用程序在本地运行数据库,或通过运行 Docker 容器来启动。 您可以通过执行以下脚本来运行本地 Docker 容器:
docker run \
--name neo4j \
-p 7474:7474 -p 7687:7687 \
-d \
-e NEO4J_AUTH=neo4j/password \
-e NEO4J_PLUGINS=\[\"apoc\"\] \
neo4j:latest
如果您使用的是 Docker 容器,则需要等待几秒钟,以便数据库启动。
from langchain_neo4j import GraphCypherQAChain, Neo4jGraph
from langchain_openai import ChatOpenAI
graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")
本指南默认使用 OpenAI 模型。
import getpass
import os
if "OPENAI_API_KEY" not in os.environ:
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
填充数据库
假设您的数据库为空,您可以使用Cypher查询语言来填充它。以下Cypher语句是幂等的,这意味着无论您运行一次还是多次,数据库中的信息都将保持不变。
graph.query(
"""
MERGE (m:Movie {name:"Top Gun", runtime: 120})
WITH m
UNWIND ["Tom Cruise", "Val Kilmer", "Anthony Edwards", "Meg Ryan"] AS actor
MERGE (a:Actor {name:actor})
MERGE (a)-[:ACTED_IN]->(m)
"""
)
[]
刷新图谱模式信息
如果数据库的模式发生变化,您可以刷新生成Cypher语句所需的模式信息。
graph.refresh_schema()
print(graph.schema)
Node properties:
Movie {runtime: INTEGER, name: STRING}
Actor {name: STRING}
Relationship properties:
The relationships:
(:Actor)-[:ACTED_IN]->(:Movie)
增强的模式信息
选择增强型模式版本后,系统将自动扫描数据库中的示例值,并计算一些分布统计信息。例如,如果某个节点属性的唯一值少于10个,我们将在模式中返回所有可能的值;否则,每个节点和关系属性仅返回一个示例值。
enhanced_graph = Neo4jGraph(
url="bolt://localhost:7687",
username="neo4j",
password="password",
enhanced_schema=True,
)
print(enhanced_graph.schema)
Node properties:
- **Movie**
- `runtime`: INTEGER Min: 120, Max: 120
- `name`: STRING Available options: ['Top Gun']
- **Actor**
- `name`: STRING Available options: ['Tom Cruise', 'Val Kilmer', 'Anthony Edwards', 'Meg Ryan']
Relationship properties:
The relationships:
(:Actor)-[:ACTED_IN]->(:Movie)
查询图谱
我们现在可以使用图谱Cypher问答链来向图谱提问
chain = GraphCypherQAChain.from_llm(
ChatOpenAI(temperature=0), graph=graph, verbose=True, allow_dangerous_requests=True
)
chain.invoke({"query": "Who played in Top Gun?"})
[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WHERE m.name = 'Top Gun'
RETURN a.name[0m
Full Context:
[32;1m[1;3m[{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}][0m
[1m> Finished chain.[0m
{'query': 'Who played in Top Gun?',
'result': 'Tom Cruise, Val Kilmer, Anthony Edwards, and Meg Ryan played in Top Gun.'}
限制结果数量
您可以通过使用 top_k 参数来限制 Cypher QA 链返回的结果数量。 默认值为 10。
chain = GraphCypherQAChain.from_llm(
ChatOpenAI(temperature=0),
graph=graph,
verbose=True,
top_k=2,
allow_dangerous_requests=True,
)
chain.invoke({"query": "Who played in Top Gun?"})
[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WHERE m.name = 'Top Gun'
RETURN a.name[0m
Full Context:
[32;1m[1;3m[{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}][0m
[1m> Finished chain.[0m
{'query': 'Who played in Top Gun?',
'result': 'Tom Cruise, Val Kilmer played in Top Gun.'}
返回中间结果
您可以使用 return_intermediate_steps 参数从Cypher QA链返回中间步骤
chain = GraphCypherQAChain.from_llm(
ChatOpenAI(temperature=0),
graph=graph,
verbose=True,
return_intermediate_steps=True,
allow_dangerous_requests=True,
)
result = chain.invoke({"query": "Who played in Top Gun?"})
print(f"Intermediate steps: {result['intermediate_steps']}")
print(f"Final answer: {result['result']}")
[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WHERE m.name = 'Top Gun'
RETURN a.name[0m
Full Context:
[32;1m[1;3m[{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}][0m
[1m> Finished chain.[0m
Intermediate steps: [{'query': "MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)\nWHERE m.name = 'Top Gun'\nRETURN a.name"}, {'context': [{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}]}]
Final answer: Tom Cruise, Val Kilmer, Anthony Edwards, and Meg Ryan played in Top Gun.
返回直接结果
您可以使用 return_direct 参数从Cypher QA链直接返回结果
chain = GraphCypherQAChain.from_llm(
ChatOpenAI(temperature=0),
graph=graph,
verbose=True,
return_direct=True,
allow_dangerous_requests=True,
)
chain.invoke({"query": "Who played in Top Gun?"})
[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WHERE m.name = 'Top Gun'
RETURN a.name[0m
[1m> Finished chain.[0m
{'query': 'Who played in Top Gun?',
'result': [{'a.name': 'Tom Cruise'},
{'a.name': 'Val Kilmer'},
{'a.name': 'Anthony Edwards'},
{'a.name': 'Meg Ryan'}]}
在Cypher生成提示中添加示例
您可以定义希望LLM为特定问题生成的Cypher语句
from langchain_core.prompts.prompt import PromptTemplate
CYPHER_GENERATION_TEMPLATE = """Task:Generate Cypher statement to query a graph database.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
Schema:
{schema}
Note: Do not include any explanations or apologies in your responses.
Do not respond to any questions that might ask anything else than for you to construct a Cypher statement.
Do not include any text except the generated Cypher statement.
Examples: Here are a few examples of generated Cypher statements for particular questions:
# How many people played in Top Gun?
MATCH (m:Movie {{name:"Top Gun"}})<-[:ACTED_IN]-()
RETURN count(*) AS numberOfActors
The question is:
{question}"""
CYPHER_GENERATION_PROMPT = PromptTemplate(
input_variables=["schema", "question"], template=CYPHER_GENERATION_TEMPLATE
)
chain = GraphCypherQAChain.from_llm(
ChatOpenAI(temperature=0),
graph=graph,
verbose=True,
cypher_prompt=CYPHER_GENERATION_PROMPT,
allow_dangerous_requests=True,
)
chain.invoke({"query": "How many people played in Top Gun?"})
[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (m:Movie {name:"Top Gun"})<-[:ACTED_IN]-()
RETURN count(*) AS numberOfActors[0m
Full Context:
[32;1m[1;3m[{'numberOfActors': 4}][0m
[1m> Finished chain.[0m
{'query': 'How many people played in Top Gun?',
'result': 'There were 4 actors in Top Gun.'}
为 Cypher 和答案生成使用独立的 LLM
您可以使用 cypher_llm 和 qa_llm 参数来定义不同的大语言模型(llms)
chain = GraphCypherQAChain.from_llm(
graph=graph,
cypher_llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo"),
qa_llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k"),
verbose=True,
allow_dangerous_requests=True,
)
chain.invoke({"query": "Who played in Top Gun?"})
[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WHERE m.name = 'Top Gun'
RETURN a.name[0m
Full Context:
[32;1m[1;3m[{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}][0m
[1m> Finished chain.[0m
{'query': 'Who played in Top Gun?',
'result': 'Tom Cruise, Val Kilmer, Anthony Edwards, and Meg Ryan played in Top Gun.'}
忽略指定的节点和关系类型
您可以使用 include_types 或 exclude_types 在生成 Cypher 语句时忽略图结构的某些部分。
chain = GraphCypherQAChain.from_llm(
graph=graph,
cypher_llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo"),
qa_llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo-16k"),
verbose=True,
exclude_types=["Movie"],
allow_dangerous_requests=True,
)
# Inspect graph schema
print(chain.graph_schema)
Node properties are the following:
Actor {name: STRING}
Relationship properties are the following:
The relationships are the following:
验证生成的Cypher语句
您可以使用 validate_cypher 参数来验证和纠正生成的 Cypher 语句中的关系方向
chain = GraphCypherQAChain.from_llm(
llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo"),
graph=graph,
verbose=True,
validate_cypher=True,
allow_dangerous_requests=True,
)
chain.invoke({"query": "Who played in Top Gun?"})
[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WHERE m.name = 'Top Gun'
RETURN a.name[0m
Full Context:
[32;1m[1;3m[{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}][0m
[1m> Finished chain.[0m
{'query': 'Who played in Top Gun?',
'result': 'Tom Cruise, Val Kilmer, Anthony Edwards, and Meg Ryan played in Top Gun.'}
将数据库结果中的上下文作为工具/函数输出提供
您可以使用 use_function_response 参数将数据库结果中的上下文作为工具/函数输出传递给大语言模型(LLM)。此方法通过让LLM更紧密地遵循所提供的上下文,提高回答的准确性和相关性。 您需要使用支持原生函数调用的LLM才能使用此功能。
chain = GraphCypherQAChain.from_llm(
llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo"),
graph=graph,
verbose=True,
use_function_response=True,
allow_dangerous_requests=True,
)
chain.invoke({"query": "Who played in Top Gun?"})
[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WHERE m.name = 'Top Gun'
RETURN a.name[0m
Full Context:
[32;1m[1;3m[{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}][0m
[1m> Finished chain.[0m
{'query': 'Who played in Top Gun?',
'result': 'The main actors in Top Gun are Tom Cruise, Val Kilmer, Anthony Edwards, and Meg Ryan.'}
使用函数响应功能时,可以通过提供 function_response_system 来提供自定义系统消息,以指示模型如何生成答案。
请注意,当使用 use_function_response 时,qa_prompt 将不起任何作用
chain = GraphCypherQAChain.from_llm(
llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo"),
graph=graph,
verbose=True,
use_function_response=True,
function_response_system="Respond as a pirate!",
allow_dangerous_requests=True,
)
chain.invoke({"query": "Who played in Top Gun?"})
[1m> Entering new GraphCypherQAChain chain...[0m
Generated Cypher:
[32;1m[1;3mMATCH (a:Actor)-[:ACTED_IN]->(m:Movie)
WHERE m.name = 'Top Gun'
RETURN a.name[0m
Full Context:
[32;1m[1;3m[{'a.name': 'Tom Cruise'}, {'a.name': 'Val Kilmer'}, {'a.name': 'Anthony Edwards'}, {'a.name': 'Meg Ryan'}][0m
[1m> Finished chain.[0m
{'query': 'Who played in Top Gun?',
'result': "Arrr matey! In the film Top Gun, ye be seein' Tom Cruise, Val Kilmer, Anthony Edwards, and Meg Ryan sailin' the high seas of the sky! Aye, they be a fine crew of actors, they be!"}