WikipediaRetriever
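Overview
Wikipedia is a multilingual free online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and a wiki-based editing system called MediaWiki.
Wikipedia is the largest and most-read reference work in history.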
This notebook shows how to retrieve wiki pages from wikipedia.org and convert them into documents for downstream use.
Integration details
| Retriever | Source | Package |
|---|---|---|
| WikipediaRetriever | Wikipedia articles | langchain_community |
Setup
To enable automated tracing of individual tools, set your LangSmith API key by uncommenting the lines below:
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
Installation
The integration lives in the langchain-community package. We also need to install the wikipedia Python package itself.
%pip install -qU langchain_community wikipedia
Instantiation
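Now we can instantiate our retriever:
WikipediaRetriever parameters include:
- optional lang: default="en". Use it to search in a specific language edition of Wikipedia.
- optional load_max_docs: default=100. Use it to limit the number of downloaded documents. Downloading all 100 documents takes time, so use a smaller number for experiments. There is currently a hard limit of 300.
- optional load_all_available_meta: default=False. By default only the most important fields are downloaded: Published (date when the document was published/last updated), title, Summary. If True, other fields are also downloaded.
get_relevant_documents() has one argument, query: free text used to find documents in Wikipedia.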
from langchain_community.retrievers import WikipediaRetriever
retriever = WikipediaRetriever()
API Reference: WikipediaRetriever
Usage
docs = retriever.invoke("TOKYO GHOUL")
print(docs[0].page_content[:400])
Tokyo Ghoul (Japanese: 東京喰種(トーキョーグール), Hepburn: Tōkyō Gūru) is a Japanese dark fantasy manga series written and illustrated by Sui Ishida. It was serialized in Shueisha's seinen manga magazine Weekly Young Jump from September 2011 to September 2014, with its chapters collected in 14 tankōbon volumes. The story is set in an alternate version of Tokyo where humans coexist with ghouls, beings who loo
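When a query returns several pages, it can help to skim all of them rather than only the first. The following is a minimal sketch and not part of the LangChain API: preview is a hypothetical helper, and the SimpleNamespace objects are stand-ins for the documents returned by retriever.invoke(...), assuming each document exposes page_content and a metadata dict that may hold a title key.

```python
from types import SimpleNamespace


def preview(docs, width=80):
    """Return one line per document: its title (if any) and the first `width` characters."""
    lines = []
    for doc in docs:
        # Fall back to a placeholder when the metadata carries no title.
        title = doc.metadata.get("title", "<untitled>")
        # Collapse line breaks so each preview stays on a single line.
        snippet = doc.page_content[:width].replace("\n", " ")
        lines.append(f"{title}: {snippet}")
    return lines


# Stand-in documents (hypothetical data) so the sketch runs without network access.
sample_docs = [
    SimpleNamespace(
        page_content="Tokyo Ghoul is a Japanese dark fantasy manga series.",
        metadata={"title": "Tokyo Ghoul"},
    ),
    SimpleNamespace(
        page_content="Second stand-in page.\nWith a line break.",
        metadata={},
    ),
]

for line in preview(sample_docs, width=30):
    print(line)
```

With a real retriever, the same helper should accept the list returned by retriever.invoke("TOKYO GHOUL"), provided the documents carry the attributes assumed above.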
Use within a chain
Like other retrievers, WikipediaRetriever can be incorporated into LLM applications via chains.
We will need an LLM or chat model:
Select chat model:
pip install -qU "langchain[openai]"
import getpass
import os
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")
from langchain.chat_models import init_chat_model
llm = init_chat_model("gpt-4o-mini", model_provider="openai")
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
prompt = ChatPromptTemplate.from_template(
    """
    Answer the question based only on the context provided.
    Context: {context}
    Question: {question}
    """
)
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
chain.invoke(
    "Who is the main character in `Tokyo Ghoul` and does he transform into a ghoul?"
)
'The main character in Tokyo Ghoul is Ken Kaneki, who transforms into a ghoul after receiving an organ transplant from a ghoul named Rize.'
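To make the data flow of the chain easier to follow, here is a plain-Python sketch of its first two steps only, with no LangChain or network calls. It rests on assumptions: fake_retrieve is a hypothetical stand-in for the retriever, TEMPLATE mirrors the prompt text above, and build_prompt is an illustrative name. The point it shows is that the same input string is used twice, once as the retrieval query and once as the question.

```python
from types import SimpleNamespace


def format_docs(docs):
    # Same helper as in the chain: join page contents with blank lines.
    return "\n\n".join(doc.page_content for doc in docs)


def fake_retrieve(query):
    """Hypothetical stand-in for the retriever: returns two canned documents."""
    return [
        SimpleNamespace(page_content=f"First passage about {query}."),
        SimpleNamespace(page_content=f"Second passage about {query}."),
    ]


# Plain-string version of the prompt template used in the chain.
TEMPLATE = (
    "Answer the question based only on the context provided.\n"
    "Context: {context}\n"
    "Question: {question}\n"
)


def build_prompt(question):
    # Both entries are computed from the same input, like the dict step in the chain.
    inputs = {
        "context": format_docs(fake_retrieve(question)),
        "question": question,
    }
    return TEMPLATE.format(**inputs)


prompt_text = build_prompt("Tokyo Ghoul")
print(prompt_text)
```

In the real chain, the resulting prompt is then passed to the chat model, and StrOutputParser extracts the reply as a plain string.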
API reference
For detailed documentation of all WikipediaRetriever features and configurations, head to the API reference.