
WikipediaRetriever

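Overview

Wikipedia is a free, multilingual online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and a wiki-based editing system called MediaWiki. Wikipedia is the largest and most-read reference work in history.

This notebook shows how to retrieve wiki pages from wikipedia.org and convert them into the Document format used downstream.

Integration details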

Retriever            Source               Package
WikipediaRetriever   Wikipedia articles   langchain_community

Setup

If you want automated tracing from individual tools, set your LangSmith API key by uncommenting the lines below:

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Installation

The integration lives in the langchain-community package. We also need to install the wikipedia Python package itself.

%pip install -qU langchain_community wikipedia

Instantiation

Now we can instantiate our retriever:

WikipediaRetriever parameters include:

  • optional lang: default="en". Use it to search in a specific language edition of Wikipedia.
  • optional load_max_docs: default=100. Use it to limit the number of downloaded documents. Downloading all 100 documents takes time, so use a smaller number for experiments. There is currently a hard limit of 300.
  • optional load_all_available_meta: default=False. By default only the most important fields are downloaded: Published (date when the document was published or last updated), Title, Summary. If True, the other fields are downloaded as well.

get_relevant_documents() takes one argument, query: free text used to find documents in Wikipedia.

from langchain_community.retrievers import WikipediaRetriever

retriever = WikipediaRetriever()
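
The parameters listed above can also be passed explicitly when constructing the retriever. The following is a minimal sketch rather than a recommended pattern: `retriever_kwargs` and `build_retriever` are hypothetical names introduced here, and only `lang` and `load_all_available_meta` are passed. The import happens inside the function so that the settings can be inspected even where the packages are not installed.

```python
# Hypothetical settings dict; the keys are parameters described above.
retriever_kwargs = {
    "lang": "en",  # language edition of Wikipedia to search
    "load_all_available_meta": False,  # keep only the core metadata fields
}


def build_retriever(**overrides):
    """Return a WikipediaRetriever built from retriever_kwargs plus overrides."""
    # Lazy import: requires the langchain_community and wikipedia packages.
    from langchain_community.retrievers import WikipediaRetriever

    return WikipediaRetriever(**{**retriever_kwargs, **overrides})
```

Under these assumptions, calling `build_retriever(lang="ja")` would search the Japanese edition while leaving the other setting unchanged.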

Usage

docs = retriever.invoke("TOKYO GHOUL")
print(docs[0].page_content[:400])
Tokyo Ghoul (Japanese: 東京喰種(トーキョーグール), Hepburn: Tōkyō Gūru) is a Japanese dark fantasy manga series written and illustrated by Sui Ishida. It was serialized in Shueisha's seinen manga magazine Weekly Young Jump from September 2011 to September 2014, with its chapters collected in 14 tankōbon volumes. The story is set in an alternate version of Tokyo where humans coexist with ghouls, beings who loo
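
Each returned document carries metadata alongside its text. The sketch below builds a one-line preview per document. `preview_docs` is a hypothetical helper, and the `title` and `source` metadata keys are assumptions, read with `.get()` so that a missing key does not raise. The stand-in objects only mimic the `page_content` and `metadata` attributes so the example runs offline; in practice, pass the list returned by `retriever.invoke(...)`.

```python
from types import SimpleNamespace


def preview_docs(docs, max_chars=80):
    """Return one short line per document: index, title, source, text preview."""
    lines = []
    for i, doc in enumerate(docs):
        meta = getattr(doc, "metadata", None) or {}
        title = meta.get("title", "<no title>")  # assumed metadata key
        source = meta.get("source", "<no source>")  # assumed metadata key
        snippet = doc.page_content[:max_chars].replace("\n", " ")
        lines.append(f"[{i}] {title} ({source}): {snippet}")
    return lines


# Stand-in documents (not real retriever output) for an offline run.
sample_docs = [
    SimpleNamespace(
        page_content="Tokyo Ghoul is a manga.\nMore text follows.",
        metadata={"title": "Tokyo Ghoul", "source": "https://en.wikipedia.org/wiki/Tokyo_Ghoul"},
    ),
    SimpleNamespace(page_content="abc", metadata={}),
]

preview_lines = preview_docs(sample_docs, max_chars=23)
for line in preview_lines:
    print(line)
```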

Use within a chain

Like other retrievers, WikipediaRetriever can be incorporated into LLM applications via chains.

We will need an LLM or chat model:

pip install -qU "langchain[openai]"
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("gpt-4o-mini", model_provider="openai")
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

prompt = ChatPromptTemplate.from_template(
    """
    Answer the question based only on the context provided.
    Context: {context}
    Question: {question}
    """
)


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
chain.invoke(
    "Who is the main character in `Tokyo Ghoul` and does he transform into a ghoul?"
)
'The main character in Tokyo Ghoul is Ken Kaneki, who transforms into a ghoul after receiving an organ transplant from a ghoul named Rize.'
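
The format_docs helper above concatenates every retrieved page in full, which can produce a long context. As one possible variation, the sketch below caps the combined length. `format_docs_capped` and its `max_chars` budget are hypothetical and not part of LangChain; the stand-in objects only mimic the `page_content` attribute so the example runs offline.

```python
from types import SimpleNamespace


def format_docs_capped(docs, max_chars=4000):
    """Join page contents with blank lines, never exceeding max_chars in total."""
    parts = []
    remaining = max_chars
    for doc in docs:
        if remaining <= 0:
            break
        text = doc.page_content[:remaining]  # truncate to the remaining budget
        parts.append(text)
        remaining -= len(text) + 2  # the 2 accounts for the "\n\n" separator
    return "\n\n".join(parts)


# Stand-in documents (not real retriever output) for an offline run.
stub_docs = [
    SimpleNamespace(page_content="aaaa"),
    SimpleNamespace(page_content="bbbb"),
]

capped = format_docs_capped(stub_docs, max_chars=7)
uncapped = format_docs_capped(stub_docs)
```

Under these assumptions, it could replace format_docs in the chain definition (`retriever | format_docs_capped`) without changing the other steps.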

API reference

For detailed documentation of all WikipediaRetriever features and configurations, see the API reference.