Clarifai

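Clarifai is an AI platform that covers the full AI lifecycle: data exploration, data labeling, model training, evaluation, and inference. Once inputs are uploaded, a Clarifai application can be used as a vector database.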

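This notebook shows how to use functionality related to the Clarifai vector database. The examples demonstrate text semantic search. Clarifai also supports semantic search with images, video frames, and localized search (see Rank), as well as attribute search (see Filter).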

To use Clarifai, you must have an account and a Personal Access Token (PAT) key. You can get or create a PAT at https://clarifai.com/settings/security.

Dependencies

# Install required dependencies
%pip install --upgrade --quiet clarifai langchain-community

Imports

Here we set the personal access token. You can find your PAT under Settings/Security on the platform.

# Please login and get your API key from  https://clarifai.com/settings/security
from getpass import getpass

CLARIFAI_PAT = getpass()
 ········
# Import the required modules
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Clarifai
from langchain_text_splitters import CharacterTextSplitter
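
Outside a notebook, typing the PAT on every run gets tedious. The following is a minimal sketch, not part of Clarifai or LangChain: `resolve_pat` is a hypothetical helper, and the environment variable name `CLARIFAI_PAT` is an assumption chosen to match the variable used above. It checks the environment first and falls back to the same `getpass` prompt.

```python
import os
from getpass import getpass
from typing import Callable


def resolve_pat(
    env_var: str = "CLARIFAI_PAT",  # assumed variable name, adjust as needed
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the PAT from the environment, or prompt for it if unset."""
    pat = os.environ.get(env_var, "").strip()
    if not pat:
        # Fall back to an interactive, non-echoing prompt.
        pat = prompt("Clarifai PAT: ").strip()
    if not pat:
        raise ValueError("A Clarifai PAT is required.")
    return pat
```

With this in place, `CLARIFAI_PAT = resolve_pat()` could stand in for the bare `getpass()` call. The `prompt` parameter exists only so the fallback can be swapped out in scripts or tests.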

Setup

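Set the user ID and app ID where the text data will be uploaded. Note: when creating the application, select an appropriate base workflow for indexing your text documents, such as the Language-Understanding workflow.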

You first need to create an account on Clarifai and then create an application.

USER_ID = "USERNAME_ID"
APP_ID = "APPLICATION_ID"
NUMBER_OF_DOCS = 2

From Texts

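Create a Clarifai vectorstore from a list of texts. This section uploads each text with its respective metadata to a Clarifai application. The Clarifai application can then be used for semantic search to find relevant texts.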

texts = [
    "I really enjoy spending time with you",
    "I hate spending time with my dog",
    "I want to go for a run",
    "I went to the movies yesterday",
    "I love playing soccer with my friends",
]

metadatas = [
    {"id": i, "text": text, "source": "book 1", "category": ["books", "modern"]}
    for i, text in enumerate(texts)
]

Alternatively, you can provide custom input IDs for the inputs.

idlist = ["text1", "text2", "text3", "text4", "text5"]
metadatas = [
    {"id": idlist[i], "text": text, "source": "book 1", "category": ["books", "modern"]}
    for i, text in enumerate(texts)
]
# The Clarifai vector store can also be initialized with the PAT passed as an argument.
clarifai_vector_db = Clarifai(
    user_id=USER_ID,
    app_id=APP_ID,
    number_of_docs=NUMBER_OF_DOCS,
)

Upload the data to the Clarifai app.

# Upload with metadata and custom input IDs.
response = clarifai_vector_db.add_texts(texts=texts, ids=idlist, metadatas=metadatas)

# Upload without metadata (not recommended): you will not be able to filter searches by metadata.
# Custom input IDs are optional.
response = clarifai_vector_db.add_texts(texts=texts)
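
Because `texts`, `ids`, and `metadatas` are parallel lists, it is easy to let them drift out of sync. The sketch below is a hypothetical pre-flight check (`check_upload_inputs` is not part of LangChain or Clarifai) that could be run before calling `add_texts`. It assumes that input IDs should be unique and that each list, when given, has one entry per text.

```python
from typing import Dict, List, Optional


def check_upload_inputs(
    texts: List[str],
    ids: Optional[List[str]] = None,
    metadatas: Optional[List[Dict]] = None,
) -> None:
    """Raise ValueError if the parallel upload lists are inconsistent."""
    if ids is not None:
        if len(ids) != len(texts):
            raise ValueError(f"{len(ids)} ids for {len(texts)} texts")
        if len(set(ids)) != len(ids):
            # Assumption: duplicate input IDs are not wanted.
            raise ValueError("input ids must be unique")
    if metadatas is not None and len(metadatas) != len(texts):
        raise ValueError(f"{len(metadatas)} metadatas for {len(texts)} texts")


# Example with small stand-in data; passes silently when everything lines up.
sample_texts = ["first text", "second text"]
sample_ids = ["text1", "text2"]
sample_metadatas = [{"id": i, "source": "book 1"} for i in sample_ids]
check_upload_inputs(sample_texts, sample_ids, sample_metadatas)
```

Failing early on the client side avoids a partial upload that has to be cleaned up afterwards.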

You can also create a Clarifai vector store and ingest all the inputs into your app directly.

clarifai_vector_db = Clarifai.from_texts(
    user_id=USER_ID,
    app_id=APP_ID,
    texts=texts,
    metadatas=metadatas,
)

Search for similar texts using the similarity search function.

docs = clarifai_vector_db.similarity_search("I would like to see you")
docs
[Document(page_content='I really enjoy spending time with you', metadata={'text': 'I really enjoy spending time with you', 'id': 'text1', 'source': 'book 1', 'category': ['books', 'modern']})]

You can also filter search results by metadata.

# Metadata filters allow a lot of powerful filtering within an app.
# This one limits the similarity query to texts whose "source" key has the value "book 1".
book1_similar_docs = clarifai_vector_db.similarity_search(
    "I would love to see you", filter={"source": "book 1"}
)

# You can also store lists in an input's metadata and select inputs that match an item in the list.
# This is useful for categories, as below:
book_category_similar_docs = clarifai_vector_db.similarity_search(
    "I would love to see you", filter={"category": ["books"]}
)
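
To make the two filter shapes above concrete, here is a local sketch of how such a filter could be read against one input's metadata. It is an illustration only, not Clarifai's server-side implementation: `matches_filter` is a hypothetical function, and requiring every item of a filter list to be present is an assumption (the examples above use a single-item list, where that choice makes no difference).

```python
from typing import Any, Dict


def matches_filter(metadata: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies every key in `flt`."""
    for key, wanted in flt.items():
        if key not in metadata:
            return False
        stored = metadata[key]
        if isinstance(wanted, list):
            # List filter: assumed to require each wanted item in the stored value.
            stored_items = stored if isinstance(stored, list) else [stored]
            if not all(item in stored_items for item in wanted):
                return False
        elif stored != wanted:
            # Scalar filter: plain equality.
            return False
    return True


metadata = {"id": "text1", "source": "book 1", "category": ["books", "modern"]}
print(matches_filter(metadata, {"source": "book 1"}))      # True
print(matches_filter(metadata, {"category": ["books"]}))   # True
print(matches_filter(metadata, {"source": "book 2"}))      # False
```

For filters with several list items, check Clarifai's own documentation for the exact matching rule rather than relying on this sketch.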

From Documents

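Create a Clarifai vectorstore from a list of Documents. This section uploads each document with its respective metadata to a Clarifai application. The Clarifai application can then be used for semantic search to find relevant documents.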

loader = TextLoader("your_local_file_path.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
USER_ID = "USERNAME_ID"
APP_ID = "APPLICATION_ID"
NUMBER_OF_DOCS = 4
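
For intuition about what the splitting step does with `chunk_size=1000` and `chunk_overlap=0`, the sketch below packs separator-delimited pieces greedily into chunks. It is a simplified approximation, not the `CharacterTextSplitter` implementation: `split_text_sketch` is a hypothetical function, it assumes the `"\n\n"` separator, ignores overlap, and keeps a single oversized piece whole.

```python
from typing import List


def split_text_sketch(
    text: str, chunk_size: int = 1000, separator: str = "\n\n"
) -> List[str]:
    """Split on `separator`, then pack pieces into chunks of at most `chunk_size` characters."""
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces from repeated separators
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Fits, or is a lone oversized piece that cannot be packed further.
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "aaaa\n\nbbbb\n\ncccc"
print(split_text_sketch(sample, chunk_size=10))  # ['aaaa\n\nbbbb', 'cccc']
```

Each resulting chunk becomes one input in the Clarifai app, so the chunk size directly controls how much text a single search hit returns.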

Create a Clarifai vector DB instance and ingest all the documents into the Clarifai app.

clarifai_vector_db = Clarifai.from_documents(
    user_id=USER_ID,
    app_id=APP_ID,
    documents=docs,
    number_of_docs=NUMBER_OF_DOCS,
)
docs = clarifai_vector_db.similarity_search("Texts related to population")
docs

From an existing App

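Clarifai provides tools for adding data to applications (essentially projects) via the API or UI. Most users will have already done that before interacting with LangChain, so this example uses the data in an existing app to perform searches. Check out our API docs and UI docs. The Clarifai application can then be used for semantic search to find relevant documents.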

USER_ID = "USERNAME_ID"
APP_ID = "APPLICATION_ID"
NUMBER_OF_DOCS = 4
clarifai_vector_db = Clarifai(
    user_id=USER_ID,
    app_id=APP_ID,
    number_of_docs=NUMBER_OF_DOCS,
)
docs = clarifai_vector_db.similarity_search(
    "Texts related to ammunition and president wilson"
)
docs[0].page_content
"President Wilson, generally acclaimed as the leader of the world's democracies,\nphrased for civilization the arguments against autocracy in the great peace conference\nafter the war. The President headed the American delegation to that conclave of world\nre-construction. With him as delegates to the conference were Robert Lansing, Secretary\nof State; Henry White, former Ambassador to France and Italy; Edward M. House and\nGeneral Tasker H. Bliss.\nRepresenting American Labor at the International Labor conference held in Paris\nsimultaneously with the Peace Conference were Samuel Gompers, president of the\nAmerican Federation of Labor; William Green, secretary-treasurer of the United Mine\nWorkers of America; John R. Alpine, president of the Plumbers' Union; James Duncan,\npresident of the International Association of Granite Cutters; Frank Duffy, president of\nthe United Brotherhood of Carpenters and Joiners, and Frank Morrison, secretary of the\nAmerican Federation of Labor.\nEstimating the share of each Allied nation in the great victory, mankind will\nconclude that the heaviest cost in proportion to prewar population and treasure was paid\nby the nations that first felt the shock of war, Belgium, Serbia, Poland and France. All\nfour were the battle-grounds of huge armies, oscillating in a bloody frenzy over once\nfertile fields and once prosperous towns.\nBelgium, with a population of 8,000,000, had a casualty list of more than 350,000;\nFrance, with its casualties of 4,000,000 out of a population (including its colonies) of\n90,000,000, is really the martyr nation of the world. Her gallant poilus showed the world\nhow cheerfully men may die in defense of home and liberty. Huge Russia, including\nhapless Poland, had a casualty list of 7,000,000 out of its entire population of\n180,000,000. 
The United States out of a population of 110,000,000 had a casualty list of\n236,117 for nineteen months of war; of these 53,169 were killed or died of disease;\n179,625 were wounded; and 3,323 prisoners or missing."