TiDB

TiDB Cloud is a comprehensive Database-as-a-Service (DBaaS) solution that provides dedicated and serverless options. TiDB Serverless is now integrating built-in vector search into the MySQL landscape. With this enhancement, you can develop AI applications using TiDB Serverless without a new database or additional technical stacks. To be among the first to try it, join the waitlist for the private beta at https://tidb.cloud/ai.

This notebook shows how to use TiDBLoader to load data from TiDB in LangChain.

Prerequisites

Before using TiDBLoader, install the following dependencies:

%pip install --upgrade --quiet langchain langchain-community pymysql sqlalchemy

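Next, configure the connection to TiDB. This notebook follows the standard connection method provided by TiDB Cloud to establish a secure and efficient database connection.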

import getpass

# Copy the connection string from the TiDB Cloud console and replace it with your own
tidb_connection_string_template = "mysql+pymysql://<USER>:<PASSWORD>@<HOST>:4000/<DB>?ssl_ca=/etc/ssl/cert.pem&ssl_verify_cert=true&ssl_verify_identity=true"
tidb_password = getpass.getpass("Input your TiDB password:")
tidb_connection_string = tidb_connection_string_template.replace(
    "<PASSWORD>", tidb_password
)
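One caveat with plain string replacement: a password containing characters that are meaningful in a URL (such as `@`, `:` or `/`) can make the connection string parse incorrectly. The sketch below shows one way to guard against that by URL-escaping the password before substituting it. The helper name `fill_password`, the shortened template, and the sample password are illustrative assumptions and are not part of TiDB Cloud or LangChain.

```python
from urllib.parse import quote_plus

# Hypothetical template with the same shape as the one above (SSL options omitted).
template = "mysql+pymysql://<USER>:<PASSWORD>@<HOST>:4000/<DB>"


def fill_password(template: str, password: str) -> str:
    """Return the template with <PASSWORD> replaced by a URL-escaped password."""
    return template.replace("<PASSWORD>", quote_plus(password))


# A made-up password containing characters that are significant in URLs.
print(fill_password(template, "p@ss:w/rd"))
```

Passwords made only of letters and digits pass through unchanged, so the escaping step is safe to apply unconditionally.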

Load Data from TiDB

The following key parameters can be used to customize the behavior of TiDBLoader:

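query (str): The SQL query to execute against the TiDB database. The query should select the data you want to load into Document objects. For example, "SELECT * FROM my_table" fetches all data from my_table.
page_content_columns (Optional[List[str]]): The list of column names whose values are included in the page_content of each Document object. If set to None (the default), all columns returned by the query are included in page_content. This lets you tailor the content of each document to specific columns of your data.
metadata_columns (Optional[List[str]]): The list of column names to include in the metadata of each Document object. By default this list is empty, so no metadata is included unless you specify it explicitly. This is useful for attaching additional information about each document that is not part of the main content but is still valuable for processing or analysis.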
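To make the roles of page_content_columns and metadata_columns concrete, here is a minimal pure-Python sketch of how a single result row could be split into content and metadata. It is not TiDBLoader's actual implementation: the function name `row_to_fields` is hypothetical, and the "column: value" line format is inferred from the sample output printed in this notebook.

```python
from typing import Any, Dict, List, Optional, Tuple


def row_to_fields(
    row: Dict[str, Any],
    page_content_columns: Optional[List[str]] = None,
    metadata_columns: Optional[List[str]] = None,
) -> Tuple[str, Dict[str, Any]]:
    """Split one row into (page_content, metadata), mimicking the documented behavior."""
    # None means "use every column returned by the query" for the content.
    content_cols = (
        page_content_columns if page_content_columns is not None else list(row.keys())
    )
    # Metadata is empty unless columns are named explicitly.
    meta_cols = metadata_columns or []
    page_content = "\n".join(f"{col}: {row[col]}" for col in content_cols)
    metadata = {col: row[col] for col in meta_cols}
    return page_content, metadata


# Illustrative row; the values are made up for this sketch.
row = {"id": 1, "name": "Item 1", "description": "Description of Item 1"}
content, meta = row_to_fields(row, ["name", "description"], ["id"])
print(content)
print(meta)
```

Calling `row_to_fields(row)` with no column arguments puts all three columns into the content and leaves the metadata empty, which matches the defaults described above.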

from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine

# Connect to the database
engine = create_engine(tidb_connection_string)
metadata = MetaData()
table_name = "test_tidb_loader"

# Create a table
test_table = Table(
    table_name,
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(255)),
    Column("description", String(255)),
)
metadata.create_all(engine)


# Insert sample rows inside an explicit transaction; roll back on any failure
with engine.connect() as connection:
    transaction = connection.begin()
    try:
        connection.execute(
            test_table.insert(),
            [
                {"name": "Item 1", "description": "Description of Item 1"},
                {"name": "Item 2", "description": "Description of Item 2"},
                {"name": "Item 3", "description": "Description of Item 3"},
            ],
        )
        transaction.commit()
    except:
        transaction.rollback()
        raise

from langchain_community.document_loaders import TiDBLoader

# Setup TiDBLoader to retrieve data
loader = TiDBLoader(
    connection_string=tidb_connection_string,
    query=f"SELECT * FROM {table_name};",
    page_content_columns=["name", "description"],
    metadata_columns=["id"],
)

# Load data
documents = loader.load()

# Display the loaded documents
for doc in documents:
    print("-" * 30)
    print(f"content: {doc.page_content}\nmetadata: {doc.metadata}")

API Reference: TiDBLoader

------------------------------
content: name: Item 1
description: Description of Item 1
metadata: {'id': 1}
------------------------------
content: name: Item 2
description: Description of Item 2
metadata: {'id': 2}
------------------------------
content: name: Item 3
description: Description of Item 3
metadata: {'id': 3}

# Clean up: drop the test table
test_table.drop(bind=engine)

Related

Document loader conceptual guide
Document loader how-to guides
