
ChatDatabricks

The Databricks Lakehouse Platform unifies data, analytics, and AI on one platform.

This notebook provides a quick overview for getting started with Databricks chat models. For detailed documentation of all ChatDatabricks features and configurations, head to the API reference.

Overview

The ChatDatabricks class wraps a chat model endpoint hosted on Databricks Model Serving. This example notebook shows how to wrap your serving endpoint and use it as a chat model in your LangChain application.

Integration details

| Class | Package | Local | Serializable | Package downloads | Package latest |
| :--- | :--- | :---: | :---: | :---: | :---: |
| ChatDatabricks | databricks-langchain | ❌ | beta | PyPI - Downloads | PyPI - Version |

Model features

| Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |

Supported Methods

ChatDatabricks supports all methods of ChatModel, including async APIs.

Endpoint Requirement

The serving endpoint ChatDatabricks wraps must have an OpenAI-compatible chat input/output format (reference). As long as the input format is compatible, ChatDatabricks can be used for any endpoint type hosted on Databricks Model Serving:

  1. Foundation Models - curated, state-of-the-art foundation models such as DBRX, Llama 3, and Mixtral-8x7B. These endpoints are ready to use in your Databricks workspace without any setup.
  2. Custom Models - you can deploy custom models to a serving endpoint via MLflow, with your choice of framework such as LangChain, PyTorch, or Transformers.
  3. External Models - a Databricks endpoint can serve as a proxy to a model hosted outside Databricks, such as a proprietary model service like OpenAI GPT-4.

Setup

To access Databricks models you'll need to create a Databricks account, set up credentials (only if you are outside a Databricks workspace), and install the required packages.

Credentials (only if you are outside Databricks)

If you are running a LangChain app inside Databricks, you can skip this step.

Otherwise, you need to manually set the Databricks workspace hostname and a personal access token as the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, respectively. See Authentication Documentation for how to get an access token.

import getpass
import os

os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
if "DATABRICKS_TOKEN" not in os.environ:
    os.environ["DATABRICKS_TOKEN"] = getpass.getpass(
        "Enter your Databricks access token: "
    )
Enter your Databricks access token:  ········

Installation

The LangChain Databricks integration lives in the databricks-langchain package.

%pip install -qU databricks-langchain

We first demonstrate how to query the DBRX-instruct model hosted as a Foundation Models endpoint with ChatDatabricks.

For other types of endpoints, there are some differences in how to set up the endpoint itself; however, once the endpoint is ready, there is no difference in how to query it with ChatDatabricks. Please refer to the bottom of this notebook for examples with other types of endpoints.

Instantiation

from databricks_langchain import ChatDatabricks

chat_model = ChatDatabricks(
    endpoint="databricks-dbrx-instruct",
    temperature=0.1,
    max_tokens=256,
    # See https://python.langchain.com/api_reference/community/chat_models/langchain_community.chat_models.databricks.ChatDatabricks.html for other supported parameters
)

Invocation

chat_model.invoke("What is MLflow?")
AIMessage(content='MLflow is an open-source platform for managing end-to-end machine learning workflows. It was introduced by Databricks in 2018. MLflow provides tools for tracking experiments, packaging and sharing code, and deploying models. It is designed to work with any machine learning library and can be used in a variety of environments, including local machines, virtual machines, and cloud-based clusters. MLflow aims to streamline the machine learning development lifecycle, making it easier for data scientists and engineers to collaborate and deploy models into production.', response_metadata={'prompt_tokens': 229, 'completion_tokens': 104, 'total_tokens': 333}, id='run-d3fb4d06-3e10-4471-83c9-c282cc62b74d-0')
# You can also pass a list of messages
messages = [
    ("system", "You are a chatbot that can answer questions about Databricks."),
    ("user", "What is Databricks Model Serving?"),
]
chat_model.invoke(messages)
AIMessage(content='Databricks Model Serving is a feature of the Databricks platform that allows data scientists and engineers to easily deploy machine learning models into production. With Model Serving, you can host, manage, and serve machine learning models as APIs, making it easy to integrate them into applications and business processes. It supports a variety of popular machine learning frameworks, including TensorFlow, PyTorch, and scikit-learn, and provides tools for monitoring and managing the performance of deployed models. Model Serving is designed to be scalable, secure, and easy to use, making it a great choice for organizations that want to quickly and efficiently deploy machine learning models into production.', response_metadata={'prompt_tokens': 35, 'completion_tokens': 130, 'total_tokens': 165}, id='run-b3feea21-223e-4105-8627-41d647d5ccab-0')

Chaining

Similar to other chat models, ChatDatabricks can be used as part of a complex chain.

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a chatbot that can answer questions about {topic}.",
        ),
        ("user", "{question}"),
    ]
)

chain = prompt | chat_model
chain.invoke(
    {
        "topic": "Databricks",
        "question": "What is Unity Catalog?",
    }
)
AIMessage(content="Unity Catalog is a new data catalog feature in Databricks that allows you to discover, manage, and govern all your data assets across your data landscape, including data lakes, data warehouses, and data marts. It provides a centralized repository for storing and managing metadata, data lineage, and access controls for all your data assets. Unity Catalog enables data teams to easily discover and access the data they need, while ensuring compliance with data privacy and security regulations. It is designed to work seamlessly with Databricks' Lakehouse platform, providing a unified experience for managing and analyzing all your data.", response_metadata={'prompt_tokens': 32, 'completion_tokens': 118, 'total_tokens': 150}, id='run-82d72624-f8df-4c0d-a976-919feec09a55-0')
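The `prompt | chat_model` composition above is LangChain's Expression Language (LCEL) pipe: each stage's output feeds the next stage's input. As a toy, stdlib-only sketch of that pattern (this is not the real Runnable implementation; `Step` and the stub stages are made up for illustration):

```python
class Step:
    """Toy stand-in for a LangChain Runnable: `a | b` chains invocations."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Compose: run self first, then feed its output into `other`.
        return Step(lambda x: other.func(self.func(x)))

    def invoke(self, x):
        return self.func(x)


# A "prompt" stage that formats the inputs, and a stub "model" stage.
prompt = Step(lambda d: f"[system] You know about {d['topic']}. [user] {d['question']}")
model = Step(lambda p: f"(answer to: {p})")

chain = prompt | model
print(chain.invoke({"topic": "Databricks", "question": "What is Unity Catalog?"}))
# (answer to: [system] You know about Databricks. [user] What is Unity Catalog?)
```

In the real chain, the prompt stage produces a list of chat messages rather than a string, and the model stage returns an AIMessage, but the composition mechanics are the same.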

Invocation (streaming)

for chunk in chat_model.stream("How are you?"):
    print(chunk.content, end="|")
I|'m| an| AI| and| don|'t| have| feelings|,| but| I|'m| here| and| ready| to| assist| you|.| How| can| I| help| you| today|?||
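Each streamed chunk carries a `content` fragment, and concatenating the fragments reconstructs the full reply. A minimal sketch of that accumulation pattern, with the stream stubbed out so no endpoint is needed (`Chunk` and `fake_stream` are illustrative stand-ins for the AIMessageChunk objects that `chat_model.stream(...)` yields):

```python
class Chunk:
    """Stand-in for an AIMessageChunk carrying a content fragment."""

    def __init__(self, content):
        self.content = content


def fake_stream():
    # Stub of what `chat_model.stream(...)` yields, one fragment at a time.
    for piece in ["I'm", " an", " AI", "."]:
        yield Chunk(piece)


full = ""
for chunk in fake_stream():
    full += chunk.content  # accumulate fragments into the full reply
print(full)  # I'm an AI.
```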

Async Invocation

import asyncio

country = ["Japan", "Italy", "Australia"]
futures = [chat_model.ainvoke(f"Where is the capital of {c}?") for c in country]
await asyncio.gather(*futures)

Tool calling

ChatDatabricks supports the OpenAI-compatible tool calling API that lets you describe tools and their arguments, and have the model return a JSON object with a tool to invoke and the inputs to that tool. Tool calling is extremely useful for building tool-using chains and agents, and for getting structured outputs from models more generally.

With ChatDatabricks.bind_tools, we can easily pass in Pydantic classes, dict schemas, LangChain tools, or even functions as tools to the model. Under the hood these are converted to an OpenAI-compatible tool schema, which looks like:

{
    "name": "...",
    "description": "...",
    "parameters": {...}  # JSONSchema
}

and are passed in every model invocation.
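To make the conversion concrete, here is a stdlib-only sketch that renders a plain Python function's signature into the schema shape shown above. This is purely illustrative: the real conversion performed by `bind_tools` lives inside LangChain and handles far more cases (Pydantic models, nested types, descriptions per field, and so on).

```python
import inspect

# Map a few common Python annotations to JSON Schema types.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def to_tool_schema(func):
    """Render a function signature as an OpenAI-compatible tool schema (sketch)."""
    sig = inspect.signature(func)
    properties = {}
    required = []
    for name, param in sig.parameters.items():
        json_type = _JSON_TYPES.get(param.annotation, "string")
        properties[name] = {"type": json_type}
        # Parameters without a default value are required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": (func.__doc__ or "").strip(),
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def get_weather(location: str, unit: str = "celsius"):
    """Get the current weather in a given location"""


schema = to_tool_schema(get_weather)
print(schema["name"])                    # get_weather
print(schema["parameters"]["required"])  # ['location']
```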

from pydantic import BaseModel, Field


class GetWeather(BaseModel):
    """Get the current weather in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")


class GetPopulation(BaseModel):
    """Get the current population in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")


llm_with_tools = chat_model.bind_tools([GetWeather, GetPopulation])
ai_msg = llm_with_tools.invoke(
    "Which city is hotter today and which is bigger: LA or NY?"
)
print(ai_msg.tool_calls)
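LangChain surfaces the model's tool calls on `ai_msg.tool_calls` as a list of dicts with `name`, `args`, and `id` keys. A sketch of dispatching those calls to local implementations follows; the `tool_calls` list is stubbed here in that shape (in a real app you would iterate over `ai_msg.tool_calls`), and the two handler functions are hypothetical stand-ins for the tools defined above:

```python
def get_weather(location: str) -> str:
    # Stub implementation; a real tool would call a weather API.
    return f"Sunny in {location}"


def get_population(location: str) -> str:
    # Stub implementation; a real tool would look up census data.
    return f"About 4M people live in {location}"


# Map tool names (as declared to the model) to local implementations.
TOOL_REGISTRY = {
    "GetWeather": get_weather,
    "GetPopulation": get_population,
}

# Stubbed in the shape of `ai_msg.tool_calls`.
tool_calls = [
    {"name": "GetWeather", "args": {"location": "Los Angeles, CA"}, "id": "call_1"},
    {"name": "GetPopulation", "args": {"location": "New York, NY"}, "id": "call_2"},
]

results = []
for call in tool_calls:
    func = TOOL_REGISTRY[call["name"]]
    output = func(**call["args"])
    # Pair each result with the call id so it can be sent back to the model
    # as a tool message in the next turn of the conversation.
    results.append({"tool_call_id": call["id"], "content": output})

print(results[0]["content"])  # Sunny in Los Angeles, CA
```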

Wrapping Custom Model Endpoint

Prerequisites:

Once the endpoint is ready, the usage pattern is identical to that of Foundation Models.

chat_model_custom = ChatDatabricks(
    endpoint="YOUR_ENDPOINT_NAME",
    temperature=0.1,
    max_tokens=256,
)

chat_model_custom.invoke("How are you?")

Wrapping External Models

Prerequisite: Create Proxy Endpoint

First, create a new Databricks serving endpoint that proxies requests to the target external model. Endpoint creation should be fairly quick when proxying an external model.

This requires registering your OpenAI API key in Databricks Secrets as follows:

# Replace `<scope>` with your scope
databricks secrets create-scope <scope>
databricks secrets put-secret <scope> openai-api-key --string-value $OPENAI_API_KEY

Refer to https://docs.databricks.com/en/security/secrets/secrets.html for how to set up the Databricks CLI and manage secrets.

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

secret = "secrets/<scope>/openai-api-key"  # replace `<scope>` with your scope
endpoint_name = "my-chat"  # rename this if my-chat already exists
client.create_endpoint(
    name=endpoint_name,
    config={
        "served_entities": [
            {
                "name": "my-chat",
                "external_model": {
                    "name": "gpt-3.5-turbo",
                    "provider": "openai",
                    "task": "llm/v1/chat",
                    "openai_config": {
                        "openai_api_key": "{{" + secret + "}}",
                    },
                },
            }
        ],
    },
)
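Note that `openai_api_key` is set to a secret reference rather than the raw key: Databricks resolves `{{secrets/<scope>/<key>}}` at serving time, so the key itself never appears in the endpoint configuration. A quick sketch of the string the snippet above assembles (the scope name here is a placeholder):

```python
# Build the secret reference exactly as in the snippet above; Databricks
# resolves "{{secrets/<scope>/<key>}}" at serving time, so the raw API key
# never lands in the endpoint config.
secret = "secrets/my-scope/openai-api-key"  # "my-scope" is a placeholder scope
ref = "{{" + secret + "}}"
print(ref)  # {{secrets/my-scope/openai-api-key}}
```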

Once the endpoint status becomes "Ready", you can query the endpoint in the same way as other types of endpoints.

chat_model_external = ChatDatabricks(
    endpoint=endpoint_name,
    temperature=0.1,
    max_tokens=256,
)
chat_model_external.invoke("How to use Databricks?")

Function calling on Databricks

Databricks function calling is OpenAI-compatible and is only available during model serving as part of Foundation Model APIs.

See the Databricks function calling introduction for supported models.

llm = ChatDatabricks(endpoint="databricks-meta-llama-3-70b-instruct")
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
            },
        },
    }
]

# supported tool_choice values: "auto", "required", "none", function name in string format,
# or a dictionary as {"type": "function", "function": {"name": <<tool_name>>}}
model = llm.bind_tools(tools, tool_choice="auto")

messages = [{"role": "user", "content": "What is the current temperature of Chicago?"}]
print(model.invoke(messages))

See Databricks Unity Catalog to learn how to use UC functions in chains.

API reference

For detailed documentation of all ChatDatabricks features and configurations, head to the API reference: https://api-docs.databricks.com/python/databricks-ai-bridge/latest/databricks_langchain.html#databricks_langchain.ChatDatabricks