ChatDatabricks
The Databricks Lakehouse Platform unifies data, analytics, and AI on one platform.
This notebook provides a quick overview for getting started with Databricks chat models. For detailed documentation of all ChatDatabricks features and configurations, head to the API reference.
Overview
The ChatDatabricks class wraps a chat model endpoint hosted on Databricks Model Serving. This example notebook shows how to wrap your serving endpoint and use it as a chat model in your LangChain application.
Integration details
| Class | Package | Local | Serializable | Package downloads | Package latest |
|---|---|---|---|---|---|
| ChatDatabricks | databricks-langchain | ❌ | beta | | |
Model features
| Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs |
|---|---|---|---|---|---|---|---|---|---|
| ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ |
Supported methods
ChatDatabricks supports all methods of ChatModel, including the async APIs.
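As a quick sketch of that shared interface (assuming the setup and installation steps below have been completed; top-level await works in a notebook environment):
from databricks_langchain import ChatDatabricks

chat_model = ChatDatabricks(endpoint="databricks-dbrx-instruct")

chat_model.invoke("Hello")                # synchronous invocation
for chunk in chat_model.stream("Hello"):  # token-level streaming
    print(chunk.content, end="")
chat_model.batch(["Hello", "Goodbye"])    # batched invocation
await chat_model.ainvoke("Hello")         # async counterpart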
Endpoint requirements
The serving endpoint that ChatDatabricks wraps must have an OpenAI-compatible chat input/output format (reference). As long as the input format is compatible, ChatDatabricks can be used for any endpoint type hosted on Databricks Model Serving (see the sketch after this list):
- Foundation Models - a curated list of state-of-the-art foundation models such as DBRX, Llama3, Mixtral-8x7B, and more. These endpoints are ready to use in your Databricks workspace without any setup.
- Custom Models - you can also deploy custom models to a serving endpoint via MLflow, using the framework of your choice such as LangChain, PyTorch, Transformers, etc.
- External Models - Databricks endpoints can serve models hosted outside Databricks as a proxy, such as proprietary model services like OpenAI GPT-4.
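For reference, a sketch of the OpenAI-compatible chat format that such an endpoint is expected to accept and return; all field values below are illustrative, not output from a real call:
# Request body (illustrative):
{
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is MLflow?"},
    ],
    "temperature": 0.1,
    "max_tokens": 256,
}
# Response body (shape only):
{
    "choices": [{"index": 0, "message": {"role": "assistant", "content": "..."}}],
    "usage": {"prompt_tokens": 10, "completion_tokens": 20, "total_tokens": 30},
}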
Setup
To access Databricks models, you need to create a Databricks account, set up your credentials (only if you are outside a Databricks workspace), and install the required packages.
Credentials (only if you are outside Databricks)
If you are running your LangChain application inside Databricks, you can skip this step.
Otherwise, you need to manually set the Databricks workspace hostname and a personal access token as the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, respectively. See the authentication documentation for how to get an access token.
import getpass
import os
os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
if "DATABRICKS_TOKEN" not in os.environ:
os.environ["DATABRICKS_TOKEN"] = getpass.getpass(
"Enter your Databricks access token: "
)
Enter your Databricks access token: ········
Installation
The LangChain Databricks integration lives in the databricks-langchain package.
%pip install -qU databricks-langchain
We first demonstrate how to query the DBRX-instruct model hosted as a Foundation Models endpoint with ChatDatabricks.
For other types of endpoints, there are some differences in how to set up the endpoint itself; however, once the endpoint is ready, there is no difference in how you query it with ChatDatabricks. Please refer to the bottom of this notebook for examples with other types of endpoints.
Instantiation
from databricks_langchain import ChatDatabricks
chat_model = ChatDatabricks(
endpoint="databricks-dbrx-instruct",
temperature=0.1,
max_tokens=256,
# See https://python.langchain.com/api_reference/community/chat_models/langchain_community.chat_models.databricks.ChatDatabricks.html for other supported parameters
)
Invocation
chat_model.invoke("What is MLflow?")
AIMessage(content='MLflow is an open-source platform for managing end-to-end machine learning workflows. It was introduced by Databricks in 2018. MLflow provides tools for tracking experiments, packaging and sharing code, and deploying models. It is designed to work with any machine learning library and can be used in a variety of environments, including local machines, virtual machines, and cloud-based clusters. MLflow aims to streamline the machine learning development lifecycle, making it easier for data scientists and engineers to collaborate and deploy models into production.', response_metadata={'prompt_tokens': 229, 'completion_tokens': 104, 'total_tokens': 333}, id='run-d3fb4d06-3e10-4471-83c9-c282cc62b74d-0')
# You can also pass a list of messages
messages = [
("system", "You are a chatbot that can answer questions about Databricks."),
("user", "What is Databricks Model Serving?"),
]
chat_model.invoke(messages)
AIMessage(content='Databricks Model Serving is a feature of the Databricks platform that allows data scientists and engineers to easily deploy machine learning models into production. With Model Serving, you can host, manage, and serve machine learning models as APIs, making it easy to integrate them into applications and business processes. It supports a variety of popular machine learning frameworks, including TensorFlow, PyTorch, and scikit-learn, and provides tools for monitoring and managing the performance of deployed models. Model Serving is designed to be scalable, secure, and easy to use, making it a great choice for organizations that want to quickly and efficiently deploy machine learning models into production.', response_metadata={'prompt_tokens': 35, 'completion_tokens': 130, 'total_tokens': 165}, id='run-b3feea21-223e-4105-8627-41d647d5ccab-0')
Chaining
Similar to other chat models, ChatDatabricks can be used as part of a complex chain.
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are a chatbot that can answer questions about {topic}.",
),
("user", "{question}"),
]
)
chain = prompt | chat_model
chain.invoke(
{
"topic": "Databricks",
"question": "What is Unity Catalog?",
}
)
AIMessage(content="Unity Catalog is a new data catalog feature in Databricks that allows you to discover, manage, and govern all your data assets across your data landscape, including data lakes, data warehouses, and data marts. It provides a centralized repository for storing and managing metadata, data lineage, and access controls for all your data assets. Unity Catalog enables data teams to easily discover and access the data they need, while ensuring compliance with data privacy and security regulations. It is designed to work seamlessly with Databricks' Lakehouse platform, providing a unified experience for managing and analyzing all your data.", response_metadata={'prompt_tokens': 32, 'completion_tokens': 118, 'total_tokens': 150}, id='run-82d72624-f8df-4c0d-a976-919feec09a55-0')
Invocation (streaming)
for chunk in chat_model.stream("How are you?"):
print(chunk.content, end="|")
I|'m| an| AI| and| don|'t| have| feelings|,| but| I|'m| here| and| ready| to| assist| you|.| How| can| I| help| you| today|?||
Invocation (async)
import asyncio
countries = ["Japan", "Italy", "Australia"]
futures = [chat_model.ainvoke(f"Where is the capital of {c}?") for c in countries]
await asyncio.gather(*futures)
Tool calling
ChatDatabricks supports the OpenAI-compatible tool calling API, which lets you describe tools and their arguments so that the model returns a JSON object with a tool to invoke and the inputs to that tool. Tool calling is extremely useful for building tool-using chains and agents, and for getting structured outputs from models more generally.
With ChatDatabricks.bind_tools, we can easily pass in Pydantic classes, dict schemas, LangChain tools, or even plain functions as tools to the model. Under the hood, these are converted to an OpenAI-compatible tool schema, which looks like:
{
"name": "...",
"description": "...",
"parameters": {...} # JSONSchema
}
and passed in with every model invocation.
from pydantic import BaseModel, Field
class GetWeather(BaseModel):
"""Get the current weather in a given location"""
location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
class GetPopulation(BaseModel):
"""Get the current population in a given location"""
location: str = Field(..., description="The city and state, e.g. San Francisco, CA")
llm_with_tools = chat_model.bind_tools([GetWeather, GetPopulation])
ai_msg = llm_with_tools.invoke(
"Which city is hotter today and which is bigger: LA or NY?"
)
print(ai_msg.tool_calls)
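The returned ai_msg.tool_calls is a list of dicts with "name", "args", and "id" keys. A minimal, hypothetical sketch of dispatching on it (the weather lookup itself is left as a stub):
for tool_call in ai_msg.tool_calls:
    if tool_call["name"] == "GetWeather":
        # Validate the model-generated arguments with the Pydantic model
        args = GetWeather(**tool_call["args"])
        # ...call an actual weather backend with args.location here...
        print(f"Would fetch weather for {args.location}")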
Wrapping a custom model endpoint
Prerequisites:
- An LLM has been registered and deployed to a Databricks serving endpoint via MLflow. The endpoint must have an OpenAI-compatible chat input/output format (reference).
- You have "Can Query" permission on the endpoint.
Once the endpoint is ready, the usage pattern is identical to that of Foundation Models.
chat_model_custom = ChatDatabricks(
endpoint="YOUR_ENDPOINT_NAME",
temperature=0.1,
max_tokens=256,
)
chat_model_custom.invoke("How are you?")
Wrapping external models
Prerequisite: create a proxy endpoint
First, create a new Databricks serving endpoint that proxies requests to the target external model. Endpoint creation should be fairly quick when proxying external models.
This requires registering your OpenAI API key within the Databricks secret manager as follows:
# Replace `<scope>` with your scope
databricks secrets create-scope <scope>
databricks secrets put-secret <scope> openai-api-key --string-value $OPENAI_API_KEY
See https://docs.databricks.com/en/security/secrets/secrets.html for how to set up the Databricks CLI and manage secrets.
from mlflow.deployments import get_deploy_client
client = get_deploy_client("databricks")
secret = "secrets/<scope>/openai-api-key" # replace `<scope>` with your scope
endpoint_name = "my-chat" # rename this if my-chat already exists
client.create_endpoint(
name=endpoint_name,
config={
"served_entities": [
{
"name": "my-chat",
"external_model": {
"name": "gpt-3.5-turbo",
"provider": "openai",
"task": "llm/v1/chat",
"openai_config": {
"openai_api_key": "{{" + secret + "}}",
},
},
}
],
},
)
Once the endpoint status has become "Ready", you can query the endpoint in the same way as the other types of endpoints.
chat_model_external = ChatDatabricks(
endpoint=endpoint_name,
temperature=0.1,
max_tokens=256,
)
chat_model_external.invoke("How to use Databricks?")
Function calling on Databricks
Databricks Function Calling is OpenAI-compatible and is only available during model serving as part of the Foundation Model APIs.
See the introduction to Databricks Function Calling for the supported models.
llm = ChatDatabricks(endpoint="databricks-meta-llama-3-70b-instruct")
tools = [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
},
},
},
}
]
# supported tool_choice values: "auto", "required", "none", function name in string format,
# or a dictionary as {"type": "function", "function": {"name": <<tool_name>>}}
model = llm.bind_tools(tools, tool_choice="auto")
messages = [{"role": "user", "content": "What is the current temperature of Chicago?"}]
print(model.invoke(messages))
See Databricks Unity Catalog for how to use UC functions in chains, as sketched below.
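As an illustrative sketch of that pattern (UCFunctionToolkit is assumed to be exported by databricks-langchain, and main.default.add_numbers is a hypothetical UC function; check the linked docs for the exact API):
from databricks_langchain import UCFunctionToolkit

# Hypothetical UC function name; replace it with a function registered in your
# Unity Catalog. The UCFunctionToolkit import and signature are assumptions
# based on the databricks-langchain package, not verified here.
toolkit = UCFunctionToolkit(function_names=["main.default.add_numbers"])
llm_with_uc_tools = llm.bind_tools(toolkit.tools)
llm_with_uc_tools.invoke("What is 2 + 3? Use the add_numbers function.")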
API reference
For detailed documentation of all ChatDatabricks features and configurations, head to the API reference: https://api-docs.databricks.com/python/databricks-ai-bridge/latest/databricks_langchain.html#databricks_langchain.ChatDatabricks