
RunPod Chat Models

Get started with RunPod chat models.

Overview

This guide covers how to use the LangChain `ChatRunPod` class to interact with chat models hosted on RunPod Serverless.

Setup

  1. Install the package:
    pip install -qU langchain-runpod
  2. Deploy a chat model endpoint: follow the setup steps in the RunPod provider guide to deploy a compatible chat model endpoint on RunPod Serverless and obtain its endpoint ID.
  3. Set environment variables: make sure `RUNPOD_API_KEY` and `RUNPOD_ENDPOINT_ID` (or a specific `RUNPOD_CHAT_ENDPOINT_ID`) are set.
import getpass
import os

# Make sure environment variables are set (or pass them directly to ChatRunPod)
if "RUNPOD_API_KEY" not in os.environ:
os.environ["RUNPOD_API_KEY"] = getpass.getpass("Enter your RunPod API Key: ")

if "RUNPOD_ENDPOINT_ID" not in os.environ:
os.environ["RUNPOD_ENDPOINT_ID"] = input(
"Enter your RunPod Endpoint ID (used if RUNPOD_CHAT_ENDPOINT_ID is not set): "
)

# Optionally use a different endpoint ID specifically for chat models
# if "RUNPOD_CHAT_ENDPOINT_ID" not in os.environ:
# os.environ["RUNPOD_CHAT_ENDPOINT_ID"] = input("Enter your RunPod Chat Endpoint ID (Optional): ")

chat_endpoint_id = os.environ.get(
"RUNPOD_CHAT_ENDPOINT_ID", os.environ.get("RUNPOD_ENDPOINT_ID")
)
if not chat_endpoint_id:
raise ValueError(
"No RunPod Endpoint ID found. Please set RUNPOD_ENDPOINT_ID or RUNPOD_CHAT_ENDPOINT_ID."
)

Instantiation

Initialize the `ChatRunPod` class. You can pass model-specific parameters via `model_kwargs` and configure the polling behavior.

from langchain_runpod import ChatRunPod

chat = ChatRunPod(
    runpod_endpoint_id=chat_endpoint_id,  # Specify the correct endpoint ID
    model_kwargs={
        "max_new_tokens": 512,
        "temperature": 0.7,
        "top_p": 0.9,
        # Add other parameters supported by your endpoint handler
    },
    # Optional: Adjust polling
    # poll_interval=0.2,
    # max_polling_attempts=150
)

Invocation

Use the standard LangChain `.invoke()` and `.ainvoke()` methods to call the model. Streaming via `.stream()` and `.astream()` is also supported (simulated by polling the RunPod `/stream` endpoint).

from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are a helpful AI assistant."),
    HumanMessage(content="What is the RunPod Serverless API flow?"),
]

# Invoke (Sync)
try:
    response = chat.invoke(messages)
    print("--- Sync Invoke Response ---")
    print(response.content)
except Exception as e:
    print(
        f"Error invoking Chat Model: {e}. Ensure endpoint ID/API key are correct and endpoint is active/compatible."
    )

# Stream (Sync, simulated via polling /stream)
print("\n--- Sync Stream Response ---")
try:
    for chunk in chat.stream(messages):
        print(chunk.content, end="", flush=True)
    print()  # Newline
except Exception as e:
    print(
        f"\nError streaming Chat Model: {e}. Ensure endpoint handler supports streaming output format."
    )

Async Usage

# AInvoke (Async)
try:
    async_response = await chat.ainvoke(messages)
    print("--- Async Invoke Response ---")
    print(async_response.content)
except Exception as e:
    print(f"Error invoking Chat Model asynchronously: {e}.")

# AStream (Async)
print("\n--- Async Stream Response ---")
try:
    async for chunk in chat.astream(messages):
        print(chunk.content, end="", flush=True)
    print()  # Newline
except Exception as e:
    print(
        f"\nError streaming Chat Model asynchronously: {e}. Ensure endpoint handler supports streaming output format.\n"
    )
API Reference: HumanMessage | SystemMessage

Chaining

The chat model integrates seamlessly with LangChain Expression Language (LCEL) chains.

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        ("human", "{input}"),
    ]
)

parser = StrOutputParser()

chain = prompt | chat | parser

try:
    chain_response = chain.invoke(
        {"input": "Explain the concept of serverless computing in simple terms."}
    )
    print("--- Chain Response ---")
    print(chain_response)
except Exception as e:
    print(f"Error running chain: {e}")

# Async chain
try:
    async_chain_response = await chain.ainvoke(
        {"input": "What are the benefits of using RunPod for AI/ML workloads?"}
    )
    print("--- Async Chain Response ---")
    print(async_chain_response)
except Exception as e:
    print(f"Error running async chain: {e}")

Model Features (Endpoint Dependent)

The availability of advanced features depends heavily on the specific implementation of your RunPod endpoint handler. The `ChatRunPod` integration provides the basic framework, but the handler must support the underlying functionality.

| Feature | Integration Support | Depends on Endpoint? | Notes |
| --- | --- | --- | --- |
| Tool calling | ❌ | ✅ | Requires handler to process tool definitions and return tool calls (e.g., OpenAI format). Integration needs parsing logic. |
| Structured output | ❌ | ✅ | Requires handler support for forcing structured output (JSON mode, function calling). Integration needs parsing logic. |
| JSON mode | ❌ | ✅ | Requires handler to accept a `json_mode` parameter (or similar) and guarantee JSON output. |
| Image input | ❌ | ✅ | Requires multimodal handler accepting image data (e.g., base64). Integration does not support multimodal messages. |
| Audio input | ❌ | ✅ | Requires handler accepting audio data. Integration does not support audio messages. |
| Video input | ❌ | ✅ | Requires handler accepting video data. Integration does not support video messages. |
| Token-level streaming | ✅ (Simulated) | ✅ | Polls `/stream`. Requires handler to populate the `stream` list in the status response with token chunks (e.g., `[{"output": "token"}]`). True low-latency streaming is not built in. |
| Native async | ✅ | ✅ | Core `ainvoke`/`astream` implemented. Relies on endpoint handler performance. |
| Token usage | ❌ | ✅ | Requires handler to return `prompt_tokens`/`completion_tokens` in the final response. Integration currently does not parse this. |
| Logprobs | ❌ | ✅ | Requires handler to return log probabilities. Integration currently does not parse this. |

Key takeaway: Standard chat invocation and simulated streaming work as long as the endpoint follows basic RunPod API conventions. Advanced features require specific handler implementations and may require extending or customizing this integration package.
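
To make the simulated streaming more concrete, here is a minimal sketch of the underlying polling flow, assuming the standard RunPod Serverless REST API (`POST /run` to submit a job, `GET /stream/{job_id}` to poll) and the chunk format shown in the table above. This is illustrative only, not the integration's actual implementation:

import os
import time

import requests

API_KEY = os.environ["RUNPOD_API_KEY"]
ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Submit an asynchronous job to the endpoint (payload shape is illustrative)
job = requests.post(
    f"{BASE_URL}/run", headers=HEADERS, json={"input": {"prompt": "Hello"}}
).json()
job_id = job["id"]

# Poll /stream until the job finishes; a streaming-capable handler populates
# the "stream" list with token chunks such as [{"output": "token"}]
while True:
    result = requests.get(f"{BASE_URL}/stream/{job_id}", headers=HEADERS).json()
    for chunk in result.get("stream", []):
        print(chunk.get("output", ""), end="", flush=True)
    if result.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
        break
    time.sleep(0.2)  # mirrors the integration's poll_interval option
print()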

API Reference

For details on the `ChatRunPod` class, its parameters, and its methods, refer to the source code or the generated API reference (if available).

Source code: https://github.com/runpod/langchain-runpod/blob/main/langchain_runpod/chat_models.py