
Ray Serve

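Ray Serve is a scalable model serving library for building online inference APIs. Serve is particularly well suited to system composition: it lets you build a complex inference service made up of multiple chains and business logic, all in plain Python code.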

Goal of this notebook

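This notebook shows a simple example of how to deploy an OpenAI chain to production. You can extend it to self-host your own models and to define the hardware resources (GPUs and CPUs) needed to run them in production. For more on the available options, including autoscaling, see the Ray Serve documentation.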

Setup Ray Serve

Install ray with pip install "ray[serve]".

General Skeleton

The general skeleton for deploying a service is the following:

# 0: Import ray serve and request from starlette
from ray import serve
from starlette.requests import Request


# 1: Define a Ray Serve deployment.
@serve.deployment
class LLMServe:
    def __init__(self) -> None:
        # All the initialization code goes here
        pass

    async def __call__(self, request: Request) -> str:
        # You can parse the request here
        # and return a response
        return "Hello World"


# 2: Bind the model to deployment
deployment = LLMServe.bind()

# 3: Run the deployment
serve.api.run(deployment)
# Shutdown the deployment
serve.api.shutdown()
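
The introduction mentions defining the hardware resources (GPUs and CPUs) a deployment needs. A minimal sketch of how those settings might be collected is shown here. The helper name `build_deployment_options` and the example values are hypothetical; `num_replicas` and `ray_actor_options` are keyword arguments accepted by Ray Serve deployments, but check the Ray Serve documentation for the options available in your installed version.

```python
from typing import Any, Dict


def build_deployment_options(
    num_replicas: int = 1,
    num_cpus: float = 1,
    num_gpus: float = 0,
) -> Dict[str, Any]:
    """Collect keyword arguments for a Ray Serve deployment (hypothetical helper)."""
    return {
        # Number of copies of the deployment to run
        "num_replicas": num_replicas,
        # Resources reserved for each replica
        "ray_actor_options": {"num_cpus": num_cpus, "num_gpus": num_gpus},
    }


# Example values only; pick numbers that match your cluster.
options = build_deployment_options(num_replicas=2, num_cpus=2, num_gpus=1)
print(options)

# With Ray installed, the options could be applied to the skeleton class, e.g.:
# deployment = LLMServe.options(**options).bind()
```

Keeping the options in a plain dictionary makes it easy to vary them per environment (for example, no GPU on a laptop) without touching the deployment class itself.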

Example of deploying an OpenAI chain with custom prompts

Get an OpenAI API key from OpenAI. Running the following code will prompt you to enter your API key.

from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI
from getpass import getpass

OPENAI_API_KEY = getpass()

@serve.deployment
class DeployLLM:
    def __init__(self):
        # We initialize the LLM, template and the chain here
        llm = OpenAI(openai_api_key=OPENAI_API_KEY)
        template = "Question: {question}\n\nAnswer: Let's think step by step."
        prompt = PromptTemplate.from_template(template)
        self.chain = LLMChain(llm=llm, prompt=prompt)

    def _run_chain(self, text: str):
        return self.chain(text)

    async def __call__(self, request: Request):
        # 1. Parse the request
        text = request.query_params["text"]
        # 2. Run the chain
        resp = self._run_chain(text)
        # 3. Return the response
        return resp["text"]
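
The request-parsing step above indexes the query parameters directly, which raises a KeyError when `text` is absent. A minimal sketch of a more defensive parser follows. The helper name `extract_text` and the error message are hypothetical and not part of the original example; the function accepts any mapping, so it can be exercised with a plain dict.

```python
from typing import Mapping


def extract_text(query_params: Mapping[str, str]) -> str:
    """Return the trimmed 'text' parameter or raise ValueError (hypothetical helper)."""
    text = query_params.get("text", "").strip()
    if not text:
        raise ValueError("missing required query parameter 'text'")
    return text


# Plain dicts stand in for the request's query parameters here.
print(extract_text({"text": "  What is Ray Serve?  "}))

try:
    extract_text({})
except ValueError as err:
    print(f"rejected: {err}")
```

Inside `__call__`, this could be used as `extract_text(request.query_params)`, with the ValueError turned into whatever error response suits your service.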

Now we can bind the deployment.

# Bind the model to deployment
deployment = DeployLLM.bind()

We can specify the port number and host when we run the deployment.

# Example port number
PORT_NUMBER = 8282
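# Run the deployment on the chosen port.
# Note: whether `port` is accepted here depends on the installed Ray version;
# if it is rejected, the port may need to be set beforehand via
# serve.start(http_options={"port": PORT_NUMBER}).
serve.api.run(deployment, port=PORT_NUMBER)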

Now that the service is deployed at localhost:8282, we can send a POST request to get the results back.

import requests

text = "What NFL team won the Super Bowl in the year Justin Bieber was born?"
# Pass the question via `params` so that requests URL-encodes it
response = requests.post(f"http://localhost:{PORT_NUMBER}/", params={"text": text})
print(response.content.decode())
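
Because the question travels in the URL query string, spaces and the trailing question mark need to be percent-encoded. The sketch below uses only the standard library to show what the encoded URL looks like and that it decodes back to the original text. The port value mirrors the example above, and nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

PORT_NUMBER = 8282  # same example port as above
text = "What NFL team won the Super Bowl in the year Justin Bieber was born?"

# urlencode escapes characters that are not safe inside a query string
query = urlencode({"text": text})
url = f"http://localhost:{PORT_NUMBER}/?{query}"
print(url)

# Round trip: decoding the query string recovers the original question
decoded = parse_qs(urlsplit(url).query)["text"][0]
print(decoded == text)
```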