Xorbits Inference (Xinference)

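This page demonstrates how to use Xinference with LangChain.

Xinference is a library designed to serve LLMs, speech recognition models, and multimodal models, even on your laptop. With Xorbits Inference, you can deploy and serve state-of-the-art built-in models with a single command.

Installation and Setup

Xinference can be installed from PyPI with pip: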

pip install "xinference[all]"

LLM

Xinference supports various GGML-compatible models, including chatglm, baichuan, whisper, vicuna, and orca. To list the built-in models, run:

xinference list --all

Wrapper for Xinference

You can start a local instance of Xinference by running:

xinference

You can also deploy Xinference in a distributed cluster. To do so, first start an Xinference supervisor on the server where you want it to run:

xinference-supervisor -H "${supervisor_host}"

Then start an Xinference worker on each of the other servers where you want one to run:

xinference-worker -e "http://${supervisor_host}:9997"

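Once Xinference is running, an endpoint is available for model management through the CLI or the Xinference client.

For a local deployment, the endpoint is http://localhost:9997.

For a cluster deployment, the endpoint is http://${supervisor_host}:9997.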
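
The two endpoint forms above differ only in the host, so a client script can derive the URL from a single optional setting. The following is a minimal sketch; the function name `xinference_endpoint` is hypothetical and not part of Xinference or LangChain, and the port is simply the one quoted above.

```python
from typing import Optional

# Port taken from the endpoints quoted above.
DEFAULT_PORT = 9997


def xinference_endpoint(supervisor_host: Optional[str] = None,
                        port: int = DEFAULT_PORT) -> str:
    """Return the endpoint URL for a local or cluster deployment.

    Hypothetical helper: with no supervisor host, fall back to localhost
    (local deployment); otherwise point at the supervisor (cluster deployment).
    """
    host = supervisor_host or "localhost"
    return f"http://{host}:{port}"


if __name__ == "__main__":
    print(xinference_endpoint())            # local deployment
    print(xinference_endpoint("10.0.0.5"))  # cluster deployment, example host
```

The returned string can be passed as `server_url` when constructing the LangChain wrapper.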

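Next, you need to launch a model. You can specify the model name and other attributes, including model_size_in_billions and quantization. You can do this with the command-line interface (CLI). For example: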

xinference launch -n orca -s 3 -q q4_0

A model UID will be returned.
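
If the launch step needs to be scripted, the CLI call above can be wrapped from Python. This is a sketch under stated assumptions: the helper names `build_launch_command` and `launch_model` are hypothetical, only the `-n`, `-s`, and `-q` flags shown above are used, and no assumption is made about the exact format of the CLI output, so the raw text is returned for the caller to inspect.

```python
import subprocess
from typing import List


def build_launch_command(model_name: str, size_in_billions: int,
                         quantization: str) -> List[str]:
    """Build the argv list for the `xinference launch` command shown above."""
    return [
        "xinference", "launch",
        "-n", model_name,
        "-s", str(size_in_billions),
        "-q", quantization,
    ]


def launch_model(model_name: str, size_in_billions: int,
                 quantization: str) -> str:
    """Run the launch command and return its raw standard output.

    Requires the `xinference` CLI on PATH and a running Xinference instance.
    The output is returned unparsed; extract the model UID from it yourself.
    """
    cmd = build_launch_command(model_name, size_in_billions, quantization)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Only prints the command; call launch_model(...) to actually run it.
    print(" ".join(build_launch_command("orca", 3, "q4_0")))
```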

Example usage:

from langchain_community.llms import Xinference

llm = Xinference(
    server_url="http://0.0.0.0:9997",
    model_uid="<model_uid>",  # replace <model_uid> with the model UID returned when launching the model
)

llm(
    prompt="Q: where can we visit in the capital of France? A:",
    generate_config={"max_tokens": 1024, "stream": True},
)

API Reference: Xinference

Usage

For more information and detailed examples, refer to the example for xinference LLMs.

Embeddings

Xinference also supports embedding queries and documents. See the example for xinference embeddings for a more detailed demo.
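
As an illustration of embedding a query and a set of documents, here is a minimal sketch using `XinferenceEmbeddings` from `langchain_community.embeddings`. It assumes an embedding model has already been launched and that its UID is known; the helper names `embed_texts` and `cosine_similarity` are hypothetical and exist only for this example.

```python
import math
from typing import List, Sequence, Tuple


def embed_texts(server_url: str, model_uid: str, query: str,
                documents: List[str]) -> Tuple[List[float], List[List[float]]]:
    """Embed one query and several documents through a running Xinference server."""
    # Imported lazily so the rest of this module works without the package installed.
    from langchain_community.embeddings import XinferenceEmbeddings

    embeddings = XinferenceEmbeddings(server_url=server_url, model_uid=model_uid)
    return embeddings.embed_query(query), embeddings.embed_documents(documents)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Runs without a server: compares two hand-written vectors.
    print(cosine_similarity([3.0, 4.0], [3.0, 4.0]))
```

With a server available, the query vector returned by `embed_texts` can be compared against each document vector with `cosine_similarity` to rank the documents.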

Xinference LangChain partner package install

Install the integration package with:

pip install langchain-xinference

Chat Models

from langchain_xinference.chat_models import ChatXinference
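
A chat model from the partner package might be used as sketched below. This rests on assumptions: `ChatXinference` is assumed to accept the same `server_url` and `model_uid` arguments as the community `Xinference` wrapper shown earlier, and the helper names `build_messages` and `ask` are hypothetical. Check the package's own documentation before relying on it.

```python
from typing import List, Tuple


def build_messages(system_prompt: str, user_prompt: str) -> List[Tuple[str, str]]:
    """Return (role, content) pairs, a message format LangChain chat models accept."""
    return [("system", system_prompt), ("human", user_prompt)]


def ask(server_url: str, model_uid: str, user_prompt: str) -> str:
    """Send one question to a launched chat model and return the reply text."""
    # Lazy import: requires `pip install langchain-xinference`.
    from langchain_xinference.chat_models import ChatXinference

    # Assumption: constructor arguments mirror the community Xinference wrapper.
    chat = ChatXinference(server_url=server_url, model_uid=model_uid)
    reply = chat.invoke(build_messages("You are a concise travel guide.", user_prompt))
    return str(reply.content)


if __name__ == "__main__":
    # Runs without a server: only shows the message structure.
    print(build_messages("You are a concise travel guide.",
                         "Where can we visit in the capital of France?"))
```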

LLM

from langchain_xinference.llms import Xinference