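Hugging Face Endpoints

The Hugging Face Hub is an online platform hosting over 120k models, 20k datasets, and 50k demo apps (Spaces), all open source and publicly available, where people can easily collaborate and build ML together.

The Hugging Face Hub also offers various endpoints for building ML applications. This example shows how to connect to the different endpoint types.

In particular, text generation inference is powered by Text Generation Inference (TGI): a custom-built Rust, Python, and gRPC server for blazing-fast text generation inference.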

from langchain_huggingface import HuggingFaceEndpoint

API Reference: HuggingFaceEndpoint

Installation and Setup

To use this integration, you need the huggingface_hub Python package installed.

%pip install --upgrade --quiet huggingface_hub

# get a token: https://huggingface.co/docs/api-inference/quicktour#get-your-api-token

from getpass import getpass

HUGGINGFACEHUB_API_TOKEN = getpass()

import os

os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN
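If the token is already exported in the shell, prompting for it again is unnecessary. The following is a minimal sketch of that idea; the helper name `get_hf_token` is hypothetical and not part of any library, and it only reuses the `HUGGINGFACEHUB_API_TOKEN` variable set above.

```python
import os
from getpass import getpass


def get_hf_token(env_var="HUGGINGFACEHUB_API_TOKEN"):
    """Return the token from the environment, prompting only if it is missing.

    `get_hf_token` is a hypothetical helper written for this example.
    """
    token = os.environ.get(env_var)
    if not token:
        # Nothing exported yet: ask interactively and cache it for later cells.
        token = getpass(f"Enter {env_var}: ")
        os.environ[env_var] = token
    return token
```

Calling `get_hf_token()` in a notebook prompts once per session and is silent afterwards.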

Prepare Examples

from langchain_huggingface import HuggingFaceEndpoint

API Reference: HuggingFaceEndpoint

from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate

API Reference: LLMChain | PromptTemplate

question = "Who won the FIFA World Cup in the year 1994? "

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate.from_template(template)
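To see what text the model will actually receive, it can help to render the template before sending it. The sketch below repeats the template from above and formats it locally. It falls back to plain `str.format` when `langchain_core` is not installed, on the assumption that the default f-string style template behaves the same way for a simple placeholder like this one.

```python
question = "Who won the FIFA World Cup in the year 1994?"

template = """Question: {question}

Answer: Let's think step by step."""

try:
    # Preferred path: use the same class as the example above.
    from langchain_core.prompts import PromptTemplate

    rendered = PromptTemplate.from_template(template).format(question=question)
except ImportError:
    # Fallback (assumption): equivalent for a single {question} placeholder.
    rendered = template.format(question=question)

print(rendered)
```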

Examples

Here is an example of how to access the HuggingFaceEndpoint integration of the free Serverless Endpoints API.

repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_new_tokens=128,
    temperature=0.5,
    huggingfacehub_api_token=HUGGINGFACEHUB_API_TOKEN,
)
llm_chain = prompt | llm
print(llm_chain.invoke({"question": question}))

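Dedicated Endpoint

The free serverless API lets you implement solutions and iterate quickly, but it may be rate limited for heavy use cases, because the load is shared with other requests.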
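One common way to cope with occasional rate limiting is to retry with exponential backoff. The sketch below is a generic illustration, not something the library provides: `invoke_with_backoff` is a hypothetical helper, and because the exact exception raised on a rate limit is not specified here, it catches `Exception` broadly. In real code you would likely narrow that to the HTTP error your client raises.

```python
import time


def invoke_with_backoff(call, prompt, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `call(prompt)`, retrying with exponentially growing delays.

    Hypothetical helper: `call` is any callable, e.g. `llm.invoke`.
    Delays are base_delay, 2*base_delay, 4*base_delay, ...
    """
    for attempt in range(max_attempts):
        try:
            return call(prompt)
        except Exception:
            # Assumption: any failure is treated as retryable.
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** attempt)
```

With an `llm` built as in the example above, usage would look like `invoke_with_backoff(llm.invoke, "What did foo say about bar?")`.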

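For enterprise workloads, it is best to use Inference Endpoints - Dedicated. This gives access to fully managed infrastructure that offers more flexibility and speed. These resources come with continuous support and uptime guarantees, as well as options such as AutoScaling.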

# Set the url to your Inference Endpoint below
your_endpoint_url = "https://fayjubiy2xqn36z0.us-east-1.aws.endpoints.huggingface.cloud"

llm = HuggingFaceEndpoint(
    endpoint_url=f"{your_endpoint_url}",
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
)
llm.invoke("What did foo say about bar?")

Streaming

from langchain_core.callbacks import StreamingStdOutCallbackHandler
from langchain_huggingface import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    endpoint_url=f"{your_endpoint_url}",
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
    streaming=True,
)
llm.invoke(
    "What did foo say about bar?",
    config={"callbacks": [StreamingStdOutCallbackHandler()]},
)

API Reference: StreamingStdOutCallbackHandler | HuggingFaceEndpoint
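As an alternative to a callback handler, LangChain LLMs also expose a `.stream()` method that yields text chunks as they arrive. The sketch below wraps that loop in a small function. `stream_text` is a hypothetical helper name, and it assumes only that the object passed in has a `.stream(prompt)` method yielding strings.

```python
def stream_text(llm, prompt):
    """Print chunks as they arrive and return the full concatenated text.

    Hypothetical helper: `llm` is any object with a `.stream(prompt)` method
    that yields strings, such as the HuggingFaceEndpoint instance above.
    """
    pieces = []
    for chunk in llm.stream(prompt):
        print(chunk, end="", flush=True)  # show partial output immediately
        pieces.append(chunk)
    print()  # finish the line once the stream ends
    return "".join(pieces)
```

This keeps the streamed text available as a return value, which the stdout callback approach does not do by itself.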

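This same HuggingFaceEndpoint class can be used with a local Hugging Face TGI instance serving the LLM. Check out the TGI repository for details on support for various hardware (GPU, TPU, Gaudi, and others).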
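A minimal sketch of pointing the class at a local TGI server follows. The address `http://localhost:8080/` and the helper name `make_local_llm` are assumptions for illustration; use whatever host and port your TGI instance is actually listening on. The import is done inside the function so that the snippet can be loaded even where `langchain_huggingface` is not installed.

```python
# Assumed address of a locally running TGI server; adjust to your setup.
LOCAL_TGI_URL = "http://localhost:8080/"


def make_local_llm(endpoint_url=LOCAL_TGI_URL, max_new_tokens=256):
    """Build a HuggingFaceEndpoint that talks to a local TGI instance.

    Hypothetical helper; it only forwards parameters already used in the
    examples above.
    """
    from langchain_huggingface import HuggingFaceEndpoint

    return HuggingFaceEndpoint(
        endpoint_url=endpoint_url,
        max_new_tokens=max_new_tokens,
    )
```

With a server running, `make_local_llm().invoke("What did foo say about bar?")` would send the prompt to the local instance rather than to the hosted API.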

Related

LLM conceptual guide
LLM how-to guides
