ChatNVIDIA

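This will help you get started with NVIDIA chat models. For detailed documentation of all ChatNVIDIA features and configurations, head to the API reference.

Overview

The langchain-nvidia-ai-endpoints package contains LangChain integrations for building applications with models on NVIDIA NIM inference microservices. NIM supports models across domains such as chat, embedding, and re-ranking, from the community as well as from NVIDIA. These models are optimized by NVIDIA to deliver the best performance on NVIDIA accelerated infrastructure and are deployed as a NIM: an easy-to-use, prebuilt container that deploys anywhere on NVIDIA accelerated infrastructure with a single command.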

NVIDIA hosted deployments of NIMs are available to test on the NVIDIA API catalog. After testing, NIMs can be exported from NVIDIA’s API catalog using the NVIDIA AI Enterprise license and run on-premises or in the cloud, giving enterprises ownership and full control of their IP and AI application.

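NIMs are packaged as container images on a per-model basis and are distributed as NGC container images through the NVIDIA NGC Catalog. At their core, NIMs provide easy, consistent, and familiar APIs for running inference on an AI model.

This example goes over how to use LangChain to interact with NVIDIA-supported models via the ChatNVIDIA class.

For more information on accessing the chat models through this API, check out the ChatNVIDIA documentation.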

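Integration details

Class	Package	Local	Serializable	JS support	Package downloads	Package latest
ChatNVIDIA	langchain_nvidia_ai_endpoints	✅	beta	❌

Model features

Tool calling	Structured output	JSON mode	Image input	Audio input	Video input	Token-level streaming	Native async	Token usage	Logprobs
✅	✅	✅	✅	❌	❌	✅	✅	✅	❌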

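Setup

To get started:

1. Create a free account with NVIDIA, which hosts NVIDIA AI Foundation models.
2. Click on your model of choice.
3. Under Input, select the Python tab and click Get API Key. Then click Generate Key.
4. Copy the generated key and save it as NVIDIA_API_KEY. From there, you should have access to the endpoints.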

Credentials

import getpass
import os

if not os.getenv("NVIDIA_API_KEY"):
    # Note: the API key should start with "nvapi-"
    os.environ["NVIDIA_API_KEY"] = getpass.getpass("Enter your NVIDIA API key: ")
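The comment in the snippet above notes that the key should start with "nvapi-". As a minimal sketch of a sanity check you could run before storing the key: `looks_like_nvidia_key` is a hypothetical helper, not part of the package, and it only checks the prefix, not whether the key is valid.

```python
def looks_like_nvidia_key(key: str) -> bool:
    """Return True if the key has the "nvapi-" prefix followed by at least one character."""
    key = key.strip()
    return key.startswith("nvapi-") and len(key) > len("nvapi-")


# Placeholder value for illustration only, not a real key.
print(looks_like_nvidia_key("nvapi-example"))
```

A check like this catches pasted keys from other services early, before the first request fails with an authentication error.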

To enable automated tracing of your model calls, set your LangSmith API key:

# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")

Installation

The LangChain NVIDIA AI Endpoints integration lives in the langchain_nvidia_ai_endpoints package:

%pip install --upgrade --quiet langchain-nvidia-ai-endpoints

Instantiation

Now we can access models in the NVIDIA API catalog:

## Core LC Chat Interface
from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1")

API Reference: ChatNVIDIA

Invocation

result = llm.invoke("Write a ballad about LangChain.")
print(result.content)

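Working with NVIDIA NIMs

When you are ready to deploy, you can self-host models with NVIDIA NIM, which is included with the NVIDIA AI Enterprise software license, and run them anywhere, giving you ownership of your customizations and full control of your intellectual property (IP) and AI applications.

Learn more about NIMs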

from langchain_nvidia_ai_endpoints import ChatNVIDIA

# connect to a chat NIM running at localhost:8000, specifying a specific model
llm = ChatNVIDIA(base_url="http://localhost:8000/v1", model="meta/llama3-8b-instruct")

API Reference: ChatNVIDIA
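If the same code needs to run against both the hosted catalog and a self-hosted NIM, one option is to build the constructor arguments from the environment. This is a sketch under assumptions: `chat_nvidia_kwargs` and the `NIM_BASE_URL` variable name are hypothetical, and only the `model` and `base_url` parameters shown above are used.

```python
import os


def chat_nvidia_kwargs(model: str) -> dict:
    """Build keyword arguments for ChatNVIDIA.

    If NIM_BASE_URL (a hypothetical variable name) is set, point at that
    self-hosted NIM; otherwise leave base_url out so the default is used.
    """
    kwargs = {"model": model}
    base_url = os.environ.get("NIM_BASE_URL", "").strip()
    if base_url:
        kwargs["base_url"] = base_url.rstrip("/")
    return kwargs


# Usage (requires langchain-nvidia-ai-endpoints):
# llm = ChatNVIDIA(**chat_nvidia_kwargs("meta/llama3-8b-instruct"))
print(chat_nvidia_kwargs("meta/llama3-8b-instruct"))
```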

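Stream, batch, and async

These models natively support streaming. As with all LangChain LLMs, they expose a batch method to handle concurrent requests, as well as async methods for invoke, stream, and batch. Below are a few examples.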

print(llm.batch(["What's 2*3?", "What's 2*6?"]))
# Or via the async API
# await llm.abatch(["What's 2*3?", "What's 2*6?"])

for chunk in llm.stream("How far can a seagull fly in one day?"):
    # Show the token separations
    print(chunk.content, end="|")

async for chunk in llm.astream(
    "How long does it take for monarch butterflies to migrate?"
):
    print(chunk.content, end="|")
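The streaming loops above print each chunk as it arrives. If you also want the full response afterwards, the chunks can be joined. A minimal sketch, assuming each chunk's `content` is a plain string (text-only output); `collect_stream` is a hypothetical helper and the chunks below are stand-ins so the snippet runs without a model.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Concatenate the .content of each streamed chunk into one string."""
    return "".join(chunk.content for chunk in chunks)


# Stand-in chunks for illustration; with a real model, pass llm.stream(prompt).
fake_chunks = [SimpleNamespace(content=part) for part in ["A ", "long ", "way."]]
print(collect_stream(fake_chunks))
```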

Supported models

Querying the available models with get_available_models() returns every model offered by your API credentials.

The playground_ prefix is optional.

ChatNVIDIA.get_available_models()
# llm.get_available_models()
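The returned list can be long, so it may help to group it. The sketch below groups model ids by the part before the first "/" (for example "meta"). It relies only on the `id` attribute used elsewhere on this page; `group_by_publisher` is a hypothetical helper, and the sample objects are stand-ins so the snippet runs without credentials.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_publisher(models) -> dict:
    """Group model ids by the text before the first "/" in each id."""
    groups = defaultdict(list)
    for model in models:
        # An id without "/" ends up in a group named after the whole id.
        publisher, _, _ = model.id.partition("/")
        groups[publisher].append(model.id)
    return {publisher: sorted(ids) for publisher, ids in groups.items()}


# Stand-in objects; with credentials, pass ChatNVIDIA.get_available_models().
sample = [
    SimpleNamespace(id="meta/llama3-8b-instruct"),
    SimpleNamespace(id="mistralai/mixtral-8x7b-instruct-v0.1"),
    SimpleNamespace(id="meta/codellama-70b"),
]
print(group_by_publisher(sample))
```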

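Model types

All of these models are supported and can be accessed via ChatNVIDIA.

Some model types support unique prompting techniques and chat messages. We will review a few important ones below.

To find out more about a specific model, navigate to the API section of that model, as linked here.

General chat

Models such as meta/llama3-8b-instruct and mistralai/mixtral-8x22b-instruct-v0.1 are good all-around models that you can use with any LangChain chat messages. An example follows.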

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA

prompt = ChatPromptTemplate.from_messages(
    [("system", "You are a helpful AI assistant named Fred."), ("user", "{input}")]
)
chain = prompt | ChatNVIDIA(model="meta/llama3-8b-instruct") | StrOutputParser()

for txt in chain.stream({"input": "What's your name?"}):
    print(txt, end="")

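API Reference: StrOutputParser | ChatPromptTemplate | ChatNVIDIA

Code generation

These models accept the same arguments and input structure as regular chat models, but they tend to perform better on code generation and structured code tasks. One example is meta/codellama-70b.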

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an expert coding AI. Respond only in valid python; no narration whatsoever.",
        ),
        ("user", "{input}"),
    ]
)
chain = prompt | ChatNVIDIA(model="meta/codellama-70b") | StrOutputParser()

for txt in chain.stream({"input": "How do I solve this fizz buzz problem?"}):
    print(txt, end="")

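Multimodal

NVIDIA also supports multimodal inputs, meaning you can provide both images and text for the model to reason over. An example model that supports multimodal inputs is nvidia/neva-22b.

Below is an example use: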

import IPython
import requests

image_url = "https://www.nvidia.com/content/dam/en-zz/Solutions/research/ai-playground/nvidia-picasso-3c33-p@2x.jpg"  ## Large Image
image_content = requests.get(image_url).content

IPython.display.Image(image_content)

from langchain_nvidia_ai_endpoints import ChatNVIDIA

llm = ChatNVIDIA(model="nvidia/neva-22b")

API Reference: ChatNVIDIA

Passing an image as a URL

from langchain_core.messages import HumanMessage

llm.invoke(
    [
        HumanMessage(
            content=[
                {"type": "text", "text": "Describe this image:"},
                {"type": "image_url", "image_url": {"url": image_url}},
            ]
        )
    ]
)

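API Reference: HumanMessage

Passing an image as a base64 encoded string

Currently, some extra processing happens client-side to support larger images like the one above. For smaller images (and to better illustrate what happens under the hood), we can pass the image in directly, as shown below: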

import IPython
import requests

image_url = "https://picsum.photos/seed/kitten/300/200"
image_content = requests.get(image_url).content

IPython.display.Image(image_content)

import base64

from langchain_core.messages import HumanMessage

## Works for simpler images. For larger images, see actual implementation
b64_string = base64.b64encode(image_content).decode("utf-8")

llm.invoke(
    [
        HumanMessage(
            content=[
                {"type": "text", "text": "Describe this image:"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{b64_string}"},
                },
            ]
        )
    ]
)

API Reference: HumanMessage
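The example above hard-codes image/png in the data URL, although the downloaded file may not actually be a PNG. A sketch of a helper that picks the MIME type from the file's leading bytes and builds the same content list: `to_data_url` and `image_content` are hypothetical names, only PNG and JPEG signatures are checked, and it is an assumption that the endpoint accepts both types.

```python
import base64


def to_data_url(image_bytes: bytes) -> str:
    """Encode image bytes as a data URL, guessing the MIME type from magic bytes."""
    if image_bytes.startswith(b"\x89PNG\r\n\x1a\n"):
        mime = "image/png"
    elif image_bytes.startswith(b"\xff\xd8\xff"):
        mime = "image/jpeg"
    else:
        raise ValueError("unrecognized image format (only PNG and JPEG are checked)")
    b64_string = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{b64_string}"


def image_content(text: str, image_bytes: bytes) -> list:
    """Build a content list in the shape used by the HumanMessage examples."""
    return [
        {"type": "text", "text": text},
        {"type": "image_url", "image_url": {"url": to_data_url(image_bytes)}},
    ]
```

With these in place, the message could be written as HumanMessage(content=image_content("Describe this image:", image_content_bytes)), where image_content_bytes holds the downloaded bytes.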

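Directly within the string

The NVIDIA API uniquely accepts base64-encoded images inlined within <img/> HTML tags. While this isn't interoperable with other LLMs, you can prompt the model directly this way.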

base64_with_mime_type = f"data:image/png;base64,{b64_string}"
llm.invoke(f'What\'s in this image?\n<img src="{base64_with_mime_type}" />')
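The inline form can be wrapped in a small function so the prompt and image stay together. A minimal sketch: `inline_image_prompt` is a hypothetical helper, and the MIME type defaults to image/png as in the example above.

```python
import base64


def inline_image_prompt(question: str, image_bytes: bytes, mime: str = "image/png") -> str:
    """Return a prompt string with the image embedded in an <img/> tag."""
    b64_string = base64.b64encode(image_bytes).decode("utf-8")
    return f'{question}\n<img src="data:{mime};base64,{b64_string}" />'


# The eight-byte PNG signature stands in for real image data here.
prompt_text = inline_image_prompt("What's in this image?", b"\x89PNG\r\n\x1a\n")
print(prompt_text)
```

The resulting string can be passed straight to llm.invoke(...), as in the snippet above.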

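Example usage within a RunnableWithMessageHistory

Like any other integration, ChatNVIDIA works with chat utilities such as RunnableWithMessageHistory, which is analogous to using ConversationChain. Below, we show the LangChain RunnableWithMessageHistory example applied with the mistralai/mixtral-8x22b-instruct-v0.1 model.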

%pip install --upgrade --quiet langchain

from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# store is a dictionary that maps session IDs to their corresponding chat histories.
store = {}  # memory is maintained outside the chain


# A function that returns the chat history for a given session ID.
def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]


chat = ChatNVIDIA(
    model="mistralai/mixtral-8x22b-instruct-v0.1",
    temperature=0.1,
    max_tokens=100,
    top_p=1.0,
)

#  Define a RunnableConfig object, with a `configurable` key. session_id determines thread
config = {"configurable": {"session_id": "1"}}

conversation = RunnableWithMessageHistory(
    chat,
    get_session_history,
)

conversation.invoke(
    "Hi I'm Srijan Dubey.",  # input or query
    config=config,
)

API Reference: InMemoryChatMessageHistory | RunnableWithMessageHistory

conversation.invoke(
    "I'm doing well! Just having a conversation with an AI.",
    config=config,
)

conversation.invoke(
    "Tell me about yourself.",
    config=config,
)
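To make the session mechanics concrete, here is the same lookup pattern with a plain-Python stand-in for the history class. `ListHistory` is hypothetical and exists only so the snippet runs without LangChain; the real example uses InMemoryChatMessageHistory. The point is that each session_id gets its own history, and repeated lookups return the same object.

```python
class ListHistory:
    """Hypothetical stand-in for InMemoryChatMessageHistory: a list of messages."""

    def __init__(self) -> None:
        self.messages: list = []

    def add_message(self, message: str) -> None:
        self.messages.append(message)


store: dict = {}


def get_session_history(session_id: str) -> ListHistory:
    # Create the history on first use, then reuse it for later turns.
    if session_id not in store:
        store[session_id] = ListHistory()
    return store[session_id]


get_session_history("1").add_message("Hi I'm Srijan Dubey.")
get_session_history("1").add_message("Tell me about yourself.")
get_session_history("2").add_message("A different conversation.")

print(len(store["1"].messages), len(store["2"].messages))
```

Changing the session_id in the config therefore starts a separate thread, while reusing "1" continues the earlier conversation.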

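Tool calling

Starting in v0.2, ChatNVIDIA supports bind_tools.

ChatNVIDIA provides integration with the variety of models on build.nvidia.com as well as with local NIMs. Not all of these models are trained for tool calling. Be sure to select a model that supports tool calling for your experimentation and applications.

You can get a list of models that are known to support tool calling with: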

tool_models = [
    model for model in ChatNVIDIA.get_available_models() if model.supports_tools
]
tool_models

With a tool-capable model:

from langchain_core.tools import tool
from pydantic import Field


@tool
def get_current_weather(
    location: str = Field(..., description="The location to get the weather for."),
):
    """Get the current weather for a location."""
    ...


llm = ChatNVIDIA(model=tool_models[0].id).bind_tools(tools=[get_current_weather])
response = llm.invoke("What is the weather in Boston?")
response.tool_calls

API Reference: tool

See How to use chat models to call tools for additional examples.
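The model only requests tool calls; running them is up to the application. A sketch of a dispatcher, assuming each entry in response.tool_calls is a dict with name, args, and id keys (LangChain's tool-call shape). `TOOL_REGISTRY`, `get_weather_stub`, and `run_tool_calls` are hypothetical names, and the example call is hand-written rather than real model output.

```python
def get_weather_stub(location: str) -> str:
    """Placeholder implementation; a real tool would call a weather service."""
    return f"(no real data) weather requested for {location}"


# Map tool names, as the model reports them, to Python callables.
TOOL_REGISTRY = {"get_current_weather": get_weather_stub}


def run_tool_calls(tool_calls: list) -> list:
    """Run each requested tool and pair its result with the call id."""
    results = []
    for call in tool_calls:
        func = TOOL_REGISTRY.get(call["name"])
        if func is None:
            content = f"unknown tool: {call['name']}"
        else:
            content = func(**call["args"])
        results.append({"tool_call_id": call["id"], "content": content})
    return results


# Hand-written example in the assumed shape of response.tool_calls.
example_calls = [
    {"name": "get_current_weather", "args": {"location": "Boston"}, "id": "call_1"}
]
print(run_tool_calls(example_calls))
```

Each result carries the originating call id so it can be sent back to the model as a tool message in a follow-up turn.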

Chaining

We can chain our model with a prompt template like so:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate(
    [
        (
            "system",
            "You are a helpful assistant that translates {input_language} to {output_language}.",
        ),
        ("human", "{input}"),
    ]
)

chain = prompt | llm
chain.invoke(
    {
        "input_language": "English",
        "output_language": "German",
        "input": "I love programming.",
    }
)

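API Reference: ChatPromptTemplate

API reference

For detailed documentation of all ChatNVIDIA features and configurations, head to the API reference: https://python.langchain.com/api_reference/nvidia_ai_endpoints/chat_models/langchain_nvidia_ai_endpoints.chat_models.ChatNVIDIA.html

Related

Chat model conceptual guide
Chat model how-to guides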
