Skip to main content
Open In ColabOpen on GitHub

LlamaEdge

LlamaEdge 允许您在本地和通过聊天服务与 GGUF 格式的语言模型进行对话。

  • LlamaEdgeChatService 为开发者提供了一个兼容 OpenAI API 的服务,通过 HTTP 请求与大语言模型(LLMs)进行对话。

  • LlamaEdgeChatLocal 允许开发者与本地的LLM进行聊天(即将推出)。

Both LlamaEdgeChatService and LlamaEdgeChatLocal run on the infrastructure driven by WasmEdge Runtime, which provides a lightweight and portable WebAssembly container environment for LLM inference tasks.

通过API服务聊天

LlamaEdgeChatService 基于 llama-api-server。按照 llama-api-server 快速入门指南 中的步骤,您可以托管自己的 API 服务,这样您就可以在任何设备上的任何地方与您喜欢的任意模型进行对话,只要互联网可用即可。

from langchain_community.chat_models.llama_edge import LlamaEdgeChatService
from langchain_core.messages import HumanMessage, SystemMessage

以非流模式与大语言模型对话

# service url
service_url = "https://b008-54-186-154-209.ngrok-free.app"

# create wasm-chat service instance
chat = LlamaEdgeChatService(service_url=service_url)

# create message sequence
system_message = SystemMessage(content="You are an AI assistant")
user_message = HumanMessage(content="What is the capital of France?")
messages = [system_message, user_message]

# chat with wasm-chat service
response = chat.invoke(messages)

print(f"[Bot] {response.content}")
[Bot] Hello! The capital of France is Paris.

使用流式模式与LLM进行对话

# service url
service_url = "https://b008-54-186-154-209.ngrok-free.app"

# create wasm-chat service instance
chat = LlamaEdgeChatService(service_url=service_url, streaming=True)

# create message sequence
system_message = SystemMessage(content="You are an AI assistant")
user_message = HumanMessage(content="What is the capital of Norway?")
messages = [
system_message,
user_message,
]

output = ""
for chunk in chat.stream(messages):
# print(chunk.content, end="", flush=True)
output += chunk.content

print(f"[Bot] {output}")
[Bot]   Hello! I'm happy to help you with your question. The capital of Norway is Oslo.