Llama CPP
仅适用于 Node.js。
该模块基于 node-llama-cpp Node.js 绑定库,用于 llama.cpp,允许你使用本地运行的 LLM。这使得你可以运行更小的量化模型,适合在笔记本电脑环境中使用,非常适合测试和快速验证想法,而无需担心产生费用!
安装配置
你需要安装 node-llama-cpp 模块的主要版本 3 来与本地模型通信。
:::提示 请参阅安装集成包的一般说明部分。 :::
- npm
- Yarn
- pnpm
npm install -S node-llama-cpp@3 @langchain/community @langchain/core
yarn add node-llama-cpp@3 @langchain/community @langchain/core
pnpm add node-llama-cpp@3 @langchain/community @langchain/core
你还需要一个本地的 Llama 3 模型(或者 node-llama-cpp 支持的其他模型)。你需要将该模型的路径传递给 LlamaCpp 模块的参数中(见示例)。
开箱即用的 node-llama-cpp 是为在 macOS 平台上运行而优化的,支持 Apple M 系列处理器的 Metal GPU。如果你需要关闭此功能或需要支持 CUDA 架构,请参考 node-llama-cpp 的文档。
关于如何获取和准备 llama3 模型,请参阅该模块对应版本的文档。
给 LangChain.js 贡献者的提示:如果你想运行与此模块相关的测试,你需要将你的本地模型路径设置在环境变量 LLAMA_PATH 中。
使用方法
基本用法
在这种情况下,我们传入一个封装为消息的提示,并期望得到一个响应。
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { HumanMessage } from "@langchain/core/messages";
const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";
const model = await ChatLlamaCpp.initialize({ modelPath: llamaPath });
const response = await model.invoke([
new HumanMessage({ content: "My name is John." }),
]);
console.log({ response });
/*
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Hello John.',
additional_kwargs: {}
},
lc_namespace: [ 'langchain', 'schema' ],
content: 'Hello John.',
name: undefined,
additional_kwargs: {}
}
*/
API Reference:
- ChatLlamaCpp from
@langchain/community/chat_models/llama_cpp - HumanMessage from
@langchain/core/messages
系统消息
我们也可以提供系统消息,注意在 llama_cpp 模块中,系统消息将导致创建一个新的会话。
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { SystemMessage, HumanMessage } from "@langchain/core/messages";
const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";
const model = await ChatLlamaCpp.initialize({ modelPath: llamaPath });
const response = await model.invoke([
new SystemMessage(
"You are a pirate, responses must be very verbose and in pirate dialect, add 'Arr, m'hearty!' to each sentence."
),
new HumanMessage("Tell me where Llamas come from?"),
]);
console.log({ response });
/*
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: "Arr, m'hearty! Llamas come from the land of Peru.",
additional_kwargs: {}
},
lc_namespace: [ 'langchain', 'schema' ],
content: "Arr, m'hearty! Llamas come from the land of Peru.",
name: undefined,
additional_kwargs: {}
}
*/
API Reference:
- ChatLlamaCpp from
@langchain/community/chat_models/llama_cpp - SystemMessage from
@langchain/core/messages - HumanMessage from
@langchain/core/messages
链式调用(Chains)
此模块也可以与链式调用一起使用,但请注意,使用更复杂的链可能需要功能更强大的 llama3 版本,例如 70B 的版本。
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { LLMChain } from "langchain/chains";
import { PromptTemplate } from "@langchain/core/prompts";
const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";
const model = await ChatLlamaCpp.initialize({
modelPath: llamaPath,
temperature: 0.5,
});
const prompt = PromptTemplate.fromTemplate(
"What is a good name for a company that makes {product}?"
);
const chain = new LLMChain({ llm: model, prompt });
const response = await chain.invoke({ product: "colorful socks" });
console.log({ response });
/*
{
text: `I'm not sure what you mean by "colorful socks" but here are some ideas:\n` +
'\n' +
'- Sock-it to me!\n' +
'- Socks Away\n' +
'- Fancy Footwear'
}
*/
API Reference:
- ChatLlamaCpp from
@langchain/community/chat_models/llama_cpp - LLMChain from
langchain/chains - PromptTemplate from
@langchain/core/prompts
流式传输
我们也可以使用 Llama CPP 进行流式传输,可以使用原始的“单提示”字符串:
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";
const model = await ChatLlamaCpp.initialize({
modelPath: llamaPath,
temperature: 0.7,
});
const stream = await model.stream("Tell me a short story about a happy Llama.");
for await (const chunk of stream) {
console.log(chunk.content);
}
/*
Once
upon
a
time
,
in
a
green
and
sunny
field
...
*/
API Reference:
- ChatLlamaCpp from
@langchain/community/chat_models/llama_cpp
或者你可以提供多个消息,注意这会将输入进行处理,并向模型提交一个格式化为 Llama3 的提示。
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { SystemMessage, HumanMessage } from "@langchain/core/messages";
const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";
const llamaCpp = await ChatLlamaCpp.initialize({
modelPath: llamaPath,
temperature: 0.7,
});
const stream = await llamaCpp.stream([
new SystemMessage(
"You are a pirate, responses must be very verbose and in pirate dialect."
),
new HumanMessage("Tell me about Llamas?"),
]);
for await (const chunk of stream) {
console.log(chunk.content);
}
/*
Ar
rr
r
,
me
heart
y
!
Ye
be
ask
in
'
about
llam
as
,
e
h
?
...
*/
API Reference:
- ChatLlamaCpp from
@langchain/community/chat_models/llama_cpp - SystemMessage from
@langchain/core/messages - HumanMessage from
@langchain/core/messages
使用 invoke 方法,我们也可以实现流式生成,并使用 signal 来中断生成。
import { ChatLlamaCpp } from "@langchain/community/chat_models/llama_cpp";
import { SystemMessage, HumanMessage } from "@langchain/core/messages";
const llamaPath = "/Replace/with/path/to/your/model/gguf-llama3-Q4_0.bin";
const model = await ChatLlamaCpp.initialize({
modelPath: llamaPath,
temperature: 0.7,
});
const controller = new AbortController();
setTimeout(() => {
controller.abort();
console.log("Aborted");
}, 5000);
await model.invoke(
[
new SystemMessage(
"You are a pirate, responses must be very verbose and in pirate dialect."
),
new HumanMessage("Tell me about Llamas?"),
],
{
signal: controller.signal,
callbacks: [
{
handleLLMNewToken(token) {
console.log(token);
},
},
],
}
);
/*
Once
upon
a
time
,
in
a
green
and
sunny
field
...
Aborted
AbortError
*/
API Reference:
- ChatLlamaCpp from
@langchain/community/chat_models/llama_cpp - SystemMessage from
@langchain/core/messages - HumanMessage from
@langchain/core/messages
相关内容
Related
- Chat model conceptual guide
- Chat model how-to guides