
How to trim messages

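Prerequisites

This guide assumes familiarity with the following concepts:

The methods in this guide also require @langchain/core>=0.2.8. See the installation guide for installation instructions.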

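All models have finite context windows, meaning there is a limit on how many tokens they can take as input. If you have very long messages, or a chain/agent that accumulates a long message history, you need to manage the length of the messages you pass to the model.

The trimMessages utility provides some basic strategies for trimming a list of messages to a given token length.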

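Getting the last maxTokens tokens

To get the last maxTokens tokens in the list of messages, set strategy: "last". Note that for tokenCounter we can pass in either a function (more on that below) or a language model (since language models have a message token counting method). Passing in a model makes sense when you are trimming messages to fit into that specific model's context window: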

import {
  AIMessage,
  HumanMessage,
  SystemMessage,
  trimMessages,
} from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";

const messages = [
  new SystemMessage("you're a good assistant, you always respond with a joke."),
  new HumanMessage("i wonder why it's called langchain"),
  new AIMessage(
    'Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!'
  ),
  new HumanMessage("and who is harrison chasing anyways"),
  new AIMessage(
    "Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
  ),
  new HumanMessage("what do you call a speechless parrot"),
];

const trimmed = await trimMessages(messages, {
  maxTokens: 45,
  strategy: "last",
  tokenCounter: new ChatOpenAI({ model: "gpt-4" }),
});

console.log(
  trimmed
    .map((x) =>
      JSON.stringify(
        {
          role: x._getType(),
          content: x.content,
        },
        null,
        2
      )
    )
    .join("\n\n")
);
{
"role": "human",
"content": "and who is harrison chasing anyways"
}

{
"role": "ai",
"content": "Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
}

{
"role": "human",
"content": "what do you call a speechless parrot"
}
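
To make the idea behind strategy: "last" concrete, here is a minimal, dependency-free sketch. It is not the library's implementation: the Msg type, the countWords counter (one token per word) and the keepLast helper are all hypothetical names introduced for illustration. The sketch keeps the longest run of whole messages at the end of the list that still fits the budget.

```typescript
// Simplified stand-in for a chat message; not the @langchain/core classes.
interface Msg {
  role: "system" | "human" | "ai";
  content: string;
}

// Hypothetical counter: one token per whitespace-separated word.
const countWords = (msgs: Msg[]): number =>
  msgs.reduce((n, m) => n + m.content.split(/\s+/).filter(Boolean).length, 0);

// Keep the longest suffix of whole messages whose total fits in maxTokens.
function keepLast(msgs: Msg[], maxTokens: number): Msg[] {
  let start = msgs.length;
  while (start > 0 && countWords(msgs.slice(start - 1)) <= maxTokens) {
    start -= 1;
  }
  return msgs.slice(start);
}

const history: Msg[] = [
  { role: "system", content: "you always respond with a joke" }, // 6 words
  { role: "human", content: "why is it called langchain" }, // 5 words
  { role: "ai", content: "because WordRope was taken" }, // 4 words
  { role: "human", content: "what do you call a speechless parrot" }, // 7 words
];

// Budget of 12: the last two messages (4 + 7 words) fit, a third would not.
const lastTrimmed = keepLast(history, 12);
console.log(lastTrimmed.map((m) => m.role)); // roles: ai, human
```

Note that the system message is dropped here, just as in the output above, because nothing in this strategy treats it specially.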

If we want to always keep the initial system message, we can specify includeSystem: true:

await trimMessages(messages, {
  maxTokens: 45,
  strategy: "last",
  tokenCounter: new ChatOpenAI({ model: "gpt-4" }),
  includeSystem: true,
});
[
SystemMessage {
lc_serializable: true,
lc_kwargs: {
content: "you're a good assistant, you always respond with a joke.",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "you're a good assistant, you always respond with a joke.",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'what do you call a speechless parrot',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'what do you call a speechless parrot',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]
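
One way to picture includeSystem is that the system message is set aside first, its tokens are deducted from the budget, and the remaining budget is filled from the end of the conversation. The sketch below illustrates that idea under stated assumptions: Msg, countWords (one token per word) and keepLastWithSystem are hypothetical helpers, not part of @langchain/core, and the real utility may differ in details.

```typescript
// Simplified stand-in for a chat message; not the @langchain/core classes.
interface Msg {
  role: "system" | "human" | "ai";
  content: string;
}

// Hypothetical counter: one token per whitespace-separated word.
const countWords = (msgs: Msg[]): number =>
  msgs.reduce((n, m) => n + m.content.split(/\s+/).filter(Boolean).length, 0);

function keepLastWithSystem(msgs: Msg[], maxTokens: number): Msg[] {
  // Set the leading system message aside, if there is one.
  const first = msgs[0];
  const system: Msg[] =
    first !== undefined && first.role === "system" ? [first] : [];
  const rest = msgs.slice(system.length);

  // The system message is paid for out of the same budget.
  const budget = maxTokens - countWords(system);

  // Fill what is left with whole messages from the end of the list.
  let start = rest.length;
  while (start > 0 && countWords(rest.slice(start - 1)) <= budget) {
    start -= 1;
  }
  return [...system, ...rest.slice(start)];
}

const history: Msg[] = [
  { role: "system", content: "you always respond with a joke" }, // 6 words
  { role: "human", content: "why is it called langchain" }, // 5 words
  { role: "ai", content: "because WordRope was taken" }, // 4 words
  { role: "human", content: "what do you call a speechless parrot" }, // 7 words
];

// Budget of 18: 6 go to the system message, leaving 12 for the tail.
const withSystem = keepLastWithSystem(history, 18);
console.log(withSystem.map((m) => m.role)); // roles: system, ai, human
```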

If we want to allow splitting up the contents of a message, we can specify allowPartial: true:

await trimMessages(messages, {
  maxTokens: 50,
  strategy: "last",
  tokenCounter: new ChatOpenAI({ model: "gpt-4" }),
  includeSystem: true,
  allowPartial: true,
});
[
SystemMessage {
lc_serializable: true,
lc_kwargs: {
content: "you're a good assistant, you always respond with a joke.",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "you're a good assistant, you always respond with a joke.",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'what do you call a speechless parrot',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'what do you call a speechless parrot',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]
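
The effect of allowPartial is easiest to see with a toy model in which the message at the cut-off point is split instead of dropped. The sketch below is an illustration only: it splits on words and counts one token per word, whereas the real utility uses a token counter and a text splitter. Msg and keepLastAllowPartial are hypothetical names.

```typescript
// Simplified stand-in for a chat message; not the @langchain/core classes.
interface Msg {
  role: "system" | "human" | "ai";
  content: string;
}

// Walk backwards from the newest message, spending one token per word.
function keepLastAllowPartial(msgs: Msg[], maxTokens: number): Msg[] {
  const kept: Msg[] = [];
  let remaining = maxTokens;
  for (let i = msgs.length - 1; i >= 0 && remaining > 0; i -= 1) {
    const msg = msgs[i];
    if (msg === undefined) break;
    const words = msg.content.split(/\s+/).filter(Boolean);
    if (words.length <= remaining) {
      // The whole message fits.
      kept.unshift(msg);
      remaining -= words.length;
    } else {
      // Boundary message: keep only its most recent words, then stop.
      kept.unshift({ ...msg, content: words.slice(-remaining).join(" ") });
      remaining = 0;
    }
  }
  return kept;
}

const history: Msg[] = [
  { role: "human", content: "why is it called langchain" }, // 5 words
  { role: "ai", content: "because WordRope was taken" }, // 4 words
  { role: "human", content: "what do you call a speechless parrot" }, // 7 words
];

// Budget of 13: 7 + 4 words fit whole, leaving 2 words for the oldest message.
const partial = keepLastAllowPartial(history, 13);
console.log(partial.map((m) => m.content));
```

In this toy run the oldest message survives as the fragment "called langchain". Whether a fragment like that is useful to the model depends on the application, which is presumably why partial messages are opt-in.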

If we need to make sure that our first message (excluding the system message) is always of a specific type, we can specify startOn:

await trimMessages(messages, {
  maxTokens: 60,
  strategy: "last",
  tokenCounter: new ChatOpenAI({ model: "gpt-4" }),
  includeSystem: true,
  startOn: "human",
});
[
SystemMessage {
lc_serializable: true,
lc_kwargs: {
content: "you're a good assistant, you always respond with a joke.",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "you're a good assistant, you always respond with a joke.",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'and who is harrison chasing anyways',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'and who is harrison chasing anyways',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'what do you call a speechless parrot',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'what do you call a speechless parrot',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]
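
Conceptually, startOn discards everything before the first message of the requested type in the trimmed (non-system) messages. A minimal sketch of that step, using a hypothetical Msg type and dropUntil helper rather than the library's own code:

```typescript
// Simplified stand-in for a chat message; not the @langchain/core classes.
interface Msg {
  role: "system" | "human" | "ai";
  content: string;
}

// Drop leading messages until the first one with the requested role.
function dropUntil(msgs: Msg[], role: Msg["role"]): Msg[] {
  const idx = msgs.findIndex((m) => m.role === role);
  return idx === -1 ? [] : msgs.slice(idx);
}

// Suppose token-based trimming left a list that begins with an AI message.
const afterTrim: Msg[] = [
  { role: "ai", content: "because WordRope was taken" },
  { role: "human", content: "who is harrison chasing" },
  { role: "ai", content: "the last cup of coffee" },
  { role: "human", content: "what do you call a speechless parrot" },
];

const startsOnHuman = dropUntil(afterTrim, "human");
console.log(startsOnHuman.map((m) => m.role)); // roles: human, ai, human
```

This matters for chat models that expect the conversation (after any system message) to open with a human turn.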

Getting the first maxTokens tokens

We can perform the flipped operation of getting the first maxTokens tokens by specifying strategy: "first":

await trimMessages(messages, {
  maxTokens: 45,
  strategy: "first",
  tokenCounter: new ChatOpenAI({ model: "gpt-4" }),
});
[
SystemMessage {
lc_serializable: true,
lc_kwargs: {
content: "you're a good assistant, you always respond with a joke.",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "you're a good assistant, you always respond with a joke.",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: "i wonder why it's called langchain",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "i wonder why it's called langchain",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]
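
The "first" strategy can be pictured as the mirror image of "last": keep the longest run of whole messages from the start of the list that fits the budget. The sketch below assumes a toy counter (one token per word); Msg, countWords and keepFirst are hypothetical names, not library APIs.

```typescript
// Simplified stand-in for a chat message; not the @langchain/core classes.
interface Msg {
  role: "system" | "human" | "ai";
  content: string;
}

// Hypothetical counter: one token per whitespace-separated word.
const countWords = (msgs: Msg[]): number =>
  msgs.reduce((n, m) => n + m.content.split(/\s+/).filter(Boolean).length, 0);

// Keep the longest prefix of whole messages whose total fits in maxTokens.
function keepFirst(msgs: Msg[], maxTokens: number): Msg[] {
  let end = 0;
  while (end < msgs.length && countWords(msgs.slice(0, end + 1)) <= maxTokens) {
    end += 1;
  }
  return msgs.slice(0, end);
}

const history: Msg[] = [
  { role: "system", content: "you always respond with a joke" }, // 6 words
  { role: "human", content: "why is it called langchain" }, // 5 words
  { role: "ai", content: "because WordRope was taken" }, // 4 words
  { role: "human", content: "what do you call a speechless parrot" }, // 7 words
];

// Budget of 12: the first two messages (6 + 5 words) fit, a third would not.
const firstTrimmed = keepFirst(history, 12);
console.log(firstTrimmed.map((m) => m.role)); // roles: system, human
```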

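Writing a custom token counter

We can write a custom token counter function that takes in a list of messages and returns an integer. The example below is async, so it returns a Promise<number>.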

import { encodingForModel } from "@langchain/core/utils/tiktoken";
import {
  BaseMessage,
  HumanMessage,
  AIMessage,
  ToolMessage,
  SystemMessage,
  MessageContent,
  MessageContentText,
} from "@langchain/core/messages";

// Count the tokens in plain string content, or in an array of text blocks.
async function strTokenCounter(
  messageContent: MessageContent
): Promise<number> {
  if (typeof messageContent === "string") {
    return (await encodingForModel("gpt-4")).encode(messageContent).length;
  } else {
    // Only arrays made up entirely of non-empty text blocks are supported.
    if (messageContent.every((x) => x.type === "text" && x.text)) {
      return (await encodingForModel("gpt-4")).encode(
        (messageContent as MessageContentText[])
          .map(({ text }) => text)
          .join("")
      ).length;
    }
    throw new Error(
      `Unsupported message content ${JSON.stringify(messageContent)}`
    );
  }
}

// Approximate the token count of a message list, including per-message overhead.
async function tiktokenCounter(messages: BaseMessage[]): Promise<number> {
  let numTokens = 3; // every reply is primed with <|start|>assistant<|message|>
  const tokensPerMessage = 3;
  const tokensPerName = 1;

  for (const msg of messages) {
    // Map each message class to the role name that gets tokenized.
    let role: string;
    if (msg instanceof HumanMessage) {
      role = "user";
    } else if (msg instanceof AIMessage) {
      role = "assistant";
    } else if (msg instanceof ToolMessage) {
      role = "tool";
    } else if (msg instanceof SystemMessage) {
      role = "system";
    } else {
      throw new Error(`Unsupported message type ${msg.constructor.name}`);
    }

    numTokens +=
      tokensPerMessage +
      (await strTokenCounter(role)) +
      (await strTokenCounter(msg.content));

    if (msg.name) {
      numTokens += tokensPerName + (await strTokenCounter(msg.name));
    }
  }

  return numTokens;
}

await trimMessages(messages, {
  maxTokens: 45,
  strategy: "last",
  tokenCounter: tiktokenCounter,
});
[
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'what do you call a speechless parrot',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'what do you call a speechless parrot',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]
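
When an exact tokenizer is not available, a rough estimate can play the same role. The sketch below is a hypothetical alternative, not something the guide prescribes: it assumes about four characters per token (a common rule of thumb for English text, not an exact figure) and reuses the three-token per-message overhead from tiktokenCounter above. MsgLike is a minimal structural type introduced here so the block runs without dependencies; adapting it to BaseMessage with non-string content would need the same handling as strTokenCounter.

```typescript
// Minimal structural stand-in for a message with string content.
interface MsgLike {
  content: string;
  name?: string;
}

const CHARS_PER_TOKEN = 4; // assumption: rough average for English text
const TOKENS_PER_MESSAGE = 3; // same per-message overhead as tiktokenCounter
const TOKENS_PER_NAME = 1;

// Estimate tokens from character counts instead of running a tokenizer.
function approxTokenCounter(messages: MsgLike[]): number {
  let total = 0;
  for (const msg of messages) {
    total +=
      TOKENS_PER_MESSAGE + Math.ceil(msg.content.length / CHARS_PER_TOKEN);
    if (msg.name) {
      total += TOKENS_PER_NAME + Math.ceil(msg.name.length / CHARS_PER_TOKEN);
    }
  }
  return total;
}

// 36 characters -> 9 estimated content tokens, plus 3 tokens of overhead.
const approx = approxTokenCounter([
  { content: "what do you call a speechless parrot" },
]);
console.log(approx); // 12
```

An estimate like this trades accuracy for speed, so it is safer to pair it with a maxTokens value that leaves some headroom below the model's real limit.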

Chaining

trimMessages can be used imperatively (as above) or declaratively, which makes it easy to compose with other components in a chain:

import { ChatOpenAI } from "@langchain/openai";
import { trimMessages } from "@langchain/core/messages";

const llm = new ChatOpenAI({ model: "gpt-4o" });

// Notice we don't pass in messages. This creates
// a RunnableLambda that takes messages as input
const trimmer = trimMessages({
  maxTokens: 45,
  strategy: "last",
  tokenCounter: llm,
  includeSystem: true,
});

const chain = trimmer.pipe(llm);
await chain.invoke(messages);
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Thanks! I do try to keep things light. But for a more serious answer, "LangChain" is likely named to reflect its focus on language processing and the way it connects different components or models together—essentially forming a "chain" of linguistic operations. The "Lang" part emphasizes its focus on language, while "Chain" highlights the interconnected workflows it aims to facilitate.',
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: { function_call: undefined, tool_calls: undefined },
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Thanks! I do try to keep things light. But for a more serious answer, "LangChain" is likely named to reflect its focus on language processing and the way it connects different components or models together—essentially forming a "chain" of linguistic operations. The "Lang" part emphasizes its focus on language, while "Chain" highlights the interconnected workflows it aims to facilitate.',
name: undefined,
additional_kwargs: { function_call: undefined, tool_calls: undefined },
response_metadata: {
tokenUsage: { completionTokens: 77, promptTokens: 59, totalTokens: 136 },
finish_reason: 'stop'
},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: { input_tokens: 59, output_tokens: 77, total_tokens: 136 }
}

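Looking at the LangSmith trace, we can see that the messages are trimmed before they are passed to the model.

Looking at just the trimmer, we can see that it is a Runnable object that can be invoked like any other Runnable: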

await trimmer.invoke(messages);
[
SystemMessage {
lc_serializable: true,
lc_kwargs: {
content: "you're a good assistant, you always respond with a joke.",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "you're a good assistant, you always respond with a joke.",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'what do you call a speechless parrot',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'what do you call a speechless parrot',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]
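
The declarative form boils down to configuring the trimmer once and receiving a reusable step that can be composed with others. The sketch below mimics that pattern with plain functions. Step, pipe, makeTrimmer and fakeModel are hypothetical stand-ins, not the Runnable API, and the toy trimmer cuts by message count rather than by tokens.

```typescript
// A step is just a function from input to output.
type Step<I, O> = (input: I) => O;

// Compose two steps so that the output of the first feeds the second.
function pipe<A, B, C>(f: Step<A, B>, g: Step<B, C>): Step<A, C> {
  return (x) => g(f(x));
}

// Configure once, get back a reusable step (no messages passed yet).
function makeTrimmer(maxMessages: number): Step<string[], string[]> {
  return (msgs) => msgs.slice(-maxMessages);
}

// Stand-in for a model call; it only reports what it received.
const fakeModel: Step<string[], string> = (msgs) =>
  `model saw ${msgs.length} message(s)`;

const sketchChain = pipe(makeTrimmer(2), fakeModel);
console.log(sketchChain(["first", "second", "third"])); // model saw 2 message(s)
```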

Using with ChatMessageHistory

Trimming messages is especially useful when working with chat histories, which can get arbitrarily long:

import { InMemoryChatMessageHistory } from "@langchain/core/chat_history";
import { RunnableWithMessageHistory } from "@langchain/core/runnables";
import { HumanMessage, trimMessages } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";

const chatHistory = new InMemoryChatMessageHistory(messages.slice(0, -1));

const dummyGetSessionHistory = async (sessionId: string) => {
  if (sessionId !== "1") {
    throw new Error("Session not found");
  }
  return chatHistory;
};

const llm = new ChatOpenAI({ model: "gpt-4o" });

const trimmer = trimMessages({
  maxTokens: 45,
  strategy: "last",
  tokenCounter: llm,
  includeSystem: true,
});

const chain = trimmer.pipe(llm);
const chainWithHistory = new RunnableWithMessageHistory({
  runnable: chain,
  getMessageHistory: dummyGetSessionHistory,
});
await chainWithHistory.invoke(
  [new HumanMessage("what do you call a speechless parrot")],
  { configurable: { sessionId: "1" } }
);
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'A "polly-no-want-a-cracker"!',
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: { function_call: undefined, tool_calls: undefined },
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'A "polly-no-want-a-cracker"!',
name: undefined,
additional_kwargs: { function_call: undefined, tool_calls: undefined },
response_metadata: {
tokenUsage: { completionTokens: 11, promptTokens: 57, totalTokens: 68 },
finish_reason: 'stop'
},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: { input_tokens: 57, output_tokens: 11, total_tokens: 68 }
}

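Looking at the LangSmith trace, we can see that we retrieve all of our messages, but before the messages are passed to the model they are trimmed down to just the system message and the last human message.

API reference

For a complete description of all arguments, head to the API reference.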
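
The pattern here is that the stored history keeps growing while the model only ever sees a bounded window of it. The dependency-free sketch below illustrates that separation. The sessions map, getHistory, fakeModel and invokeWithHistory are hypothetical helpers, and the window is measured in messages rather than tokens to keep the example short.

```typescript
// Simplified stand-in for a chat message; not the @langchain/core classes.
interface Msg {
  role: "system" | "human" | "ai";
  content: string;
}

// One growing message list per session id.
const sessions = new Map<string, Msg[]>();

function getHistory(sessionId: string): Msg[] {
  let stored = sessions.get(sessionId);
  if (stored === undefined) {
    stored = [];
    sessions.set(sessionId, stored);
  }
  return stored;
}

// Stand-in for a model call; it only reports how much context it received.
const fakeModel = (msgs: Msg[]): Msg => ({
  role: "ai",
  content: `saw ${msgs.length} messages`,
});

function invokeWithHistory(
  sessionId: string,
  input: Msg,
  maxMessages: number
): Msg {
  const stored = getHistory(sessionId);
  stored.push(input);
  // Trim only the view passed to the model; the stored history stays complete.
  const reply = fakeModel(stored.slice(-maxMessages));
  stored.push(reply);
  return reply;
}

invokeWithHistory("1", { role: "human", content: "hi" }, 3);
invokeWithHistory("1", { role: "human", content: "tell me a joke" }, 3);
const thirdReply = invokeWithHistory(
  "1",
  { role: "human", content: "another one" },
  3
);
console.log(thirdReply.content); // saw 3 messages
console.log(getHistory("1").length); // 6
```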

