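How to pass multimodal data directly to models
Prerequisites
This guide assumes familiarity with the following concepts:
Here we demonstrate how to pass multimodal input directly to models. Currently, we expect all input to be passed in the same format that OpenAI expects. For other model providers that support multimodal input, we have added logic inside the class to convert to the expected format.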
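To make the idea of format conversion concrete, here is a minimal sketch of how an OpenAI-style content array could be mapped to the base64 image block shape used by Anthropic's Messages API. The type names and the `toAnthropicContent` and `toAnthropicImageBlock` functions are hypothetical illustrations, not LangChain's internal code, and the sketch only handles base64 data URLs.

```typescript
// Illustrative types only; these names are hypothetical and not exported by LangChain.
type TextBlock = { type: "text"; text: string };
type ImageUrlBlock = { type: "image_url"; image_url: { url: string } };
type OpenAIStyleBlock = TextBlock | ImageUrlBlock;

// Shape of a base64 image block in Anthropic's Messages API.
type AnthropicImageBlock = {
  type: "image";
  source: { type: "base64"; media_type: string; data: string };
};

// Convert an image_url block that carries a base64 data URL.
// Throws for anything else (for example, a plain HTTP URL).
function toAnthropicImageBlock(block: ImageUrlBlock): AnthropicImageBlock {
  const url = block.image_url.url;
  const prefix = "data:";
  const marker = ";base64,";
  const markerIndex = url.indexOf(marker);
  if (!url.startsWith(prefix) || markerIndex === -1) {
    throw new Error("Expected a base64 data URL");
  }
  return {
    type: "image",
    source: {
      type: "base64",
      media_type: url.slice(prefix.length, markerIndex),
      data: url.slice(markerIndex + marker.length),
    },
  };
}

// Text blocks pass through unchanged; image blocks are converted.
function toAnthropicContent(
  blocks: OpenAIStyleBlock[]
): (TextBlock | AnthropicImageBlock)[] {
  return blocks.map((block) =>
    block.type === "text" ? block : toAnthropicImageBlock(block)
  );
}
```

In practice you should not need to write this yourself, since the chat model classes perform the conversion; the sketch only shows the kind of reshaping involved.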
In this example we will ask a model to describe an image.
import * as fs from "node:fs/promises";
import { ChatAnthropic } from "@langchain/anthropic";

const model = new ChatAnthropic({
  model: "claude-3-sonnet-20240229",
});
// Read the image into a Buffer so it can be base64-encoded later.
const imageData = await fs.readFile("../../../../examples/hotdog.jpg");
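The most commonly supported way to pass in an image is as a base64-encoded string (here, embedded in a data URL) inside a message whose content is a list of complex content blocks, for models that support multimodal input. Here is an example: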
import { HumanMessage } from "@langchain/core/messages";

const message = new HumanMessage({
  content: [
    {
      type: "text",
      text: "what does this image contain?",
    },
    {
      type: "image_url",
      image_url: {
        // Inline the image bytes as a base64 data URL.
        url: `data:image/jpeg;base64,${imageData.toString("base64")}`,
      },
    },
  ],
});
const response = await model.invoke([message]);
console.log(response.content);
This image contains a hot dog. It shows a frankfurter or sausage encased in a soft, elongated bread bun. The sausage itself appears to be reddish in color, likely a smoked or cured variety. The bun is a golden-brown color, suggesting it has been lightly toasted or grilled. The hot dog is presented against a plain white background, allowing the details of the iconic American fast food item to be clearly visible.
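The example above hardcodes `image/jpeg` in the data URL. As a minimal sketch, assuming you want to pick the MIME type from the file extension instead, a small helper could look like the following. The `MIME_TYPES` table and the `imageToDataUrl` function are hypothetical names introduced for illustration, and the set of formats a given provider accepts may differ.

```typescript
import * as path from "node:path";

// Hypothetical lookup table; adjust it to the formats your provider accepts.
const MIME_TYPES: Record<string, string> = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".gif": "image/gif",
  ".webp": "image/webp",
};

// Build a base64 data URL from raw image bytes, choosing the MIME type
// from the file extension. Throws for extensions not in the table.
function imageToDataUrl(filePath: string, bytes: Uint8Array): string {
  const mimeType = MIME_TYPES[path.extname(filePath).toLowerCase()];
  if (mimeType === undefined) {
    throw new Error(`Unsupported image extension: ${filePath}`);
  }
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}
```

With this helper, the `url` field in the message above could be written as `imageToDataUrl("hotdog.jpg", imageData)`.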
Some model providers support taking an HTTP URL to the image directly in a content block of type "image_url":
import { ChatOpenAI } from "@langchain/openai";

const openAIModel = new ChatOpenAI({
  model: "gpt-4o",
});
const imageUrl =
  "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg";
// Distinct names avoid redeclaring `message` and `response` from the earlier example.
const urlMessage = new HumanMessage({
  content: [
    {
      type: "text",
      text: "describe the weather in this image",
    },
    {
      type: "image_url",
      image_url: { url: imageUrl },
    },
  ],
});
const urlResponse = await openAIModel.invoke([urlMessage]);
console.log(urlResponse.content);
The weather in the image appears to be pleasant and clear. The sky is mostly blue with a few scattered clouds, indicating good visibility and no immediate signs of rain. The lighting suggests it’s either morning or late afternoon, with sunlight creating a warm and bright atmosphere. There is no indication of strong winds, as the grass and foliage appear calm and undisturbed. Overall, it looks like a beautiful day, possibly spring or summer, ideal for outdoor activities.
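Because only some providers accept plain HTTP URLs, it can be useful to check which kind of URL you are holding before building a message. The following is a minimal sketch of such a check; `ImageUrlKind` and `classifyImageUrl` are hypothetical names, and the patterns are deliberately simple rather than a full URL validator.

```typescript
// Hypothetical helper type: the kinds of image URL this sketch distinguishes.
type ImageUrlKind = "http" | "data" | "other";

// Classify a URL string as an HTTP(S) URL, a base64 image data URL,
// or something else (for example, a local file path).
function classifyImageUrl(url: string): ImageUrlKind {
  if (/^https?:\/\//i.test(url)) {
    return "http";
  }
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(url)) {
    return "data";
  }
  return "other";
}
```

A caller could, for instance, send "data" URLs to any multimodal model, send "http" URLs only to providers known to accept them, and reject "other" values early.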
We can also pass in multiple images.
// Distinct names avoid redeclaring `message` and `response` from the earlier examples.
const multiImageMessage = new HumanMessage({
  content: [
    {
      type: "text",
      text: "are these two images the same?",
    },
    {
      type: "image_url",
      image_url: {
        url: imageUrl,
      },
    },
    {
      type: "image_url",
      image_url: {
        url: imageUrl,
      },
    },
  ],
});
const multiImageResponse = await openAIModel.invoke([multiImageMessage]);
console.log(multiImageResponse.content);
Yes, the two images are the same.
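When the number of images varies, writing each block by hand becomes repetitive. As a minimal sketch, the content array could be built from a prompt and a list of URLs. `ContentBlock` and `buildImageContent` are hypothetical names used only for this illustration; they mirror the block shapes shown in the examples above.

```typescript
// Hypothetical type mirroring the content blocks used in this guide.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// Build a content array: one text block followed by one image_url block per URL.
function buildImageContent(text: string, imageUrls: string[]): ContentBlock[] {
  const imageBlocks = imageUrls.map(
    (url): ContentBlock => ({ type: "image_url", image_url: { url } })
  );
  return [{ type: "text", text }, ...imageBlocks];
}
```

The returned array has the same shape as the `content` passed to `HumanMessage` above, so something like `buildImageContent("are these two images the same?", [imageUrl, imageUrl])` should be usable in its place.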
Next steps
You have now learned how to pass multimodal data to a model.
Next, you can check out our guide on multimodal tool calls.