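How to return structured data from a model

It is often useful to have a model return output that matches a specific schema. One common use case is extracting data from arbitrary text so it can be inserted into a traditional database or passed to another downstream system. This guide shows several strategies you can use to do this.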

Prerequisites

This guide assumes familiarity with the following concepts:

Chat models

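The `.withStructuredOutput()` method

Models can use several strategies under the hood. For some of the most popular model providers, including Anthropic, Google VertexAI, Mistral, and OpenAI, LangChain implements a common interface that abstracts away these strategies, called .withStructuredOutput.

When you call this method (passing in a JSON schema or a Zod schema), it adds whatever model parameters and output parsers are needed to get back structured output that matches the requested schema. If the model supports more than one way of doing this (for example, function calling vs. JSON mode), you can choose which one to use by passing the corresponding method option.

Let's look at some examples of this in action! We'll use Zod to create a simple response schema.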

Pick your chat model:

OpenAI

Install dependencies

Tip: See this section for general instructions on installing integration packages.

npm i @langchain/openai
# or
yarn add @langchain/openai
# or
pnpm add @langchain/openai

Add environment variables

OPENAI_API_KEY=your-api-key

Instantiate the model

import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0
});

Anthropic

Install dependencies

npm i @langchain/anthropic
# or
yarn add @langchain/anthropic
# or
pnpm add @langchain/anthropic

Add environment variables

ANTHROPIC_API_KEY=your-api-key

Instantiate the model

import { ChatAnthropic } from "@langchain/anthropic";

const model = new ChatAnthropic({
  model: "claude-3-5-sonnet-20240620",
  temperature: 0
});

Google Gemini

Install dependencies

npm i @langchain/google-genai
# or
yarn add @langchain/google-genai
# or
pnpm add @langchain/google-genai

Add environment variables

GOOGLE_API_KEY=your-api-key

Instantiate the model

import { ChatGoogleGenerativeAI } from "@langchain/google-genai";

const model = new ChatGoogleGenerativeAI({
  model: "gemini-2.0-flash",
  temperature: 0
});

Mistral

Install dependencies

npm i @langchain/mistralai
# or
yarn add @langchain/mistralai
# or
pnpm add @langchain/mistralai

Add environment variables

MISTRAL_API_KEY=your-api-key

Instantiate the model

import { ChatMistralAI } from "@langchain/mistralai";

const model = new ChatMistralAI({
  model: "mistral-large-latest",
  temperature: 0
});

Groq

Install dependencies

npm i @langchain/groq
# or
yarn add @langchain/groq
# or
pnpm add @langchain/groq

Add environment variables

GROQ_API_KEY=your-api-key

Instantiate the model

import { ChatGroq } from "@langchain/groq";

const model = new ChatGroq({
  model: "llama-3.3-70b-versatile",
  temperature: 0
});

Google VertexAI

Install dependencies

npm i @langchain/google-vertexai
# or
yarn add @langchain/google-vertexai
# or
pnpm add @langchain/google-vertexai

Add environment variables

GOOGLE_APPLICATION_CREDENTIALS=credentials.json

Instantiate the model

import { ChatVertexAI } from "@langchain/google-vertexai";

const model = new ChatVertexAI({
  model: "gemini-1.5-flash",
  temperature: 0
});

import { z } from "zod";

const joke = z.object({
  setup: z.string().describe("The setup of the joke"),
  punchline: z.string().describe("The punchline to the joke"),
  rating: z.number().optional().describe("How funny the joke is, from 1 to 10"),
});

const structuredLlm = model.withStructuredOutput(joke);

await structuredLlm.invoke("Tell me a joke about cats");

{
  setup: "Why don't cats play poker in the wild?",
  punchline: "Too many cheetahs.",
  rating: 7
}
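
Because rating is marked optional in the schema, downstream code should not assume it is present. The following is a minimal sketch of consuming such a result, assuming the object has the shape shown above. The Joke type and the formatJoke helper are illustrative names and are not part of LangChain or Zod.

```typescript
// Hand-written type mirroring the Zod schema above. With Zod installed,
// `z.infer<typeof joke>` should give a closely matching type.
type Joke = {
  setup: string;
  punchline: string;
  rating?: number;
};

// Hypothetical helper: renders a joke and mentions the rating only if present.
function formatJoke(j: Joke): string {
  const base = `${j.setup} ${j.punchline}`;
  return j.rating === undefined ? base : `${base} (rated ${j.rating}/10)`;
}

const rated: Joke = {
  setup: "Why don't cats play poker in the wild?",
  punchline: "Too many cheetahs.",
  rating: 7,
};
const unrated: Joke = { setup: rated.setup, punchline: rated.punchline };

console.log(formatJoke(rated));
console.log(formatJoke(unrated));
```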

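One key point is that although we assigned our Zod schema to a variable named joke, Zod cannot access that variable name and therefore cannot pass it through to the model. Although it is not required, we can pass a name for the schema to give the model more context about what the schema represents, which can improve performance: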

const structuredLlm = model.withStructuredOutput(joke, { name: "joke" });

await structuredLlm.invoke("Tell me a joke about cats");

{
  setup: "Why don't cats play poker in the wild?",
  punchline: "Too many cheetahs!",
  rating: 7
}

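The result is a JSON object.

If you don't want to use Zod, you can also pass in an OpenAI-style JSON schema dict. This object should contain three properties:

name: The name of the schema to output.
description: A high-level description of the schema to output.
parameters: The nested details of the schema you want to extract, formatted as a JSON schema dict.

In this case, the response is also a dict: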

const structuredLlm = model.withStructuredOutput({
  name: "joke",
  description: "Joke to tell user.",
  parameters: {
    title: "Joke",
    type: "object",
    properties: {
      setup: { type: "string", description: "The setup for the joke" },
      punchline: { type: "string", description: "The joke's punchline" },
    },
    required: ["setup", "punchline"],
  },
});

await structuredLlm.invoke("Tell me a joke about cats");

{
  setup: "Why was the cat sitting on the computer?",
  punchline: "Because it wanted to keep an eye on the mouse!"
}
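
A plain JSON schema carries no TypeScript types, so the returned dict is untyped at compile time. If you want a lightweight runtime check before writing the result to a database, something like the following sketch may help. It only checks required keys and primitive types (string, number, boolean); it is not a full JSON Schema validator, and ObjectSchema and findProblems are hypothetical names introduced here for illustration.

```typescript
// Minimal subset of a JSON Schema object: only the fields this sketch reads.
type ObjectSchema = {
  type: "object";
  properties: Record<string, { type: string; description?: string }>;
  required?: string[];
};

const jokeParameters: ObjectSchema = {
  type: "object",
  properties: {
    setup: { type: "string", description: "The setup for the joke" },
    punchline: { type: "string", description: "The joke's punchline" },
  },
  required: ["setup", "punchline"],
};

// Hypothetical helper: reports required keys that are missing, or whose
// `typeof` does not equal the schema's `type` string.
function findProblems(schema: ObjectSchema, value: unknown): string[] {
  if (typeof value !== "object" || value === null) {
    return ["value is not an object"];
  }
  const record = value as Record<string, unknown>;
  const problems: string[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in record)) {
      problems.push(`missing required key: ${key}`);
      continue;
    }
    const expected = schema.properties[key]?.type;
    if (expected !== undefined && typeof record[key] !== expected) {
      problems.push(`wrong type for key: ${key}`);
    }
  }
  return problems;
}

const complete = {
  setup: "Why was the cat sitting on the computer?",
  punchline: "Because it wanted to keep an eye on the mouse!",
};
const incomplete = { setup: "Why was the cat sitting on the computer?" };

console.log(findProblems(jokeParameters, complete)); // no problems reported
console.log(findProblems(jokeParameters, incomplete)); // reports the missing punchline
```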

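If you are using JSON Schema, you can take advantage of other, more complex schema descriptions to achieve a similar effect.

If your chosen model supports it, you can also use tool calling directly to let the model choose between options. This involves a bit more parsing and setup. See this how-to guide for more details.

Specifying the output method (Advanced)

For models that support more than one means of outputting data, you can specify the preferred one like this: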

const structuredLlm = model.withStructuredOutput(joke, {
  method: "json_mode",
  name: "joke",
});

await structuredLlm.invoke(
  "Tell me a joke about cats, respond in JSON with `setup` and `punchline` keys"
);

{
  setup: "Why don't cats play poker in the jungle?",
  punchline: "Too many cheetahs!"
}

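In the example above, we use OpenAI's alternate JSON mode capability along with a more specific prompt.

For specifics about the model you choose, see its entry in the API reference pages.

(Advanced) Raw outputs

LLMs aren't perfect at generating structured output, especially as schemas become complex. You can avoid raising exceptions and handle the raw output yourself by passing includeRaw: true. This changes the output format to contain the raw message output and the parsed value (if parsing succeeded):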

const joke = z.object({
  setup: z.string().describe("The setup of the joke"),
  punchline: z.string().describe("The punchline to the joke"),
  rating: z.number().optional().describe("How funny the joke is, from 1 to 10"),
});

const structuredLlm = model.withStructuredOutput(joke, {
  includeRaw: true,
  name: "joke",
});

await structuredLlm.invoke("Tell me a joke about cats");

{
  raw: AIMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "",
      tool_calls: [
        {
          name: "joke",
          args: [Object],
          id: "call_0pEdltlfSXjq20RaBFKSQOeF"
        }
      ],
      invalid_tool_calls: [],
      additional_kwargs: { function_call: undefined, tool_calls: [ [Object] ] },
      response_metadata: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "",
    name: undefined,
    additional_kwargs: {
      function_call: undefined,
      tool_calls: [
        {
          id: "call_0pEdltlfSXjq20RaBFKSQOeF",
          type: "function",
          function: [Object]
        }
      ]
    },
    response_metadata: {
      tokenUsage: { completionTokens: 33, promptTokens: 88, totalTokens: 121 },
      finish_reason: "stop"
    },
    tool_calls: [
      {
        name: "joke",
        args: {
          setup: "Why was the cat sitting on the computer?",
          punchline: "Because it wanted to keep an eye on the mouse!",
          rating: 7
        },
        id: "call_0pEdltlfSXjq20RaBFKSQOeF"
      }
    ],
    invalid_tool_calls: [],
    usage_metadata: { input_tokens: 88, output_tokens: 33, total_tokens: 121 }
  },
  parsed: {
    setup: "Why was the cat sitting on the computer?",
    punchline: "Because it wanted to keep an eye on the mouse!",
    rating: 7
  }
}
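
Here is a sketch of one way to consume a result of this shape. It assumes parsed is null when parsing fails, and it simplifies raw down to a content field. RawResult, ParsedJoke, and parsedOrFallback are illustrative names, not LangChain exports.

```typescript
// Assumed result shape, modeled on the output shown above.
// `raw` is reduced to its `content` field to keep the sketch self-contained.
type RawResult<T> = {
  raw: { content: string };
  parsed: T | null; // assumption: null when the output could not be parsed
};

type ParsedJoke = { setup: string; punchline: string; rating?: number };

// Hypothetical helper: returns the parsed value when available, otherwise
// builds a substitute from the raw message using a caller-supplied function.
function parsedOrFallback<T>(
  result: RawResult<T>,
  fallback: (raw: { content: string }) => T
): T {
  return result.parsed !== null ? result.parsed : fallback(result.raw);
}

const succeeded: RawResult<ParsedJoke> = {
  raw: { content: "" },
  parsed: {
    setup: "Why was the cat sitting on the computer?",
    punchline: "Because it wanted to keep an eye on the mouse!",
    rating: 7,
  },
};
const failed: RawResult<ParsedJoke> = {
  raw: { content: "Sorry, no joke today." },
  parsed: null,
};

// Fallback used here: keep the raw text so nothing is silently lost.
const keepRawText = (raw: { content: string }): ParsedJoke => ({
  setup: "(unparsed model output)",
  punchline: raw.content,
});

console.log(parsedOrFallback(succeeded, keepRawText).rating);
console.log(parsedOrFallback(failed, keepRawText).punchline);
```

Whether to fall back, retry, or surface an error is a design choice that depends on how much the downstream system tolerates incomplete records.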

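Prompting techniques

You can also prompt models to output information in a given format. This approach relies on designing good prompts and then parsing the model's output. It is the only option for models that don't support .withStructuredOutput() or other built-in approaches.

Using `JsonOutputParser`

The following example uses the built-in JsonOutputParser to parse the output of a chat model that is prompted to match a given JSON schema. Note that the format instructions are written by hand here and injected into the prompt through the format_instructions partial variable: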

import { JsonOutputParser } from "@langchain/core/output_parsers";
import { ChatPromptTemplate } from "@langchain/core/prompts";

type Person = {
  name: string;
  height_in_meters: number;
};

type People = {
  people: Person[];
};

const formatInstructions = `Respond only in valid JSON. The JSON object you return should match the following schema:
{{ people: [{{ name: "string", height_in_meters: "number" }}] }}

Where people is an array of objects, each with a name and height_in_meters field.
`;

// Set up a parser
const parser = new JsonOutputParser<People>();

// Prompt
const prompt = await ChatPromptTemplate.fromMessages([
  [
    "system",
    "Answer the user query. Wrap the output in `json` tags\n{format_instructions}",
  ],
  ["human", "{query}"],
]).partial({
  format_instructions: formatInstructions,
});

Let's take a look at what information is sent to the model:

const query = "Anna is 23 years old and she is 6 feet tall";

console.log((await prompt.format({ query })).toString());

System: Answer the user query. Wrap the output in `json` tags
Respond only in valid JSON. The JSON object you return should match the following schema:
{{ people: [{{ name: "string", height_in_meters: "number" }}] }}

Where people is an array of objects, each with a name and height_in_meters field.

Human: Anna is 23 years old and she is 6 feet tall
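
The doubled braces still appear in the printed prompt above. One plausible reading is that brace escaping applies to the template text only, while variable values are inserted verbatim. The sketch below models that behavior with a simplified f-string-style interpolator. It is an illustration written for this guide, not LangChain's implementation, and interpolate is a hypothetical name.

```typescript
// Simplified f-string-style interpolation (an illustration only):
// "{{" and "}}" in the TEMPLATE become single literal braces, and "{name}"
// is replaced by the matching value. Values are inserted verbatim.
function interpolate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{|\}\}|\{(\w+)\}/g, (match, name?: string) => {
    if (match === "{{") return "{";
    if (match === "}}") return "}";
    const value = name === undefined ? undefined : values[name];
    if (value === undefined) throw new Error(`Missing value for ${match}`);
    return value;
  });
}

// Escaped braces written in the template itself are collapsed.
const escapedInTemplate = interpolate(
  'Schema: {{ name: "string" }} Query: {query}',
  { query: "Anna" }
);

// Escaped braces that arrive inside a value are left untouched.
const escapedInValue = interpolate("{format_instructions}", {
  format_instructions: '{{ name: "string" }}',
});

console.log(escapedInTemplate);
console.log(escapedInValue);
```

Under this model, a schema passed in as a partial value would not need its braces doubled, whereas a schema written directly into the template string would.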

And now let's invoke it:

const chain = prompt.pipe(model).pipe(parser);

await chain.invoke({ query });

{ people: [ { name: "Anna", height_in_meters: 1.83 } ] }

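For a deeper dive into using output parsers with prompting techniques for structured output, see this guide.

Custom parsing

You can also create a custom prompt and parser with LangChain Expression Language (LCEL), using a plain function to parse the output from the model: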

import { AIMessage } from "@langchain/core/messages";
import { ChatPromptTemplate } from "@langchain/core/prompts";

type Person = {
  name: string;
  height_in_meters: number;
};

type People = {
  people: Person[];
};

const schema = `{{ people: [{{ name: "string", height_in_meters: "number" }}] }}`;

// Prompt
const prompt = await ChatPromptTemplate.fromMessages([
  [
    "system",
    `Answer the user query. Output your answer as JSON that
matches the given schema: \`\`\`json\n{schema}\n\`\`\`.
Make sure to wrap the answer in \`\`\`json and \`\`\` tags`,
  ],
  ["human", "{query}"],
]).partial({
  schema,
});

/**
 * Custom extractor
 *
 * Extracts JSON content from a string where
 * JSON is embedded between ```json and ``` tags.
 */
const extractJson = (output: AIMessage): Array<People> => {
  const text = output.content as string;
  // Define the regular expression pattern to match JSON blocks
  const pattern = /```json(.*?)```/gs;

  // Find all non-overlapping matches of the pattern in the string
  const matches = text.match(pattern);

  // Process each match, attempting to parse it as JSON
  try {
    return (
      matches?.map((match) => {
        // Remove the markdown code block syntax to isolate the JSON string
        const jsonStr = match.replace(/```json|```/g, "").trim();
        return JSON.parse(jsonStr);
      }) ?? []
    );
  } catch (error) {
    throw new Error(`Failed to parse: ${text}`);
  }
};

Here is the prompt sent to the model:

const query = "Anna is 23 years old and she is 6 feet tall";

console.log((await prompt.format({ query })).toString());

System: Answer the user query. Output your answer as JSON that
matches the given schema: ```json
{{ people: [{{ name: "string", height_in_meters: "number" }}] }}
```.
Make sure to wrap the answer in ```json and ``` tags
Human: Anna is 23 years old and she is 6 feet tall

And here's what it looks like when we invoke it:

import { RunnableLambda } from "@langchain/core/runnables";

const chain = prompt
  .pipe(model)
  .pipe(new RunnableLambda({ func: extractJson }));

await chain.invoke({ query });

[
  { people: [ { name: "Anna", height_in_meters: 1.83 } ] }
]
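
The result above is an array because the extractor returns one parsed value per fenced block it finds. The sketch below shows the same idea on a plain string, so it can be tried without a model. extractJsonBlocks is a hypothetical string-based variant of the extractor above, and the sample reply is made up for illustration.

```typescript
// Build the three-backtick fence marker programmatically so this example can
// itself be shown inside a fenced code block.
const FENCE = "`".repeat(3);

// Hypothetical string-based variant of the extractor above:
// returns one parsed value per json-fenced block found in `text`.
function extractJsonBlocks(text: string): unknown[] {
  const pattern = new RegExp(`${FENCE}json([\\s\\S]*?)${FENCE}`, "g");
  const results: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    // Group 1 holds the text between the opening and closing fence.
    results.push(JSON.parse((match[1] ?? "").trim()));
  }
  return results;
}

// Made-up model reply containing two fenced JSON blocks.
const reply = [
  "Here are two answers:",
  `${FENCE}json`,
  '{ "people": [{ "name": "Anna", "height_in_meters": 1.83 }] }',
  FENCE,
  "and",
  `${FENCE}json`,
  '{ "people": [] }',
  FENCE,
].join("\n");

console.log(extractJsonBlocks(reply).length); // 2
console.log(extractJsonBlocks("no fenced JSON here").length); // 0
```

If you expect exactly one block, taking the first element (and treating an empty array as a failure) keeps the downstream types simpler.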

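Next steps

You've now learned several ways to make a model output structured data.

To learn more, check out the other how-to guides in this section, or the conceptual guide on tool calling.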
