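Build an Extraction Chain

Prerequisites

This guide assumes familiarity with the following concepts:

In this tutorial, we will build a chain to extract structured information from unstructured text.

info

This tutorial only works with models that support function/tool calling.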

Setup

Installation

To install LangChain, run the command for your package manager:

npm i langchain @langchain/core

yarn add langchain @langchain/core

pnpm add langchain @langchain/core

For more details, see our installation guide.

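LangSmith

Many of the applications you build with LangChain will contain multiple steps with multiple LLM calls. As these applications grow more complex, it becomes crucial to be able to inspect what is happening inside your chain or agent. The best way to do this is with LangSmith.

After signing up for LangSmith, make sure to set your environment variables to start logging traces:

export LANGSMITH_TRACING="true"
export LANGSMITH_API_KEY="..."

# Reduce tracing latency if you are not in a serverless environment
# export LANGCHAIN_CALLBACKS_BACKGROUND=true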
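
A misconfigured environment is easy to miss until traces fail to appear, so it can help to check these variables at startup. The helper below is a hypothetical sketch: the name `missingLangSmithVars` is not part of LangChain or LangSmith, and it only inspects the two variables exported above.

```typescript
// Hypothetical startup check; not part of LangChain or LangSmith.
type Env = Record<string, string | undefined>;

// Returns the names of the tracing variables that are unset or disabled.
function missingLangSmithVars(env: Env): string[] {
  const missing: string[] = [];
  if (env["LANGSMITH_TRACING"] !== "true") missing.push("LANGSMITH_TRACING");
  if (!env["LANGSMITH_API_KEY"]) missing.push("LANGSMITH_API_KEY");
  return missing;
}

// In Node.js you would pass process.env; a literal keeps this example self-contained.
const problems = missingLangSmithVars({ LANGSMITH_TRACING: "true" });
console.log(problems); // only LANGSMITH_API_KEY is reported as missing
```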

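Schema

First, we need to describe what information we want to extract from the text.

We'll use Zod to define an example schema that extracts personal information. Install it first: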

npm i zod @langchain/core

yarn add zod @langchain/core

pnpm add zod @langchain/core

import { z } from "zod";

const personSchema = z.object({
  name: z.nullish(z.string()).describe("The name of the person"),
  hair_color: z.nullish(z.string()).describe("The color of the person's hair if known"),
  height_in_meters: z.nullish(z.string()).describe("Height measured in meters"),
});

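There are two best practices when defining a schema:

Document the attributes and the schema itself: this information is sent to the LLM and is used to improve the quality of information extraction.
Do not force the LLM to make up information! Above, we used .nullish() on each attribute so the LLM can output null or undefined when it does not know the answer.

info

For best performance, document the schema well and make sure the model is not forced to return results when there is no information to extract from the text.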
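
To make the effect of `.nullish()` concrete, the sketch below mirrors the schema as a plain TypeScript type in which each attribute may be a string, `null`, or `undefined`. The `Person` type and the `knownFields` helper are illustrative names, not LangChain or Zod APIs.

```typescript
// Plain-TypeScript mirror of personSchema: every attribute may be absent.
type Person = {
  name?: string | null;
  hair_color?: string | null;
  height_in_meters?: string | null;
};

// Lists the attributes for which a value was actually supplied.
function knownFields(person: Person): string[] {
  return Object.entries(person)
    .filter(([, value]) => value !== null && value !== undefined)
    .map(([key]) => key);
}

// A result for a text that never mentioned hair color or height.
const partial: Person = { name: "Alan Smith", hair_color: null };
console.log(knownFields(partial)); // only "name" is listed
```

Code that consumes the extraction result can use a check like this to distinguish "the text did not say" from a real value, instead of trusting every field.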

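The Extractor

Let's create an information extractor using the schema we defined above.

import { ChatPromptTemplate } from "@langchain/core/prompts";

// Define a custom prompt to provide instructions and any additional context.
// 1) You can add examples to the prompt template to improve extraction quality.
// 2) You can introduce additional parameters to take context into account
//    (e.g., metadata about the document from which the text was extracted).
const promptTemplate = ChatPromptTemplate.fromMessages([
  [
    "system",
    `You are an expert extraction algorithm.
Only extract relevant information from the text.
If you do not know the value of an attribute asked to extract,
return null for the attribute's value.`,
  ],
  // See the how-to guide on improving performance with reference examples.
  // ["placeholder", "{examples}"],
  ["human", "{text}"],
]);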
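
The `{text}` placeholder in the human message is filled in when the template is invoked. As a rough mental model only, the substitution behaves like the simplified stand-in below. `fillTemplate` is a hypothetical helper and is not how LangChain implements `ChatPromptTemplate`.

```typescript
// Simplified stand-in for placeholder substitution; not LangChain's implementation.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, key: string) => {
    const value = values[key];
    if (value === undefined) {
      // Fail loudly rather than sending a half-filled prompt to the model.
      throw new Error(`Missing value for placeholder "${key}"`);
    }
    return value;
  });
}

const humanMessage = fillTemplate("{text}", {
  text: "Alan Smith is 6 feet tall and has blond hair.",
});
console.log(humanMessage); // Alan Smith is 6 feet tall and has blond hair.
```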

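We need to use a model that supports function/tool calling.

Please review the documentation for a list of some models that can be used with this API.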

Pick your chat model:

Groq

Install dependencies

tip

See this section for general instructions on installing integration packages.

npm i @langchain/groq

yarn add @langchain/groq 

pnpm add @langchain/groq 

Add environment variables

GROQ_API_KEY=your-api-key

Instantiate the model

import { ChatGroq } from "@langchain/groq";

const llm = new ChatGroq({
  model: "llama-3.3-70b-versatile",
  temperature: 0
});

OpenAI

Install dependencies

tip

See this section for general instructions on installing integration packages.

npm i @langchain/openai

yarn add @langchain/openai 

pnpm add @langchain/openai 

Add environment variables

OPENAI_API_KEY=your-api-key

Instantiate the model

import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0
});

Anthropic

Install dependencies

tip

See this section for general instructions on installing integration packages.

npm i @langchain/anthropic

yarn add @langchain/anthropic 

pnpm add @langchain/anthropic 

Add environment variables

ANTHROPIC_API_KEY=your-api-key

Instantiate the model

import { ChatAnthropic } from "@langchain/anthropic";

const llm = new ChatAnthropic({
  model: "claude-3-5-sonnet-20240620",
  temperature: 0
});

Google Gemini

Install dependencies

tip

See this section for general instructions on installing integration packages.

npm i @langchain/google-genai

yarn add @langchain/google-genai 

pnpm add @langchain/google-genai 

Add environment variables

GOOGLE_API_KEY=your-api-key

Instantiate the model

import { ChatGoogleGenerativeAI } from "@langchain/google-genai";

const llm = new ChatGoogleGenerativeAI({
  model: "gemini-2.0-flash",
  temperature: 0
});

Fireworks AI

Install dependencies

tip

See this section for general instructions on installing integration packages.

npm i @langchain/community

yarn add @langchain/community 

pnpm add @langchain/community 

Add environment variables

FIREWORKS_API_KEY=your-api-key

Instantiate the model

import { ChatFireworks } from "@langchain/community/chat_models/fireworks";

const llm = new ChatFireworks({
  model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
  temperature: 0
});

Mistral AI

Install dependencies

tip

See this section for general instructions on installing integration packages.

npm i @langchain/mistralai

yarn add @langchain/mistralai 

pnpm add @langchain/mistralai 

Add environment variables

MISTRAL_API_KEY=your-api-key

Instantiate the model

import { ChatMistralAI } from "@langchain/mistralai";

const llm = new ChatMistralAI({
  model: "mistral-large-latest",
  temperature: 0
});

Google Vertex AI

Install dependencies

tip

See this section for general instructions on installing integration packages.

npm i @langchain/google-vertexai

yarn add @langchain/google-vertexai 

pnpm add @langchain/google-vertexai 

Add environment variables

GOOGLE_APPLICATION_CREDENTIALS=credentials.json

Instantiate the model

import { ChatVertexAI } from "@langchain/google-vertexai";

const llm = new ChatVertexAI({
  model: "gemini-1.5-flash",
  temperature: 0
});

We enable structured output by creating a new object with the .withStructuredOutput method:

const structured_llm = llm.withStructuredOutput(personSchema);

We can then invoke it normally:

const prompt = await promptTemplate.invoke({
  text: "Alan Smith is 6 feet tall and has blond hair.",
});
await structured_llm.invoke(prompt);

{ name: 'Alan Smith', hair_color: 'blond', height_in_meters: '1.83' }

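info

Extraction is generative 🤯

LLMs are generative models, so they can do some pretty cool things, such as correctly extracting the height of the person in meters even though it was provided in feet!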
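
The conversion itself is easy to verify, which is useful because a generative model can also get it wrong. The check below is a minimal sketch: `feetToMeters` and `isCloseTo` are illustrative helpers, and the 0.01 m tolerance is an arbitrary assumption.

```typescript
const METERS_PER_FOOT = 0.3048; // exact by definition of the international foot

function feetToMeters(feet: number): number {
  return feet * METERS_PER_FOOT;
}

// Compares an extracted height (a string in this schema) with the expected value.
function isCloseTo(extracted: string, expectedMeters: number, tolerance = 0.01): boolean {
  const value = Number.parseFloat(extracted);
  return Number.isFinite(value) && Math.abs(value - expectedMeters) <= tolerance;
}

console.log(feetToMeters(6).toFixed(2)); // 1.83
console.log(isCloseTo("1.83", feetToMeters(6))); // true
```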

You can inspect this run in its LangSmith trace.

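Even though we defined our schema with the variable name personSchema, Zod cannot infer this name, so it is not passed along to the model. To give the LLM more clues about what the provided schema represents, you can also give a name to the schema you pass to withStructuredOutput():

const structured_llm2 = llm.withStructuredOutput(personSchema, {
  name: "person",
});

const prompt2 = await promptTemplate.invoke({
  text: "Alan Smith is 6 feet tall and has blond hair.",
});
await structured_llm2.invoke(prompt2);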

{ name: 'Alan Smith', hair_color: 'blond', height_in_meters: '1.83' }

This can improve performance in many cases.

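Multiple Entities

In most cases, you should be extracting a list of entities rather than a single entity.

This can be easily achieved by nesting schemas in Zod.

import { z } from "zod";

const person = z.object({
  name: z.nullish(z.string()).describe("The name of the person"),
  hair_color: z.nullish(z.string()).describe("The color of the person's hair if known"),
  height_in_meters: z.nullish(z.number()).describe("Height measured in meters"),
});

const dataSchema = z.object({
  people: z.array(person).describe("Extracted data about people"),
});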

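info

Extraction might not be perfect here. Please continue reading to see how to use reference examples to improve the quality of extraction, and see the guidelines section!

const structured_llm3 = llm.withStructuredOutput(dataSchema);
const prompt3 = await promptTemplate.invoke({
  text: "My name is Jeff, my hair is black and I am 6 feet tall. Anna has the same color hair as me.",
});
await structured_llm3.invoke(prompt3);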

{
  people: [
    { name: 'Jeff', hair_color: 'black', height_in_meters: 1.83 },
    { name: 'Anna', hair_color: 'black', height_in_meters: null }
  ]
}

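tip

When the schema accommodates the extraction of multiple entities, it also allows the model to extract no entities when the text contains no relevant information, by returning an empty list.

This is usually a good thing! It allows you to specify required attributes on an entity without forcing the model to detect that entity.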
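
Downstream code should therefore handle both an empty `people` list and `null` attributes. The following is a minimal sketch, assuming the result shape shown above; the interfaces and the `summarize` helper are illustrative names rather than LangChain APIs.

```typescript
// Shape of the extraction result, written out by hand for illustration.
interface ExtractedPerson {
  name?: string | null;
  hair_color?: string | null;
  height_in_meters?: number | null;
}

interface ExtractedData {
  people: ExtractedPerson[];
}

// Produces one line per person, tolerating missing names and heights.
function summarize(data: ExtractedData): string {
  if (data.people.length === 0) return "No people found.";
  return data.people
    .map((p) => {
      const name = p.name ?? "(unknown name)";
      const height =
        p.height_in_meters != null ? `${p.height_in_meters.toFixed(2)} m` : "height unknown";
      return `${name}: ${height}`;
    })
    .join("\n");
}

const result: ExtractedData = {
  people: [
    { name: "Jeff", hair_color: "black", height_in_meters: 1.83 },
    { name: "Anna", hair_color: "black", height_in_meters: null },
  ],
};
console.log(summarize(result));
console.log(summarize({ people: [] })); // No people found.
```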

You can inspect this run in its LangSmith trace.

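Next steps

Now that you understand the basics of extraction with LangChain, you are ready to proceed to the rest of the how-to guides:

Add Examples: Learn how to use reference examples to improve performance.
Handle Long Text: What should you do if the text does not fit into the context window of the LLM?
Use a Parsing Approach: Use a prompt-based approach to extract with models that do not support tool/function calling.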
