如何使用参考示例

预备知识

本指南假定您熟悉以下内容：

提取

通常可以通过向 LLM 提供参考示例来提高提取的质量。

tip

虽然本教程的重点是如何在工具调用模型中使用示例，但此技术具有通用性，也可以适用于 JSON 或其他基于提示的技术。

这次我们将使用 OpenAI 的 GPT-4，因为它对ToolMessages提供了强大的支持：

:::提示请参阅安装集成包的一般说明部分。 :::

npm
yarn
pnpm

npm i @langchain/openai @langchain/core zod uuid

yarn add @langchain/openai @langchain/core zod uuid

pnpm add @langchain/openai @langchain/core zod uuid

接下来我们定义一个提示：

import {
  ChatPromptTemplate,
  MessagesPlaceholder,
} from "@langchain/core/prompts";

const SYSTEM_PROMPT_TEMPLATE = `You are an expert extraction algorithm.
Only extract relevant information from the text.
If you do not know the value of an attribute asked to extract, you may omit the attribute's value.`;

// Define a custom prompt to provide instructions and any additional context.
// 1) You can add examples into the prompt template to improve extraction quality
// 2) Introduce additional parameters to take context into account (e.g., include metadata
//    about the document from which the text was extracted.)
const prompt = ChatPromptTemplate.fromMessages([
  ["system", SYSTEM_PROMPT_TEMPLATE],
  // ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓
  new MessagesPlaceholder("examples"),
  // ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑
  ["human", "{text}"],
]);

测试模板：

import { HumanMessage } from "@langchain/core/messages";

const promptValue = await prompt.invoke({
  text: "this is some text",
  examples: [new HumanMessage("testing 1 2 3")],
});

promptValue.toChatMessages();

[
  SystemMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "You are an expert extraction algorithm.\n" +
        "Only extract relevant information from the text.\n" +
        "If you do n"... 87 more characters,
      additional_kwargs: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "You are an expert extraction algorithm.\n" +
      "Only extract relevant information from the text.\n" +
      "If you do n"... 87 more characters,
    name: undefined,
    additional_kwargs: {}
  },
  HumanMessage {
    lc_serializable: true,
    lc_kwargs: { content: "testing 1 2 3", additional_kwargs: {} },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "testing 1 2 3",
    name: undefined,
    additional_kwargs: {}
  },
  HumanMessage {
    lc_serializable: true,
    lc_kwargs: { content: "this is some text", additional_kwargs: {} },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "this is some text",
    name: undefined,
    additional_kwargs: {}
  }
]

定义模式

让我们重新使用快速入门中的 people 模式。

import { z } from "zod";

const personSchema = z
  .object({
    name: z.optional(z.string()).describe("The name of the person"),
    hair_color: z
      .optional(z.string())
      .describe("The color of the person's hair, if known"),
    height_in_meters: z
      .optional(z.string())
      .describe("Height measured in meters"),
  })
  .describe("Information about a person.");

const peopleSchema = z.object({
  people: z.array(personSchema),
});

定义参考示例

示例可以定义为输入-输出对的列表。

每个示例包含一个示例 输入 文本和一个示例 输出，用以展示应从文本中提取的内容。

info

下面的示例稍微高级一些——示例的格式需要与所使用的 API 匹配（例如，工具调用或 JSON 模式等）。

在这里，格式化的示例将匹配 OpenAI 工具调用 API 所期望的格式，因为这就是我们所使用的。

为了向模型提供参考示例，我们将模拟一个包含工具成功使用情况的虚假聊天历史记录。由于模型可以选择一次调用多个工具（或多次调用同一个工具），示例的输出是一个数组：

import {
  AIMessage,
  type BaseMessage,
  HumanMessage,
  ToolMessage,
} from "@langchain/core/messages";
import { v4 as uuid } from "uuid";

type OpenAIToolCall = {
  id: string;
  type: "function";
  function: {
    name: string;
    arguments: string;
  };
};

type Example = {
  input: string;
  toolCallOutputs: Record<string, any>[];
};

/**
 * This function converts an example into a list of messages that can be fed into an LLM.
 *
 * This code serves as an adapter that transforms our example into a list of messages
 * that can be processed by a chat model.
 *
 * The list of messages for each example includes:
 *
 * 1) HumanMessage: This contains the content from which information should be extracted.
 * 2) AIMessage: This contains the information extracted by the model.
 * 3) ToolMessage: This provides confirmation to the model that the tool was requested correctly.
 *
 * The inclusion of ToolMessage is necessary because some chat models are highly optimized for agents,
 * making them less suitable for an extraction use case.
 */
function toolExampleToMessages(example: Example): BaseMessage[] {
  const openAIToolCalls: OpenAIToolCall[] = example.toolCallOutputs.map(
    (output) => {
      return {
        id: uuid(),
        type: "function",
        function: {
          // The name of the function right now corresponds
          // to the passed name.
          name: "extract",
          arguments: JSON.stringify(output),
        },
      };
    }
  );
  const messages: BaseMessage[] = [
    new HumanMessage(example.input),
    new AIMessage({
      content: "",
      additional_kwargs: { tool_calls: openAIToolCalls },
    }),
  ];
  const toolMessages = openAIToolCalls.map((toolCall, i) => {
    // Return the mocked successful result for a given tool call.
    return new ToolMessage({
      content: "You have correctly called this tool.",
      tool_call_id: toolCall.id,
    });
  });
  return messages.concat(toolMessages);
}

接下来，让我们定义示例，然后将它们转换为消息格式。

const examples: Example[] = [
  {
    input:
      "The ocean is vast and blue. It's more than 20,000 feet deep. There are many fish in it.",
    toolCallOutputs: [{}],
  },
  {
    input: "Fiona traveled far from France to Spain.",
    toolCallOutputs: [
      {
        name: "Fiona",
      },
    ],
  },
];

const exampleMessages = [];
for (const example of examples) {
  exampleMessages.push(...toolExampleToMessages(example));
}

让我们测试一下这个提示

const promptValueWithExamples = await prompt.invoke({
  text: "this is some text",
  examples: exampleMessages,
});

promptValueWithExamples.toChatMessages();

[
  SystemMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "You are an expert extraction algorithm.\n" +
        "Only extract relevant information from the text.\n" +
        "If you do n"... 87 more characters,
      additional_kwargs: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "You are an expert extraction algorithm.\n" +
      "Only extract relevant information from the text.\n" +
      "If you do n"... 87 more characters,
    name: undefined,
    additional_kwargs: {}
  },
  HumanMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "The ocean is vast and blue. It's more than 20,000 feet deep. There are many fish in it.",
      additional_kwargs: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "The ocean is vast and blue. It's more than 20,000 feet deep. There are many fish in it.",
    name: undefined,
    additional_kwargs: {}
  },
  AIMessage {
    lc_serializable: true,
    lc_kwargs: { content: "", additional_kwargs: { tool_calls: [ [Object] ] } },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "",
    name: undefined,
    additional_kwargs: {
      tool_calls: [
        {
          id: "8fa4d00d-801f-470e-8737-51ee9dc82259",
          type: "function",
          function: [Object]
        }
      ]
    }
  },
  ToolMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "You have correctly called this tool.",
      tool_call_id: "8fa4d00d-801f-470e-8737-51ee9dc82259",
      additional_kwargs: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "You have correctly called this tool.",
    name: undefined,
    additional_kwargs: {},
    tool_call_id: "8fa4d00d-801f-470e-8737-51ee9dc82259"
  },
  HumanMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "Fiona traveled far from France to Spain.",
      additional_kwargs: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "Fiona traveled far from France to Spain.",
    name: undefined,
    additional_kwargs: {}
  },
  AIMessage {
    lc_serializable: true,
    lc_kwargs: { content: "", additional_kwargs: { tool_calls: [ [Object] ] } },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "",
    name: undefined,
    additional_kwargs: {
      tool_calls: [
        {
          id: "14ad6217-fcbd-47c7-9006-82f612e36c66",
          type: "function",
          function: [Object]
        }
      ]
    }
  },
  ToolMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "You have correctly called this tool.",
      tool_call_id: "14ad6217-fcbd-47c7-9006-82f612e36c66",
      additional_kwargs: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "You have correctly called this tool.",
    name: undefined,
    additional_kwargs: {},
    tool_call_id: "14ad6217-fcbd-47c7-9006-82f612e36c66"
  },
  HumanMessage {
    lc_serializable: true,
    lc_kwargs: { content: "this is some text", additional_kwargs: {} },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "this is some text",
    name: undefined,
    additional_kwargs: {}
  }
]

创建一个提取器

在这里，我们将使用 gpt-4 创建一个提取器。

import { ChatOpenAI } from "@langchain/openai";

// We will be using tool calling mode, which
// requires a tool calling capable model.
const llm = new ChatOpenAI({
  // Consider benchmarking with the best model you can to get
  // a sense of the best possible quality.
  model: "gpt-4-0125-preview",
  temperature: 0,
});

// For function/tool calling, we can also supply an name for the schema
// to give the LLM additional context about what it's extracting.
const extractionRunnable = prompt.pipe(
  llm.withStructuredOutput(peopleSchema, { name: "people" })
);

没有示例的情况下 😿

请注意，即使我们使用的是 gpt-4，它在一个 非常简单 的测试用例上也是不可靠的！

我们下面运行了 5 次以强调这一点：

const text = "The solar system is large, but earth has only 1 moon.";

for (let i = 0; i < 5; i++) {
  const result = await extractionRunnable.invoke({
    text,
    examples: [],
  });
  console.log(result);
}

{
  people: [ { name: "earth", hair_color: "grey", height_in_meters: "1" } ]
}
{ people: [ { name: "earth", hair_color: "moon" } ] }
{ people: [ { name: "earth", hair_color: "moon" } ] }
{ people: [ { name: "earth", hair_color: "1 moon" } ] }
{ people: [] }

带示例 😻

参考示例有助于修复错误！

for (let i = 0; i < 5; i++) {
  const result = await extractionRunnable.invoke({
    text,
    // Example messages from above
    examples: exampleMessages,
  });
  console.log(result);
}

{ people: [] }
{ people: [] }
{ people: [] }
{ people: [] }
{ people: [] }

await extractionRunnable.invoke({
  text: "My name is Hair-ison. My hair is black. I am 3 meters tall.",
  examples: exampleMessages,
});

{
  people: [ { name: "Hair-ison", hair_color: "black", height_in_meters: "3" } ]
}

下一步

你现在已了解如何通过少量示例提升抽取质量。

接下来，请查看本节中的其他一些指南，例如关于如何对长文本执行抽取的一些建议。

如何使用参考示例

定义模式

定义参考示例

创建一个提取器

没有示例的情况下 😿

带示例 😻

下一步

Was this page helpful?

You can also leave detailed feedback on GitHub.

定义模式​

定义参考示例​

创建一个提取器​

没有示例的情况下 😿​

带示例 😻​

下一步​

Was this page helpful?

You can also leave detailed feedback on GitHub.

定义模式

定义参考示例

创建一个提取器

没有示例的情况下 😿

带示例 😻

下一步