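How to do extraction without using function calling
Prerequisites
This guide assumes familiarity with the following:
LLMs that follow prompt instructions well can be asked to output information in a specific format without function calling.
This approach relies on a well-designed prompt and on parsing the LLM's output to extract information well, but it lacks some of the guarantees that function calling or JSON mode provide.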
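As a rough, dependency-free sketch of the idea (not LangChain's API): the instructions describe the desired format in plain language, and ordinary code parses whatever text comes back. `fakeModelReply`, `buildInstructions`, and `parseReply` are hypothetical names introduced for illustration, and the canned reply stands in for a real model call.

```typescript
// Hypothetical stand-in for a chat model call. A real model returns free text
// and is NOT guaranteed to follow the requested format.
function fakeModelReply(_instructions: string): string {
  return 'Sure! {"person_name": "Anna", "height_in_meters": "1.83"}';
}

// Describe the desired output format in plain language.
function buildInstructions(userText: string): string {
  return [
    "Extract the person described in the text.",
    'Reply with a JSON object with keys "person_name" and "height_in_meters".',
    `Text: ${userText}`,
  ].join("\n");
}

// Parse the text between the first "{" and the last "}".
// Returns null when nothing parseable is found, because prompt-based
// extraction has no built-in format guarantee.
function parseReply(reply: string): Record<string, unknown> | null {
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(reply.slice(start, end + 1));
  } catch (_err) {
    return null;
  }
}

const extracted = parseReply(fakeModelReply(buildInstructions("Anna is 6 feet tall")));
const unparseable = parseReply("I could not find a person.");
```

The `null` branch is the part that function calling or JSON mode would normally make unnecessary; with plain prompting, the caller has to decide what to do when the reply does not parse.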
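Here, we'll use Claude, which is great at following instructions! See here for more about Anthropic models.
First, we'll install the integration package:
:::tip See this section for general instructions on installing integration packages. :::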
- npm
- yarn
- pnpm
npm i @langchain/anthropic @langchain/core zod zod-to-json-schema
yarn add @langchain/anthropic @langchain/core zod zod-to-json-schema
pnpm add @langchain/anthropic @langchain/core zod zod-to-json-schema
import { ChatAnthropic } from "@langchain/anthropic";
const model = new ChatAnthropic({
  model: "claude-3-sonnet-20240229",
  temperature: 0,
});
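:::tip All the same considerations for extraction quality apply to the parsing approach. This tutorial is meant to be simple, but in general you should include reference examples to improve performance! :::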
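One way to picture reference examples is as extra conversation turns placed before the real query: each example is a human turn followed by the ideal reply. The sketch below is plain TypeScript with hypothetical names (`ChatTurn`, `referenceExamples`, `buildFewShotTurns`) and a made-up example; how the turns are handed to a prompt template is deliberately left out.

```typescript
type ChatTurn = { role: "system" | "human" | "ai"; content: string };

// Hypothetical reference examples: an input text paired with the exact output we want.
const referenceExamples = [
  {
    input: "Bob has brown hair and is 1.8 meters tall.",
    output: { name: "Bob", hair_color: "brown", height_in_meters: "1.8" },
  },
];

// Interleave each example as a human turn followed by the ideal ai turn,
// then append the real query as the final human turn.
function buildFewShotTurns(systemText: string, userQuery: string): ChatTurn[] {
  const turns: ChatTurn[] = [{ role: "system", content: systemText }];
  for (const example of referenceExamples) {
    turns.push({ role: "human", content: example.input });
    turns.push({ role: "ai", content: JSON.stringify(example.output) });
  }
  turns.push({ role: "human", content: userQuery });
  return turns;
}

const fewShotTurns = buildFewShotTurns(
  "Answer the user query as JSON.",
  "Anna is 23 years old and she is 6 feet tall",
);
```

Keeping the example outputs in exactly the format the parser expects matters more than the number of examples, since the model tends to imitate what it is shown.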
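Using StructuredOutputParser
The following example uses the built-in StructuredOutputParser to parse the output of a chat model. We use the built-in prompt formatting instructions contained in the parser.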
import { z } from "zod";
import { StructuredOutputParser } from "langchain/output_parsers";
import { ChatPromptTemplate } from "@langchain/core/prompts";

// Every field is optional so the model can omit what the text does not state.
// Declared with `let` because the schema is reassigned later in this guide.
let personSchema = z
  .object({
    name: z.optional(z.string()).describe("The name of the person"),
    hair_color: z
      .optional(z.string())
      .describe("The color of the person's hair, if known"),
    height_in_meters: z
      .optional(z.string())
      .describe("Height measured in meters"),
  })
  .describe("Information about a person.");

const parser = StructuredOutputParser.fromZodSchema(personSchema);

const prompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    "Answer the user query. Wrap the output in `json` tags\n{format_instructions}",
  ],
  ["human", "{query}"],
]);

// Pre-fill the parser's format instructions so only {query} remains to be supplied.
const partialedPrompt = await prompt.partial({
  format_instructions: parser.getFormatInstructions(),
});
Let's take a look at what information is sent to the model:
const query = "Anna is 23 years old and she is 6 feet tall";
const promptValue = await partialedPrompt.invoke({ query });
console.log(promptValue.toChatMessages());
[
  SystemMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "Answer the user query. Wrap the output in `json` tags\n" +
        "You must format your output as a JSON value th"... 1444 more characters,
      additional_kwargs: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "Answer the user query. Wrap the output in `json` tags\n" +
      "You must format your output as a JSON value th"... 1444 more characters,
    name: undefined,
    additional_kwargs: {}
  },
  HumanMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "Anna is 23 years old and she is 6 feet tall",
      additional_kwargs: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "Anna is 23 years old and she is 6 feet tall",
    name: undefined,
    additional_kwargs: {}
  }
]
const chain = partialedPrompt.pipe(model).pipe(parser);
await chain.invoke({ query });
{ name: "Anna", hair_color: "", height_in_meters: "1.83" }
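In the result above, the unknown hair color comes back as an empty string and the height as a string. A possible post-processing step, sketched here in plain TypeScript, maps blank strings to `undefined` and converts the height to a number. `RawPerson`, `CleanPerson`, and `normalizePerson` are hypothetical names, not part of LangChain.

```typescript
type RawPerson = { name?: string; hair_color?: string; height_in_meters?: string };
type CleanPerson = { name?: string; hairColor?: string; heightInMeters?: number };

// Treat missing or whitespace-only strings as "unknown".
function blankToUndefined(value?: string): string | undefined {
  return value !== undefined && value.trim() !== "" ? value.trim() : undefined;
}

function normalizePerson(raw: RawPerson): CleanPerson {
  const heightText = blankToUndefined(raw.height_in_meters);
  const height = heightText === undefined ? NaN : Number(heightText);
  return {
    name: blankToUndefined(raw.name),
    hairColor: blankToUndefined(raw.hair_color),
    // Non-numeric height text (e.g. "tall") is dropped rather than kept as NaN.
    heightInMeters: Number.isFinite(height) ? height : undefined,
  };
}

const cleaned = normalizePerson({ name: "Anna", hair_color: "", height_in_meters: "1.83" });
```

Whether to drop unparseable values or raise an error depends on the application; dropping is used here only to keep the sketch short.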
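Custom Parsing
You can also create a custom prompt and parser with LangChain and LCEL.
You can use a raw function to parse the output from the model.
In the example below, we'll pass the schema into the prompt as JSON Schema. For convenience, we'll declare our schema with Zod, then use the zod-to-json-schema utility to convert it to JSON Schema.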
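A JSON Schema is an ordinary object, so it has to be turned into text before it can sit inside a prompt string. The sketch below uses a hand-written schema (an approximation for illustration; the exact output of zod-to-json-schema may differ) and a hypothetical `fillSchemaPlaceholder` helper to show the difference between JavaScript's default string conversion and `JSON.stringify`.

```typescript
// Hand-written JSON Schema approximating a list of people (illustrative only).
const handWrittenPeopleSchema = {
  type: "object",
  properties: {
    people: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string", description: "The name of the person" },
          hair_color: { type: "string", description: "The color of the person's hair, if known" },
          height_in_meters: { type: "string", description: "Height measured in meters" },
        },
      },
    },
  },
  required: ["people"],
};

// Minimal stand-in for template filling: replaces the literal "{schema}" placeholder.
// A replacer function is used so "$" sequences in the schema text are not interpreted.
function fillSchemaPlaceholder(template: string, schemaText: string): string {
  return template.replace("{schema}", () => schemaText);
}

const schemaTemplate = "Return JSON matching:\n{schema}";

// Default string conversion loses the schema entirely.
const naive = fillSchemaPlaceholder(schemaTemplate, String(handWrittenPeopleSchema));

// JSON.stringify keeps the full schema; the indent argument is only for readability.
const serialized = fillSchemaPlaceholder(
  schemaTemplate,
  JSON.stringify(handWrittenPeopleSchema, null, 2),
);
```

If a prompt ever turns out much shorter than expected, checking it for the text `[object Object]` is a quick way to spot this problem.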
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

personSchema = z
  .object({
    name: z.optional(z.string()).describe("The name of the person"),
    hair_color: z
      .optional(z.string())
      .describe("The color of the person's hair, if known"),
    height_in_meters: z
      .optional(z.string())
      .describe("Height measured in meters"),
  })
  .describe("Information about a person.");

const peopleSchema = z.object({
  people: z.array(personSchema),
});
const SYSTEM_PROMPT_TEMPLATE = [
  "Answer the user's query. You must return your answer as JSON that matches the given schema:",
  "```json\n{schema}\n```.",
  "Make sure to wrap the answer in ```json and ``` tags. Conform to the given schema exactly.",
].join("\n");

const customParsingPrompt = ChatPromptTemplate.fromMessages([
  ["system", SYSTEM_PROMPT_TEMPLATE],
  ["human", "{query}"],
]);
const extractJsonFromOutput = (message) => {
  const text = message.content;
  // Match the first ```json ... ``` block; [\s\S] also matches newlines
  const pattern = /```json\s*([\s\S]*?)\s*```/;
  const matches = pattern.exec(text);
  if (matches && matches[1]) {
    try {
      return JSON.parse(matches[1].trim());
    } catch (error) {
      throw new Error(`Failed to parse: ${matches[1]}`);
    }
  } else {
    // Report the raw text so the failure is easy to diagnose
    throw new Error(`No JSON found in: ${text}`);
  }
};
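To see the extraction step in isolation, here is a standalone variant that works on a plain string instead of a message object. `extractFencedJson` and the sample reply are hypothetical; the fence marker is built with `repeat` only to keep literal triple backticks out of the example.

```typescript
// Three backticks, built programmatically.
const FENCE = "`".repeat(3);

// Matches the first fenced json block; [\s\S] also matches newlines.
const fencedJsonPattern = new RegExp(FENCE + "json\\s*([\\s\\S]*?)\\s*" + FENCE);

function extractFencedJson(text: string): unknown {
  const body = fencedJsonPattern.exec(text)?.[1];
  if (body === undefined) {
    throw new Error(`No JSON found in: ${text}`);
  }
  return JSON.parse(body);
}

// A made-up model reply with chatter around the fenced block.
const sampleReply = [
  "Here is the result:",
  FENCE + "json",
  '{ "people": [{ "name": "Anna", "height_in_meters": "1.83" }] }',
  FENCE,
  "Let me know if you need anything else.",
].join("\n");

const parsedReply = extractFencedJson(sampleReply) as { people: { name: string }[] };

// A reply without a fenced block raises an error instead of returning garbage.
let missingFenceError = "";
try {
  extractFencedJson("no fenced block here");
} catch (error) {
  missingFenceError = (error as Error).message;
}
```

Only the first fenced block is read, so a reply containing several blocks would silently drop the rest; a production parser might want to reject that case explicitly.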
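const customParsingQuery = "Anna is 23 years old and she is 6 feet tall";
const customParsingPromptValue = await customParsingPrompt.invoke({
  // Serialize the schema to a string before interpolating it into the prompt
  schema: JSON.stringify(zodToJsonSchema(peopleSchema)),
  // The template variable is named {query}
  query: customParsingQuery,
});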
customParsingPromptValue.toString();
"System: Answer the user's query. You must return your answer as JSON that matches the given schema:\n"... 170 more characters
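// Build the chain from customParsingPrompt, which contains the {schema} placeholder
const customParsingChain = customParsingPrompt
  .pipe(model)
  .pipe(extractJsonFromOutput);
await customParsingChain.invoke({
  schema: JSON.stringify(zodToJsonSchema(peopleSchema)),
  query: customParsingQuery,
});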
{ name: "Anna", age: 23, height: { feet: 6, inches: 0 } }
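Because nothing forces the model to honor the schema, the parsed value may not have the expected shape (the object above, for instance, has no `people` array). A hand-rolled shape check is sketched below in plain TypeScript; `PersonRecord`, `isOptionalString`, and `isPeoplePayload` are hypothetical names. In the guide's own code, calling Zod's `safeParse` on `peopleSchema` would be the more natural way to do the same check.

```typescript
type PersonRecord = { name?: string; hair_color?: string; height_in_meters?: string };

function isOptionalString(value: unknown): boolean {
  return value === undefined || typeof value === "string";
}

// Type guard: true only for { people: [...] } where every entry's known
// fields are either absent or strings.
function isPeoplePayload(value: unknown): value is { people: PersonRecord[] } {
  if (typeof value !== "object" || value === null) return false;
  const people = (value as { people?: unknown }).people;
  if (!Array.isArray(people)) return false;
  return people.every((entry) => {
    if (typeof entry !== "object" || entry === null) return false;
    const record = entry as Record<string, unknown>;
    return (
      isOptionalString(record["name"]) &&
      isOptionalString(record["hair_color"]) &&
      isOptionalString(record["height_in_meters"])
    );
  });
}

const conforming = isPeoplePayload({ people: [{ name: "Anna", height_in_meters: "1.83" }] });
const nonConforming = isPeoplePayload({ name: "Anna", age: 23, height: { feet: 6, inches: 0 } });
```

Running a check like this right after parsing turns a silent schema mismatch into an explicit failure that the caller can retry or report.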
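Next steps
You've now learned how to perform extraction without using tool calling.
Next, check out some of the other guides in this section, such as some tips on how to improve extraction quality with examples.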