
How to select examples from a LangSmith dataset

Compatibility
  • langsmith >= 0.1.43

LangSmith datasets have built-in support for similarity search, which makes them a great tool for building and querying few-shot examples.

In this guide, we'll see how to use an indexed LangSmith dataset as a few-shot example selector.
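To make "similarity search over examples" concrete, here is a toy, self-contained sketch of the idea. It is not how LangSmith computes similarity (this guide does not describe its index internals); it assumes a simple bag-of-words cosine similarity, and `ToyExampleIndex`, `bagOfWords` and `cosine` are hypothetical names. Each example is vectorized when it is added, so it becomes searchable immediately, and `search` returns the top `k` examples ranked by how similar their input is to the query.

```typescript
// Toy sketch: bag-of-words cosine similarity, NOT LangSmith's actual index.
type ToyExample = { input: string; output: string };

// Lowercase the text, split on non-alphanumeric characters, and count words.
function bagOfWords(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const word of text.toLowerCase().split(/[^a-z0-9]+/)) {
    if (word === "") continue;
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity between two sparse word-count vectors (0 if either is empty).
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  a.forEach((count, word) => {
    dot += count * (b.get(word) ?? 0);
  });
  const norm = (v: Map<string, number>) =>
    Math.sqrt(Array.from(v.values()).reduce((sum, c) => sum + c * c, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Each example is vectorized when added, so it is searchable right away.
class ToyExampleIndex {
  private entries: { example: ToyExample; vector: Map<string, number> }[] = [];

  add(example: ToyExample): void {
    this.entries.push({ example, vector: bagOfWords(example.input) });
  }

  // Return up to k examples whose input is most similar to the query.
  search(query: string, k: number): ToyExample[] {
    const q = bagOfWords(query);
    return this.entries
      .map((e) => ({ example: e.example, score: cosine(q, e.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((e) => e.example);
  }
}

// Usage with hypothetical examples (outputs are placeholders)
const index = new ToyExampleIndex();
index.add({ input: "evaluate the negation of -100", output: "(answer A)" });
index.add({ input: "what is 2 plus 3", output: "(answer B)" });
console.log(index.search("whats the negation of 3", 1)[0].input);
// evaluate the negation of -100
```

The query shares "the", "negation" and "of" with the first example but only "3" with the second, so the first ranks higher. A real index would use learned embeddings rather than raw word overlap.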

Setup

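Before getting started, make sure you've created a LangSmith account and set your credentials. The chain at the end of this guide uses ChatOpenAI, so an OpenAI API key is needed as well:

```typescript
process.env.LANGSMITH_API_KEY = "your-api-key";
process.env.LANGSMITH_TRACING = "true";
process.env.OPENAI_API_KEY = "your-openai-api-key";
```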
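A simple way to fail fast on missing credentials is to check the variables before running anything else. The chain at the end of this guide uses `ChatOpenAI`, which reads `OPENAI_API_KEY` by default, so that variable is included in the check. `missingEnvVars` is a hypothetical helper, and a plain object stands in for `process.env` so the sketch runs without Node type definitions.

```typescript
// Hypothetical helper: names of required variables that are unset or empty.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[]
): string[] {
  return required.filter((name) => !env[name]);
}

// In Node you would pass process.env; a plain object keeps this sketch self-contained.
const exampleEnv: Record<string, string | undefined> = {
  LANGSMITH_API_KEY: "your-api-key",
  LANGSMITH_TRACING: "true",
};
const missing = missingEnvVars(exampleEnv, ["LANGSMITH_API_KEY", "OPENAI_API_KEY"]);
if (missing.length > 0) {
  console.log(`Missing environment variables: ${missing.join(", ")}`);
}
// Missing environment variables: OPENAI_API_KEY
```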

We'll need to install the langsmith SDK. In this example we'll also use langchain and @langchain/openai (the chain below calls ChatOpenAI):

```bash
yarn add langsmith langchain @langchain/openai @langchain/core zod zod-to-json-schema
```

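Now we'll clone a public dataset and turn on indexing for it. Indexing can also be turned on from the LangSmith UI.

We'll clone the Multiverse math few shot example dataset.

Indexing enables searching over the dataset and ensures that whenever we update or add examples, they are indexed as well.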

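The first step in creating the clone is to read the JSON file containing the examples and convert them into the format LangSmith expects when creating examples: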

```typescript
import { Client as LangSmithClient } from "langsmith";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";
import fs from "fs/promises";

// Read the example dataset and convert it to the format expected by the
// LangSmith API for creating new examples
const examplesJson = JSON.parse(
  await fs.readFile("../../data/ls_few_shot_example_dataset.json", "utf-8")
);

const inputs: Record<string, any>[] = [];
const outputs: Record<string, any>[] = [];
const metadata: Record<string, any>[] = [];

examplesJson.forEach((ex: Record<string, any>) => {
  inputs.push(ex.inputs);
  outputs.push(ex.outputs);
  metadata.push(ex.metadata);
});

// Define our input schema, as this is required for indexing
const inputsSchema = zodToJsonSchema(
  z.object({
    input: z.string(),
    system: z.boolean().optional(),
  })
);

const lsClient = new LangSmithClient();

// Remove any previous copy so the script can be re-run. deleteDataset fails
// when the dataset doesn't exist yet (e.g. on the first run), so errors are
// ignored here.
try {
  await lsClient.deleteDataset({
    datasetName: "multiverse-math-examples-for-few-shot-example",
  });
} catch {
  // Nothing to delete
}

const dataset = await lsClient.createDataset(
  "multiverse-math-examples-for-few-shot-example",
  {
    inputsSchema,
  }
);

// Upload all examples in one call; inputs[i], outputs[i] and metadata[i]
// belong to the same example
const createdExamples = await lsClient.createExamples({
  inputs,
  outputs,
  metadata,
  datasetId: dataset.id,
});

// Turn on indexing so the dataset supports similarity search
await lsClient.indexDataset({ datasetId: dataset.id });
```
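The reshaping done by the `forEach` loop, plus a pre-upload check of each example's inputs, can be isolated into pure functions that are easy to test. This is a sketch: `RawExample`, `toColumns` and `matchesInputsSchema` are hypothetical names, the record shape is inferred from how the loop reads each entry (`ex.inputs`, `ex.outputs`, `ex.metadata`), and the check covers only the two fields declared in the zod schema, not every rule of the generated JSON Schema.

```typescript
// Hypothetical shape of one record in the JSON file.
type RawExample = {
  inputs: Record<string, unknown>;
  outputs: Record<string, unknown>;
  metadata: Record<string, unknown>;
};

// Row-oriented records -> the three parallel arrays passed to createExamples.
// Index i in each array describes the same example.
function toColumns(rows: RawExample[]) {
  return {
    inputs: rows.map((r) => r.inputs),
    outputs: rows.map((r) => r.outputs),
    metadata: rows.map((r) => r.metadata),
  };
}

// Checks the two declared fields: `input` must be a string and
// `system`, if present, must be a boolean.
function matchesInputsSchema(inputs: Record<string, unknown>): boolean {
  const okInput = typeof inputs.input === "string";
  const okSystem = inputs.system === undefined || typeof inputs.system === "boolean";
  return okInput && okSystem;
}

// Usage with hypothetical rows: find bad inputs before uploading anything.
const rows: RawExample[] = [
  { inputs: { input: "evaluate the negation of -100" }, outputs: {}, metadata: {} },
  { inputs: { input: 42 }, outputs: {}, metadata: {} },
];
const badRows = rows.filter((r) => !matchesInputsSchema(r.inputs));
console.log(badRows.length); // 1
const columns = toColumns(rows);
console.log(columns.inputs.length); // 2
```

Validating locally means a malformed row is caught before any network call, rather than partway through an upload.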

Once the dataset is indexed, we can search for similar examples like so:

```typescript
const examples = await lsClient.similarExamples(
  { input: "whats the negation of the negation of the negation of 3" },
  dataset.id,
  3
);
console.log(examples.length);
// 3
console.log(examples[0].inputs.input);
// evaluate the negation of -100
```

For this dataset, the outputs are the entire chat history:

```typescript
console.log(examples[1].outputs.output);
```
[
{
id: 'cbe7ed83-86e1-4e46-89de-6646f8b55cef',
type: 'system',
content: 'You are requested to solve math questions in an alternate mathematical universe. The operations have been altered to yield different results than expected. Do not guess the answer or rely on your innate knowledge of math. Use the provided tools to answer the question. While associativity and commutativity apply, distributivity does not. Answer the question using the fewest possible tools. Only include the numeric response without any clarifications.',
additional_kwargs: {},
response_metadata: {}
},
{
id: '04946246-09a8-4465-be95-037efd7dae55',
type: 'human',
content: 'if one gazoink is 4 badoinks, each of which is 6 foos, each of wich is 3 bars - how many bars in 3 gazoinks?',
example: false,
additional_kwargs: {},
response_metadata: {}
},
{
id: 'run-d6f0954e-b21b-4ea8-ad98-0ee64cfc824e-0',
type: 'ai',
content: [ [Object] ],
example: false,
tool_calls: [ [Object] ],
usage_metadata: { input_tokens: 916, total_tokens: 984, output_tokens: 68 },
additional_kwargs: {},
response_metadata: {
id: 'msg_01MBWxgouUBzomwTvXhomGVq',
model: 'claude-3-sonnet-20240229',
usage: [Object],
stop_reason: 'tool_use',
stop_sequence: null
},
invalid_tool_calls: []
},
{
id: '3d4c72c4-f009-48ce-b739-1d3f28ee4803',
name: 'multiply',
type: 'tool',
content: '13.2',
tool_call_id: 'toolu_016RjRHSEyDZRqKhGrb8uvjJ',
additional_kwargs: {},
response_metadata: {}
},
{
id: 'run-26dd7e83-f5fb-4c70-8ba1-271300ffeb25-0',
type: 'ai',
content: [ [Object] ],
example: false,
tool_calls: [ [Object] ],
usage_metadata: { input_tokens: 999, total_tokens: 1070, output_tokens: 71 },
additional_kwargs: {},
response_metadata: {
id: 'msg_01VTFvtCxtR3rN58hCmjt2oH',
model: 'claude-3-sonnet-20240229',
usage: [Object],
stop_reason: 'tool_use',
stop_sequence: null
},
invalid_tool_calls: []
},
{
id: 'ca4e0317-7b3a-4638-933c-1efd98bc4fda',
name: 'multiply',
type: 'tool',
content: '87.12',
tool_call_id: 'toolu_01PqvszxiuXrVJ9bwgTWaH3q',
additional_kwargs: {},
response_metadata: {}
},
{
id: 'run-007794ac-3590-4b9e-b678-008f02e40042-0',
type: 'ai',
content: [ [Object] ],
example: false,
tool_calls: [ [Object] ],
usage_metadata: { input_tokens: 1084, total_tokens: 1155, output_tokens: 71 },
additional_kwargs: {},
response_metadata: {
id: 'msg_017BEkSqmTsmtJaTxAzfRMEh',
model: 'claude-3-sonnet-20240229',
usage: [Object],
stop_reason: 'tool_use',
stop_sequence: null
},
invalid_tool_calls: []
},
{
id: '7f58c121-6f21-4c7b-ba38-aa820e274ff8',
name: 'multiply',
type: 'tool',
content: '287.496',
tool_call_id: 'toolu_01LU3RqRUXZRLRoJ2AZNmPed',
additional_kwargs: {},
response_metadata: {}
},
{
id: 'run-51e35afb-7ec6-4738-93e2-92f80b5c9377-0',
type: 'ai',
content: '287.496',
example: false,
tool_calls: [],
usage_metadata: { input_tokens: 1169, total_tokens: 1176, output_tokens: 7 },
additional_kwargs: {},
response_metadata: {
id: 'msg_01Tx9kSNapSg8aUbWZXiS1NL',
model: 'claude-3-sonnet-20240229',
usage: [Object],
stop_reason: 'end_turn',
stop_sequence: null
},
invalid_tool_calls: []
}
]
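Because each example's output is a full chat history, it can help to pull out the parts you care about, such as which tools were called and what the final answer was. The sketch below uses a hypothetical, trimmed-down `SerializedMessage` type with only the `type`, `content` and `name` fields seen above. Treating the last AI message with string content as the final answer is a heuristic that fits this history (the intermediate AI turns carry array content plus tool calls), not a LangSmith rule. The sample history abridges the one printed above.

```typescript
// Hypothetical subset of the serialized message fields printed above.
type SerializedMessage = {
  type: "system" | "human" | "ai" | "tool";
  content: unknown;
  name?: string;
};

// Names of the tools that ran, in order, read from the tool messages.
function toolTrace(history: SerializedMessage[]): string[] {
  return history
    .filter((m) => m.type === "tool")
    .map((m) => m.name ?? "(unnamed)");
}

// Heuristic: the last AI message with plain-string content is the final answer.
function finalAnswer(history: SerializedMessage[]): string | undefined {
  for (let i = history.length - 1; i >= 0; i--) {
    const m = history[i];
    if (m.type === "ai" && typeof m.content === "string") return m.content;
  }
  return undefined;
}

// Abridged version of the history above (long texts and array contents trimmed).
const history: SerializedMessage[] = [
  { type: "system", content: "You are requested to solve math questions ..." },
  { type: "human", content: "... how many bars in 3 gazoinks?" },
  { type: "ai", content: [] },
  { type: "tool", name: "multiply", content: "13.2" },
  { type: "ai", content: [] },
  { type: "tool", name: "multiply", content: "87.12" },
  { type: "ai", content: [] },
  { type: "tool", name: "multiply", content: "287.496" },
  { type: "ai", content: "287.496" },
];
console.log(toolTrace(history).join(" -> ")); // multiply -> multiply -> multiply
console.log(finalAnswer(history)); // 287.496
```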

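The search returns the examples whose inputs are most similar to the query input. We can use these examples for few-shot prompting a model. The first step is to create a set of math tools the model is allowed to call: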

```typescript
import { tool } from "@langchain/core/tools";
import { z } from "zod";

const add = tool(
  (input) => {
    return (input.a + input.b).toString();
  },
  {
    name: "add",
    description: "Add two numbers",
    schema: z.object({
      a: z.number().describe("The first number to add"),
      b: z.number().describe("The second number to add"),
    }),
  }
);

const cos = tool(
  (input) => {
    return Math.cos(input.angle).toString();
  },
  {
    name: "cos",
    description: "Calculate the cosine of an angle (in radians)",
    schema: z.object({
      angle: z.number().describe("The angle in radians"),
    }),
  }
);

const divide = tool(
  (input) => {
    return (input.a / input.b).toString();
  },
  {
    name: "divide",
    description: "Divide two numbers",
    schema: z.object({
      a: z.number().describe("The dividend"),
      b: z.number().describe("The divisor"),
    }),
  }
);

const log = tool(
  (input) => {
    return Math.log(input.value).toString();
  },
  {
    name: "log",
    description: "Calculate the natural logarithm of a number",
    schema: z.object({
      value: z.number().describe("The number to calculate the logarithm of"),
    }),
  }
);

const multiply = tool(
  (input) => {
    return (input.a * input.b).toString();
  },
  {
    name: "multiply",
    description: "Multiply two numbers",
    schema: z.object({
      a: z.number().describe("The first number to multiply"),
      b: z.number().describe("The second number to multiply"),
    }),
  }
);

const negate = tool(
  (input) => {
    return (-input.a).toString();
  },
  {
    name: "negate",
    description: "Negate a number",
    schema: z.object({
      a: z.number().describe("The number to negate"),
    }),
  }
);

const pi = tool(
  () => {
    return Math.PI.toString();
  },
  {
    name: "pi",
    description: "Return the value of pi",
    schema: z.object({}),
  }
);

const power = tool(
  (input) => {
    return Math.pow(input.base, input.exponent).toString();
  },
  {
    name: "power",
    description: "Raise a number to a power",
    schema: z.object({
      base: z.number().describe("The base number"),
      exponent: z.number().describe("The exponent"),
    }),
  }
);

const sin = tool(
  (input) => {
    return Math.sin(input.angle).toString();
  },
  {
    name: "sin",
    description: "Calculate the sine of an angle (in radians)",
    schema: z.object({
      angle: z.number().describe("The angle in radians"),
    }),
  }
);

const subtract = tool(
  (input) => {
    return (input.a - input.b).toString();
  },
  {
    name: "subtract",
    description: "Subtract two numbers",
    schema: z.object({
      a: z.number().describe("The number to subtract from"),
      b: z.number().describe("The number to subtract"),
    }),
  }
);
```
Next, we build a chain that retrieves similar examples, turns their chat histories into few-shot messages, and passes them to a tool-calling chat model:

```typescript
import { ChatOpenAI } from "@langchain/openai";
import {
  HumanMessage,
  SystemMessage,
  BaseMessage,
  BaseMessageLike,
  coerceMessageLikeToMessage,
} from "@langchain/core/messages";
import { RunnableLambda } from "@langchain/core/runnables";
import { Client as LangSmithClient, Example } from "langsmith";

const client = new LangSmithClient();

// Fetch the 5 examples whose inputs are most similar to the current input
// and pass them along together with the original input
async function similarExamples(
  input: Record<string, any>
): Promise<Record<string, any>> {
  const examples = await client.similarExamples(input, dataset.id, 5);
  return { ...input, examples };
}

// Turn the retrieved chat histories into few-shot messages placed between
// our own system instructions and the user's question
function constructPrompt(input: {
  examples: Example[];
  input: string;
}): BaseMessage[] {
  const instructions = "You are great at using mathematical tools.";
  let messages: BaseMessage[] = [];

  for (const ex of input.examples) {
    // Each example's outputs.output is an array of message-like objects
    messages = messages.concat(
      (ex.outputs?.output ?? []).map((msg: BaseMessageLike) =>
        coerceMessageLikeToMessage(msg)
      )
    );
  }

  // Drop the examples' own system prompts and mark the remaining turns
  // as example turns via the `name` field
  const examples = messages.filter((msg) => msg._getType() !== "system");
  examples.forEach((ex) => {
    if (ex._getType() === "human") {
      ex.name = "example_user";
    } else {
      ex.name = "example_assistant";
    }
  });

  return [
    new SystemMessage(instructions),
    ...examples,
    new HumanMessage(input.input),
  ];
}

const llm = new ChatOpenAI({
  model: "gpt-4o",
  temperature: 0,
});
const tools = [
  add,
  cos,
  divide,
  log,
  multiply,
  negate,
  pi,
  power,
  sin,
  subtract,
];
const llmWithTools = llm.bindTools(tools);

const exampleSelector = new RunnableLambda({
  func: similarExamples,
}).withConfig({ runName: "similarExamples" });

// similarExamples -> constructPrompt -> tool-calling chat model
const chain = exampleSelector
  .pipe(
    new RunnableLambda({
      func: constructPrompt,
    }).withConfig({
      runName: "constructPrompt",
    })
  )
  .pipe(llmWithTools);

const aiMsg = await chain.invoke({
  input: "whats the negation of the negation of 3",
  system: false,
});
console.log(aiMsg.tool_calls);
// [
//   {
//     name: 'negate',
//     args: { a: 3 },
//     type: 'tool_call',
//     id: 'call_SX0dmb4AbFu39KkGQDqPXQwa'
//   }
// ]
```
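The chain stops once the model returns `tool_calls`; nothing above executes them. A complete agent would run each requested tool, send the results back to the model as tool messages, and call the model again. The framework-free sketch below shows only the dispatch step. `ToolCall`, `handlers` and `runToolCalls` are hypothetical names, and the handlers reuse the plain arithmetic of a few of the tools defined earlier rather than the `tool(...)` objects themselves.

```typescript
// Hypothetical shape of a tool call, matching the fields printed above.
type ToolCall = { name: string; args: Record<string, number>; id?: string };

// Plain-function versions of some of the tools defined earlier
// (same arithmetic, string results like the tool() wrappers).
const handlers: Record<string, (args: Record<string, number>) => string> = {
  add: ({ a, b }) => (a + b).toString(),
  multiply: ({ a, b }) => (a * b).toString(),
  negate: ({ a }) => (-a).toString(),
  subtract: ({ a, b }) => (a - b).toString(),
};

// Run each requested tool and pair its result with the originating call id,
// which is what a follow-up tool message would need to reference.
function runToolCalls(
  calls: ToolCall[]
): { id?: string; name: string; result: string }[] {
  return calls.map((call) => {
    const handler = handlers[call.name];
    if (!handler) throw new Error(`Unknown tool: ${call.name}`);
    return { id: call.id, name: call.name, result: handler(call.args) };
  });
}

const results = runToolCalls([
  { name: "negate", args: { a: 3 }, id: "call_SX0dmb4AbFu39KkGQDqPXQwa" },
]);
console.log(results[0].result); // -3
```

Throwing on an unknown tool name surfaces a mismatch between the bound tools and the handler table instead of silently skipping a call.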

Looking at the LangSmith trace, we can see that the relevant examples were pulled in during the `similarExamples` step and passed as messages to ChatOpenAI: https://smith.langchain.com/public/20e09618-0746-4973-9382-5b36c3f27083/r.
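The message layout that `constructPrompt` produces can be reproduced without LangChain, which makes it easy to inspect what the model receives. This sketch uses a hypothetical plain `Msg` type instead of `BaseMessage`, and unlike `constructPrompt` it copies messages instead of mutating them. As in the chain, the examples' system prompts are dropped, human turns are named `example_user`, every other turn is named `example_assistant`, and the result is wrapped with the new instructions and the real question.

```typescript
// Hypothetical plain-object message type (not LangChain's BaseMessage).
type Msg = {
  type: "system" | "human" | "ai" | "tool";
  content: string;
  name?: string;
};

// Mirrors the layout built by constructPrompt, without mutating the inputs.
function buildFewShotPrompt(
  histories: Msg[][],
  instructions: string,
  query: string
): Msg[] {
  const exampleTurns = histories
    .reduce<Msg[]>((acc, h) => acc.concat(h), []) // flatten all example histories
    .filter((m) => m.type !== "system") // drop the examples' own system prompts
    .map((m) => ({
      ...m,
      name: m.type === "human" ? "example_user" : "example_assistant",
    }));
  return [
    { type: "system", content: instructions },
    ...exampleTurns,
    { type: "human", content: query },
  ];
}

// Usage with one hypothetical, abridged example history
const prompt = buildFewShotPrompt(
  [
    [
      { type: "system", content: "old instructions" },
      { type: "human", content: "evaluate the negation of -100" },
      { type: "ai", content: "(example answer)" },
    ],
  ],
  "You are great at using mathematical tools.",
  "whats the negation of the negation of 3"
);
console.log(prompt.map((m) => m.name ?? m.type));
// [ 'system', 'example_user', 'example_assistant', 'human' ]
```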

