Skip to main content

如何按相似性选择示例

前提条件

本指南假定您熟悉以下概念:

该对象根据与输入的相似性来选择示例。 它通过查找与输入具有最高余弦相似度的嵌入示例来实现这一点。

示例对象中的每个字段将用作参数以格式化传递给 FewShotPromptTemplateexamplePrompt。 因此,每个示例都应包含您正在使用的示例提示所需的所有字段。

:::提示 请参阅安装集成包的一般说明部分。 :::

npm install @langchain/openai @langchain/community @langchain/core
import { OpenAIEmbeddings } from "@langchain/openai";
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";

// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(
"Input: {input}\nOutput: {output}"
);

// Create a SemanticSimilarityExampleSelector that will be used to select the examples.
const exampleSelector = await SemanticSimilarityExampleSelector.fromExamples(
[
{ input: "happy", output: "sad" },
{ input: "tall", output: "short" },
{ input: "energetic", output: "lethargic" },
{ input: "sunny", output: "gloomy" },
{ input: "windy", output: "calm" },
],
new OpenAIEmbeddings(),
HNSWLib,
{ k: 1 }
);

// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
// We provide an ExampleSelector instead of examples.
exampleSelector,
examplePrompt,
prefix: "Give the antonym of every input",
suffix: "Input: {adjective}\nOutput:",
inputVariables: ["adjective"],
});

// Input is about the weather, so should select eg. the sunny/gloomy example
console.log(await dynamicPrompt.format({ adjective: "rainy" }));
/*
Give the antonym of every input

Input: sunny
Output: gloomy

Input: rainy
Output:
*/

// Input is a measurement, so should select the tall/short example
console.log(await dynamicPrompt.format({ adjective: "large" }));
/*
Give the antonym of every input

Input: tall
Output: short

Input: large
Output:
*/

API Reference:

默认情况下,示例对象中的每个字段会被连接在一起,进行嵌入,并存储在向量存储中,以便后续对用户查询进行相似性搜索。

如果您只想嵌入特定的键 (例如,您只想搜索与用户提供查询相似的示例),可以在最终的 options 参数中传递一个 inputKeys 数组。

从现有向量存储加载

您也可以通过将实例直接传递给 SemanticSimilarityExampleSelector 构造函数来使用预初始化的向量存储,如下所示。您还可以通过 addExample 方法添加更多示例:

// Ephemeral, in-memory vector store for demo purposes
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";

const embeddings = new OpenAIEmbeddings();

const memoryVectorStore = new MemoryVectorStore(embeddings);

const examples = [
{
query: "healthy food",
output: `galbi`,
},
{
query: "healthy food",
output: `schnitzel`,
},
{
query: "foo",
output: `bar`,
},
];

const exampleSelector = new SemanticSimilarityExampleSelector({
vectorStore: memoryVectorStore,
k: 2,
// Only embed the "query" key of each example
inputKeys: ["query"],
});

for (const example of examples) {
// Format and add an example to the underlying vector store
await exampleSelector.addExample(example);
}

// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(`<example>
<user_input>
{query}
</user_input>
<output>
{output}
</output>
</example>`);

// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
// We provide an ExampleSelector instead of examples.
exampleSelector,
examplePrompt,
prefix: `Answer the user's question, using the below examples as reference:`,
suffix: "User question: {query}",
inputVariables: ["query"],
});

const formattedValue = await dynamicPrompt.format({
query: "What is a healthy food?",
});
console.log(formattedValue);

/*
Answer the user's question, using the below examples as reference:

<example>
<user_input>
healthy
</user_input>
<output>
galbi
</output>
</example>

<example>
<user_input>
healthy
</user_input>
<output>
schnitzel
</output>
</example>

User question: What is a healthy food?
*/

const model = new ChatOpenAI({
model: "gpt-4o-mini",
});

const chain = dynamicPrompt.pipe(model);

const result = await chain.invoke({ query: "What is a healthy food?" });
console.log(result);
/*
AIMessage {
content: 'A healthy food can be galbi or schnitzel.',
additional_kwargs: { function_call: undefined }
}
*/

API Reference:

元数据过滤

添加示例时,每个字段在生成的文档中都可作为元数据使用。如果您希望对搜索空间有更多控制,可以在示例中添加额外字段,并在初始化选择器时传递一个 filter 参数:

// Ephemeral, in-memory vector store for demo purposes
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { Document } from "@langchain/core/documents";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";

const embeddings = new OpenAIEmbeddings();

const memoryVectorStore = new MemoryVectorStore(embeddings);

const examples = [
{
query: "healthy food",
output: `lettuce`,
food_type: "vegetable",
},
{
query: "healthy food",
output: `schnitzel`,
food_type: "veal",
},
{
query: "foo",
output: `bar`,
food_type: "baz",
},
];

const exampleSelector = new SemanticSimilarityExampleSelector({
vectorStore: memoryVectorStore,
k: 2,
// Only embed the "query" key of each example
inputKeys: ["query"],
// Filter type will depend on your specific vector store.
// See the section of the docs for the specific vector store you are using.
filter: (doc: Document) => doc.metadata.food_type === "vegetable",
});

for (const example of examples) {
// Format and add an example to the underlying vector store
await exampleSelector.addExample(example);
}

// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(`<example>
<user_input>
{query}
</user_input>
<output>
{output}
</output>
</example>`);

// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
// We provide an ExampleSelector instead of examples.
exampleSelector,
examplePrompt,
prefix: `Answer the user's question, using the below examples as reference:`,
suffix: "User question:\n{query}",
inputVariables: ["query"],
});

const model = new ChatOpenAI({
model: "gpt-4o-mini",
});

const chain = dynamicPrompt.pipe(model);

const result = await chain.invoke({
query: "What is exactly one type of healthy food?",
});
console.log(result);
/*
AIMessage {
content: 'One type of healthy food is lettuce.',
additional_kwargs: { function_call: undefined }
}
*/

API Reference:

自定义向量存储检索器

您还可以传递一个向量存储检索器而不是向量存储。一种有用的方式是如果您希望使用除相似性搜索之外的检索方法,例如最大边缘相关性:

/* eslint-disable @typescript-eslint/no-non-null-assertion */

// Requires a vectorstore that supports maximal marginal relevance search
import { Pinecone } from "@pinecone-database/pinecone";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";

const pinecone = new Pinecone();

const pineconeIndex = pinecone.Index(process.env.PINECONE_INDEX!);

/**
* Pinecone allows you to partition the records in an index into namespaces.
* Queries and other operations are then limited to one namespace,
* so different requests can search different subsets of your index.
* Read more about namespaces here: https://docs.pinecone.io/guides/indexes/use-namespaces
*
* NOTE: If you have namespace enabled in your Pinecone index, you must provide the namespace when creating the PineconeStore.
*/
const namespace = "pinecone";

const pineconeVectorstore = await PineconeStore.fromExistingIndex(
new OpenAIEmbeddings(),
{ pineconeIndex, namespace }
);

const pineconeMmrRetriever = pineconeVectorstore.asRetriever({
searchType: "mmr",
k: 2,
});

const examples = [
{
query: "healthy food",
output: `lettuce`,
food_type: "vegetable",
},
{
query: "healthy food",
output: `schnitzel`,
food_type: "veal",
},
{
query: "foo",
output: `bar`,
food_type: "baz",
},
];

const exampleSelector = new SemanticSimilarityExampleSelector({
vectorStoreRetriever: pineconeMmrRetriever,
// Only embed the "query" key of each example
inputKeys: ["query"],
});

for (const example of examples) {
// Format and add an example to the underlying vector store
await exampleSelector.addExample(example);
}

// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(`<example>
<user_input>
{query}
</user_input>
<output>
{output}
</output>
</example>`);

// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
// We provide an ExampleSelector instead of examples.
exampleSelector,
examplePrompt,
prefix: `Answer the user's question, using the below examples as reference:`,
suffix: "User question:\n{query}",
inputVariables: ["query"],
});

const model = new ChatOpenAI({
model: "gpt-4o-mini",
});

const chain = dynamicPrompt.pipe(model);

const result = await chain.invoke({
query: "What is exactly one type of healthy food?",
});

console.log(result);

/*
AIMessage {
content: 'lettuce.',
additional_kwargs: { function_call: undefined }
}
*/

API Reference:

下一步

您现在已经了解了一些关于在示例选择器中使用相似性的知识。

接下来,请查看这篇关于如何使用基于长度的示例选择器的指南。


Was this page helpful?


You can also leave detailed feedback on GitHub.