Skip to main content

如何生成多个查询以检索数据

预备知识

本指南假定您熟悉以下概念:

基于距离的向量数据库检索会将查询嵌入(表示)到高维空间中,并基于“距离”查找相似的嵌入文档。 但是,查询措辞的细微变化或嵌入未能很好地捕捉数据语义时,可能会导致检索结果不同。 有时会通过手动调整提示工程/调优来解决这些问题,但这可能很繁琐。

MultiQueryRetriever 通过使用 LLM 为给定的用户输入查询从不同角度生成多个查询,从而自动完成提示调优的过程。 对于每个查询,它会检索一组相关文档,并对所有查询的结果取唯一并集,以获得更大的潜在相关文档集合。 通过生成同一问题的多个视角,MultiQueryRetriever 可以帮助克服基于距离的检索的一些限制,并获得更丰富的结果集。

开始使用

:::提示 请参阅安装集成包的一般说明部分。 :::

yarn add @langchain/anthropic @langchain/cohere
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { CohereEmbeddings } from "@langchain/cohere";
import { MultiQueryRetriever } from "langchain/retrievers/multi_query";
import { ChatAnthropic } from "@langchain/anthropic";

const embeddings = new CohereEmbeddings();

const vectorstore = await MemoryVectorStore.fromTexts(
[
"Buildings are made out of brick",
"Buildings are made out of wood",
"Buildings are made out of stone",
"Cars are made out of metal",
"Cars are made out of plastic",
"mitochondria is the powerhouse of the cell",
"mitochondria is made of lipids",
],
[{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }, { id: 5 }],
embeddings
);

const model = new ChatAnthropic({
model: "claude-3-sonnet-20240229",
});

const retriever = MultiQueryRetriever.fromLLM({
llm: model,
retriever: vectorstore.asRetriever(),
});

const query = "What are mitochondria made of?";
const retrievedDocs = await retriever.invoke(query);

/*
Generated queries: What are the components of mitochondria?,What substances comprise the mitochondria organelle? ,What is the molecular composition of mitochondria?
*/

console.log(retrievedDocs);
[
Document {
pageContent: "mitochondria is made of lipids",
metadata: {}
},
Document {
pageContent: "mitochondria is the powerhouse of the cell",
metadata: {}
},
Document {
pageContent: "Buildings are made out of brick",
metadata: { id: 1 }
},
Document {
pageContent: "Buildings are made out of wood",
metadata: { id: 2 }
}
]

自定义

您还可以提供自定义提示,以调整生成的问题类型。 您还可以传递一个自定义的输出解析器,将 LLM 调用的结果解析并拆分为查询列表。

import { LLMChain } from "langchain/chains";
import { pull } from "langchain/hub";
import { BaseOutputParser } from "@langchain/core/output_parsers";
import { PromptTemplate } from "@langchain/core/prompts";

type LineList = {
lines: string[];
};

class LineListOutputParser extends BaseOutputParser<LineList> {
static lc_name() {
return "LineListOutputParser";
}

lc_namespace = ["langchain", "retrievers", "multiquery"];

async parse(text: string): Promise<LineList> {
const startKeyIndex = text.indexOf("<questions>");
const endKeyIndex = text.indexOf("</questions>");
const questionsStartIndex =
startKeyIndex === -1 ? 0 : startKeyIndex + "<questions>".length;
const questionsEndIndex = endKeyIndex === -1 ? text.length : endKeyIndex;
const lines = text
.slice(questionsStartIndex, questionsEndIndex)
.trim()
.split("\n")
.filter((line) => line.trim() !== "");
return { lines };
}

getFormatInstructions(): string {
throw new Error("Not implemented.");
}
}

// Default prompt is available at: https://smith.langchain.com/hub/jacob/multi-vector-retriever-german
const prompt: PromptTemplate = await pull(
"jacob/multi-vector-retriever-german"
);

const vectorstore = await MemoryVectorStore.fromTexts(
[
"Gebäude werden aus Ziegelsteinen hergestellt",
"Gebäude werden aus Holz hergestellt",
"Gebäude werden aus Stein hergestellt",
"Autos werden aus Metall hergestellt",
"Autos werden aus Kunststoff hergestellt",
"Mitochondrien sind die Energiekraftwerke der Zelle",
"Mitochondrien bestehen aus Lipiden",
],
[{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }, { id: 5 }],
embeddings
);
const model = new ChatAnthropic({});
const llmChain = new LLMChain({
llm: model,
prompt,
outputParser: new LineListOutputParser(),
});
const retriever = new MultiQueryRetriever({
retriever: vectorstore.asRetriever(),
llmChain,
});

const query = "What are mitochondria made of?";
const retrievedDocs = await retriever.invoke(query);

/*
Generated queries: Was besteht ein Mitochondrium?,Aus welchen Komponenten setzt sich ein Mitochondrium zusammen? ,Welche Moleküle finden sich in einem Mitochondrium?
*/

console.log(retrievedDocs);
[
Document {
pageContent: "Mitochondrien bestehen aus Lipiden",
metadata: {}
},
Document {
pageContent: "Mitochondrien sind die Energiekraftwerke der Zelle",
metadata: {}
},
Document {
pageContent: "Gebäude werden aus Stein hergestellt",
metadata: { id: 3 }
},
Document {
pageContent: "Autos werden aus Metall hergestellt",
metadata: { id: 4 }
},
Document {
pageContent: "Gebäude werden aus Holz hergestellt",
metadata: { id: 2 }
},
Document {
pageContent: "Gebäude werden aus Ziegelsteinen hergestellt",
metadata: { id: 1 }
}
]

下一步

现在你已经学习了如何使用 MultiQueryRetriever 通过自动生成的查询来查询向量存储。

查看各个单独部分以深入了解特定的检索器,或者查看关于 RAG 的完整教程,以及本部分以学习如何 在任意数据源上创建自己的自定义检索器


Was this page helpful?


You can also leave detailed feedback on GitHub.