How to do per-user retrieval

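Prerequisites

This guide assumes familiarity with the following:

Retrieval-augmented generation

When building a retrieval app, you often have to build it with multiple users in mind. This means that you may be storing data not just for one user but for many different users, and they should not be able to see each other's data. You therefore need to be able to configure your retrieval chain so that it retrieves only the information belonging to the current user. This generally involves three steps.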

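Step 1: Make sure the retriever you are using supports multiple users

Currently, LangChain has no unified flag or filter for this. Instead, each vector store and retriever may have its own mechanism, which may go by a different name (namespaces, multi-tenancy, and so on). For vector stores, this is usually exposed as a keyword argument passed during similaritySearch. Read the documentation or source code to find out whether the retriever you are using supports multiple users and, if so, how to use that support.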
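
To make the idea concrete, here is a minimal, dependency-free sketch of what a per-user search option looks like. `ToyStore`, its `namespace` filter, and the word-overlap scoring are all hypothetical stand-ins for illustration, not LangChain or Pinecone APIs; a real vector store would compare embeddings instead.

```typescript
// Hypothetical in-memory store, for illustration only (not a LangChain class).
type Doc = { pageContent: string; namespace: string };

class ToyStore {
  private docs: Doc[] = [];

  add(pageContent: string, namespace: string): void {
    this.docs.push({ pageContent, namespace });
  }

  // Mirrors the shape of a `similaritySearch(query, k, filter)` call:
  // the namespace filter is applied before any ranking happens.
  similaritySearch(query: string, k: number, filter: { namespace: string }): Doc[] {
    const queryWords = new Set(query.toLowerCase().split(/\s+/));
    // Toy relevance score: number of words shared with the query.
    const score = (d: Doc): number =>
      d.pageContent.toLowerCase().split(/\s+/).filter((w) => queryWords.has(w)).length;
    return this.docs
      .filter((d) => d.namespace === filter.namespace)
      .sort((a, b) => score(b) - score(a))
      .slice(0, k);
  }
}

const store = new ToyStore();
store.add("i worked at kensho", "harrison");
store.add("i worked at facebook", "ankush");

// Only Harrison's documents are candidates, however similar Ankush's are.
const harrisonDocs = store.similaritySearch("where did i work", 4, { namespace: "harrison" });
const ankushDocs = store.similaritySearch("where did i work", 4, { namespace: "ankush" });
```

The point of the sketch is the ordering: documents outside the requested namespace are excluded before scoring, so they can never appear in the results.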

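Step 2: Add that parameter as a configurable field for the chain

The LangChain config object is passed through to every Runnable. Here you can add any fields you like to the configurable object; they can be extracted later inside the chain.

Step 3: Call the chain with that configurable field

At runtime, you can now call this chain with the configurable field.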

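Code example

Let's look at a concrete example of what this looks like in code. We will use Pinecone for this example.

Setup

Install dependencies

:::tip See this section for general instructions on installing integration packages. :::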

npm i @langchain/pinecone @langchain/openai @langchain/core @pinecone-database/pinecone

yarn add @langchain/pinecone @langchain/openai @langchain/core @pinecone-database/pinecone

pnpm add @langchain/pinecone @langchain/openai @langchain/core @pinecone-database/pinecone

Set environment variables

We'll use OpenAI and Pinecone in this example:

OPENAI_API_KEY=your-api-key

PINECONE_API_KEY=your-api-key
PINECONE_INDEX=your-index-name

# Optional, use LangSmith for best-in-class observability
LANGSMITH_API_KEY=your-api-key
LANGSMITH_TRACING=true

# Reduce tracing latency if you are not in a serverless environment
# LANGCHAIN_CALLBACKS_BACKGROUND=true
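
A missing variable tends to surface later as an opaque client error, so it can help to check the environment up front. The helper below is a hypothetical sketch (`requireEnv` is not part of LangChain or the Pinecone client); it takes the environment as a plain object so that it can be exercised without Node.js-specific globals.

```typescript
// Hypothetical helper: fail fast when a required variable is missing or empty.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example with a literal object; in an application you would pass process.env.
const exampleEnv: Env = { PINECONE_INDEX: "your-index-name", OPENAI_API_KEY: "" };
const indexName = requireEnv(exampleEnv, "PINECONE_INDEX");
```

A side benefit is that the return type is `string` rather than `string | undefined`, which is convenient when the value is handed to a function that expects a plain string.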

import { OpenAIEmbeddings } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";
import { Pinecone } from "@pinecone-database/pinecone";
import { Document } from "@langchain/core/documents";

const embeddings = new OpenAIEmbeddings();

const pinecone = new Pinecone();

const pineconeIndex = pinecone.Index(process.env.PINECONE_INDEX);

/**
 * Pinecone allows you to partition the records in an index into namespaces.
 * Queries and other operations are then limited to one namespace,
 * so different requests can search different subsets of your index.
 * Read more about namespaces here: https://docs.pinecone.io/guides/indexes/use-namespaces
 *
 * NOTE: If you have namespace enabled in your Pinecone index, you must provide the namespace when creating the PineconeStore.
 */
const namespace = "pinecone";

// Reuse the embeddings instance created above instead of constructing a second one.
const vectorStore = await PineconeStore.fromExistingIndex(
  embeddings,
  { pineconeIndex, namespace }
);

await vectorStore.addDocuments(
  [new Document({ pageContent: "i worked at kensho" })],
  { namespace: "harrison" }
);

await vectorStore.addDocuments(
  [new Document({ pageContent: "i worked at facebook" })],
  { namespace: "ankush" }
);

[ "77b8f174-9d89-4c6c-b2ab-607fe3913b2d" ]

The Pinecone namespace keyword argument can be used to separate documents:

// This will only get documents for Ankush
const ankushRetriever = vectorStore.asRetriever({
  filter: {
    namespace: "ankush",
  },
});

await ankushRetriever.invoke("where did i work?");

[ Document { pageContent: "i worked at facebook", metadata: {} } ]

// This will only get documents for Harrison
const harrisonRetriever = vectorStore.asRetriever({
  filter: {
    namespace: "harrison",
  },
});

await harrisonRetriever.invoke("where did i work?");

[ Document { pageContent: "i worked at kensho", metadata: {} } ]

We can now create the chain that we will use to perform question answering.

import { StringOutputParser } from "@langchain/core/output_parsers";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { ChatOpenAI } from "@langchain/openai";

const template = `Answer the question based only on the following context:
{context}
Question: {question}`;

const prompt = ChatPromptTemplate.fromTemplate(template);

const model = new ChatOpenAI({
  model: "gpt-3.5-turbo-0125",
  temperature: 0,
});

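We can now create the chain using a configurable retriever. It is configurable because we can define an arbitrary object and pass it to the chain at call time. Inside the chain, we extract the configurable object and pass it on to the vector store.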

import {
  RunnablePassthrough,
  RunnableSequence,
} from "@langchain/core/runnables";

const chain = RunnableSequence.from([
  RunnablePassthrough.assign({
    context: async (input: { question: string }, config) => {
      if (!config || !("configurable" in config)) {
        throw new Error("No config");
      }
      const { configurable } = config;
      const documents = await vectorStore
        .asRetriever(configurable)
        .invoke(input.question, config);
      return documents.map((doc) => doc.pageContent).join("\n\n");
    },
  }),
  prompt,
  model,
  new StringOutputParser(),
]);
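
One thing to watch: the chain above only checks that some config was supplied. If a caller passes a `configurable` object that lacks a namespace, nothing in the chain stops the search from running without per-user scoping. A minimal sketch of a stricter check, assuming the `filter.namespace` shape used in this guide (`requireNamespace` is a hypothetical helper, not a LangChain API):

```typescript
// Hypothetical guard: extract the per-user namespace from the `configurable`
// object, or throw if it is absent or not a non-empty string.
type PerUserConfigurable = { filter?: { namespace?: unknown } };

function requireNamespace(configurable: PerUserConfigurable | undefined): string {
  const namespace = configurable?.filter?.namespace;
  if (typeof namespace !== "string" || namespace.length === 0) {
    throw new Error("configurable.filter.namespace must be a non-empty string");
  }
  return namespace;
}

const checkedNamespace = requireNamespace({ filter: { namespace: "harrison" } });
```

Inside the `context` function, such a guard could be called on `configurable` before the retriever is built, so that a request without a user fails loudly instead of searching unscoped.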

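We can now invoke the chain with configurable options. Whatever we put under configurable is forwarded to asRetriever, so filter.namespace selects which Pinecone namespace, and therefore which user's documents, the chain searches.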

await chain.invoke(
  { question: "where did the user work?" },
  { configurable: { filter: { namespace: "harrison" } } }
);

"The user worked at Kensho."

await chain.invoke(
  { question: "where did the user work?" },
  { configurable: { filter: { namespace: "ankush" } } }
);

"The user worked at Facebook."

For other vector store implementations that support multiple users, refer to their specific pages, such as Milvus.
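
The invocations above hard-code the namespace. In an application it would normally be derived from the authenticated session rather than typed by hand. A minimal sketch, assuming a hypothetical `configForUser` helper and assuming that user IDs map one-to-one onto Pinecone namespaces, as they do in the example data:

```typescript
// Hypothetical helper: build the per-request config for one user.
type UserConfig = { configurable: { filter: { namespace: string } } };

function configForUser(userId: string): UserConfig {
  // Assumption: the namespace is simply the trimmed user ID.
  const namespace = userId.trim();
  if (namespace === "") {
    throw new Error("userId must not be empty");
  }
  return { configurable: { filter: { namespace } } };
}

const harrisonConfig = configForUser("harrison");
```

The returned object has the same shape as the second argument passed to `chain.invoke` in the examples above, so it can be supplied in its place.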

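Next steps

You have now seen one way to support retrieval over data from multiple users.

Next, check out the other how-to guides on RAG, such as returning sources.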
