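How to use a vector store to retrieve data

Prerequisites

This guide assumes familiarity with the following concepts:

A vector store can be converted into a retriever with the asRetriever() method, which makes it easier to compose in chains.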
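To make the idea of wrapping a vector store in a retriever concrete, here is a minimal, dependency-free sketch. Everything in it (ToyVectorStore, ToyRetriever, the letter-count embed function) is a hypothetical stand-in invented for illustration; it is not LangChain's implementation, and unlike the real retriever it is synchronous.

```typescript
// Toy illustration only: every name here is hypothetical, not LangChain's API.
interface Doc {
  pageContent: string;
}

// Hypothetical embedding: counts occurrences of each letter a-z.
function embed(text: string): number[] {
  const vector: number[] = [];
  for (let i = 0; i < 26; i++) vector.push(0);
  const lower = text.toLowerCase();
  for (let i = 0; i < lower.length; i++) {
    const index = lower.charCodeAt(i) - 97;
    if (index >= 0 && index < 26) vector[index] += 1;
  }
  return vector;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// The retriever only remembers the store and k, and exposes a single
// query-in, documents-out method.
class ToyRetriever {
  private store: ToyVectorStore;
  private k: number;

  constructor(store: ToyVectorStore, k: number) {
    this.store = store;
    this.k = k;
  }

  invoke(query: string): Doc[] {
    return this.store.similaritySearch(query, this.k);
  }
}

class ToyVectorStore {
  private entries: { doc: Doc; vector: number[] }[] = [];

  addDocuments(docs: Doc[]): void {
    for (const doc of docs) {
      this.entries.push({ doc, vector: embed(doc.pageContent) });
    }
  }

  // Direct query: the caller must supply k on every call.
  similaritySearch(query: string, k: number): Doc[] {
    const queryVector = embed(query);
    return this.entries
      .map((entry) => ({
        doc: entry.doc,
        score: cosineSimilarity(queryVector, entry.vector),
      }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((scored) => scored.doc);
  }

  asRetriever(k: number): ToyRetriever {
    return new ToyRetriever(this, k);
  }
}

const store = new ToyVectorStore();
store.addDocuments([
  { pageContent: "cats purr" },
  { pageContent: "zzz zzz" },
  { pageContent: "dogs bark" },
]);

const retriever = store.asRetriever(1);
const results = retriever.invoke("cat");
console.log(results.map((doc) => doc.pageContent));
```

The point of the wrapper is the uniform shape: once the search parameters are fixed, the retriever takes a query string and returns documents, which is what lets it slot into a chain as a single step.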

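Below is a retrieval-augmented generation (RAG) chain that answers questions over documents in the following steps:

1. Initialize a vector store
2. Create a retriever from that vector store
3. Compose a question-answering chain
4. Ask questions!

Each step has multiple substeps and potential configuration options, but we start with one common flow. First, install the required dependencies: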

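npm

npm install langchain @langchain/openai @langchain/core @langchain/textsplitters

Yarn

yarn add langchain @langchain/openai @langchain/core @langchain/textsplitters

pnpm

pnpm add langchain @langchain/openai @langchain/core @langchain/textsplitters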

You can download the state_of_the_union.txt file here.

import * as fs from "node:fs";

import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import {
  RunnablePassthrough,
  RunnableSequence,
} from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import type { Document } from "@langchain/core/documents";

const formatDocumentsAsString = (documents: Document[]) => {
  return documents.map((document) => document.pageContent).join("\n\n");
};

// Initialize the LLM to use to answer the question.
const model = new ChatOpenAI({
  model: "gpt-4o-mini",
});
const text = fs.readFileSync("state_of_the_union.txt", "utf8");
const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000 });
const docs = await textSplitter.createDocuments([text]);
// Create a vector store from the documents.
const vectorStore = await MemoryVectorStore.fromDocuments(
  docs,
  new OpenAIEmbeddings()
);

// Initialize a retriever wrapper around the vector store
const vectorStoreRetriever = vectorStore.asRetriever();

// Create a system & human prompt for the chat model
const SYSTEM_TEMPLATE = `Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
{context}`;

const prompt = ChatPromptTemplate.fromMessages([
  ["system", SYSTEM_TEMPLATE],
  ["human", "{question}"],
]);

const chain = RunnableSequence.from([
  {
    context: vectorStoreRetriever.pipe(formatDocumentsAsString),
    question: new RunnablePassthrough(),
  },
  prompt,
  model,
  new StringOutputParser(),
]);

const answer = await chain.invoke(
  "What did the president say about Justice Breyer?"
);

console.log({ answer });

/*
  {
    answer: 'The president honored Justice Stephen Breyer by recognizing his dedication to serving the country as an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. He thanked Justice Breyer for his service.'
  }
*/

API Reference:

OpenAIEmbeddings from @langchain/openai
ChatOpenAI from @langchain/openai
RecursiveCharacterTextSplitter from @langchain/textsplitters
MemoryVectorStore from langchain/vectorstores/memory
RunnablePassthrough from @langchain/core/runnables
RunnableSequence from @langchain/core/runnables
StringOutputParser from @langchain/core/output_parsers
ChatPromptTemplate from @langchain/core/prompts
Document from @langchain/core/documents

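Let's walk through the code above step by step.

1. First, we load a long text and use a text splitter to break it into smaller documents. We then load those documents into MemoryVectorStore (our vector store), which embeds them with the OpenAIEmbeddings instance we pass in and builds the index.
2. Although we could query the vector store directly, we convert it into a retriever so that the retrieved documents come back in a form suited to the question-answering chain that follows.
3. We assemble a retrieval chain, which is invoked in step 4.
4. We ask a question!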
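The data flow in steps 2 and 3 can be sketched without any libraries. The sketch below is an assumption-laden illustration, not the code LangChain runs: stubRetriever, buildPromptInput, and fillTemplate are hypothetical helpers that mimic what the map step ({ context, question }) and the prompt template do before the model is called. Only the joining logic is taken from formatDocumentsAsString in the example.

```typescript
// Hypothetical stand-ins that mimic the chain's first two stages.
interface Doc {
  pageContent: string;
}

// Same joining logic as formatDocumentsAsString in the guide's example.
const formatDocs = (documents: Doc[]): string =>
  documents.map((document) => document.pageContent).join("\n\n");

// Stub retriever: returns fixed documents regardless of the question.
const stubRetriever = (_question: string): Doc[] => [
  { pageContent: "First chunk." },
  { pageContent: "Second chunk." },
];

// Stage 1: fan the single input string out into { context, question }.
// The question passes through unchanged; the context is retrieved and joined.
function buildPromptInput(question: string): {
  context: string;
  question: string;
} {
  return { context: formatDocs(stubRetriever(question)), question };
}

// Stage 2: fill the {context} and {question} placeholders in one pass.
function fillTemplate(
  template: string,
  values: { context: string; question: string }
): string {
  return template.replace(
    /\{(context|question)\}/g,
    (_match: string, key: "context" | "question") => values[key]
  );
}

const filled = fillTemplate(
  "Context:\n{context}\n\nQuestion: {question}",
  buildPromptInput("What was said?")
);
console.log(filled);
```

In the real chain, the filled prompt would next go to the chat model and then to the output parser; those stages are left out here because they need network access.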

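Next steps

You have now learned how to convert a vector store into a retriever.

To go deeper, see:

Individual guides on specific retrievers
The more complete RAG tutorial
The guide in this section on creating a custom retriever over any data source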
