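Azure AI Search
Azure AI Search (formerly known as Azure Search and Azure Cognitive Search) is a distributed, RESTful search engine optimized for speed and relevance on production-scale workloads on Azure. It also supports vector search using the k-nearest neighbor (kNN) algorithm, as well as semantic search.
This vector store integration supports full-text search, vector search, and hybrid search for best ranking performance.
You can learn how to use the vector search capabilities of Azure AI Search from this page. If you don't have an Azure account yet, you can create a free account to get started.
Setup
You first need to install the @azure/search-documents SDK and the @langchain/community package:
:::tip See this section for general instructions on installing integration packages. :::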
- npm
- Yarn
- pnpm
npm install -S @langchain/community @langchain/core @azure/search-documents
yarn add @langchain/community @langchain/core @azure/search-documents
pnpm add @langchain/community @langchain/core @azure/search-documents
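You also need a running Azure AI Search instance. You can deploy a free instance in the Azure Portal by following this guide.
Once your instance is running, make sure you have its endpoint and admin key (query keys can only be used to search documents, not to index, update, or delete them). The endpoint is the URL of your instance, shown in the "Overview" section of the instance in the Azure Portal. The admin key is listed in the "Keys" section of the instance. Then set the following environment variables: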
# Azure AI Search connection settings
AZURE_AISEARCH_ENDPOINT=
AZURE_AISEARCH_KEY=
# If you're using Azure OpenAI API, you'll need to set these variables
AZURE_OPENAI_API_KEY=
AZURE_OPENAI_API_INSTANCE_NAME=
AZURE_OPENAI_API_DEPLOYMENT_NAME=
AZURE_OPENAI_API_EMBEDDINGS_DEPLOYMENT_NAME=
AZURE_OPENAI_API_VERSION=
# Or you can use the OpenAI API directly
OPENAI_API_KEY=
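Because a missing endpoint or key usually only surfaces later as a failed request, it can help to check both variables at startup. The following is a minimal sketch of such a check; the names `AzureSearchSettings` and `readAzureSearchSettings` are hypothetical and are not part of LangChain or the Azure SDK.

```typescript
// Hypothetical helper: `AzureSearchSettings` and `readAzureSearchSettings`
// are illustrative names, not part of any SDK.
interface AzureSearchSettings {
  endpoint: string;
  key: string;
}

function readAzureSearchSettings(
  env: Record<string, string | undefined>
): AzureSearchSettings {
  const endpoint = env.AZURE_AISEARCH_ENDPOINT?.trim();
  const key = env.AZURE_AISEARCH_KEY?.trim();

  // Collect every missing variable so the error names all of them at once.
  const missing: string[] = [];
  if (!endpoint) missing.push("AZURE_AISEARCH_ENDPOINT");
  if (!key) missing.push("AZURE_AISEARCH_KEY");
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  return { endpoint: endpoint as string, key: key as string };
}
```

In a Node.js program this could be called as `readAzureSearchSettings(process.env)` before the vector store is created. Taking the environment as a parameter, rather than reading `process.env` inside the function, keeps the check easy to test.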
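About hybrid search
Hybrid search combines the strengths of full-text search and vector search to provide the best ranking performance. It is enabled by default in the Azure AI Search vector store, but you can select a different search query type by setting the search.type property when creating the vector store.
You can read more about hybrid search and how it may improve your search results in the official documentation.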
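To give an intuition for how two ranked result lists can be merged into one, here is a small sketch of Reciprocal Rank Fusion (RRF), the fusion method that Azure's documentation describes for hybrid queries. It is an illustration only and not the service's implementation: the function name, the document ids, and the constant `k = 60` are assumptions made for this example.

```typescript
// Illustrative sketch of Reciprocal Rank Fusion (RRF).
// `reciprocalRankFusion`, the document ids, and k = 60 are assumptions
// for this example, not values taken from the Azure AI Search service.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60
): Array<{ id: string; score: number }> {
  const scores: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // Ranks are 1-based: the top result of a list contributes 1 / (k + 1).
      const contribution = 1 / (k + index + 1);
      scores[id] = (scores[id] ?? 0) + contribution;
    });
  }
  // Sort by fused score, highest first.
  return Object.keys(scores)
    .map((id) => ({ id, score: scores[id] }))
    .sort((a, b) => b.score - a.score);
}

// One ranking from full-text search, one from vector search.
const fullTextRanking = ["doc-a", "doc-b", "doc-c"];
const vectorRanking = ["doc-b", "doc-d", "doc-a"];

const fused = reciprocalRankFusion([fullTextRanking, vectorRanking]);
console.log(fused.map((entry) => entry.id));
// → ["doc-b", "doc-a", "doc-d", "doc-c"]
```

Documents that appear near the top of both lists (`doc-b`, `doc-a`) end up ahead of documents that appear in only one of them, which is the effect hybrid search relies on.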
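In some scenarios, such as retrieval-augmented generation (RAG), you may want to enable semantic ranking in addition to hybrid search to improve the relevance of the search results. You can enable semantic ranking by setting the search.type property to AzureAISearchQueryType.SemanticHybrid when creating the vector store.
Note that semantic ranking is only available in the Basic and higher pricing tiers, and is subject to regional availability.
You can read more about the performance of semantic ranking combined with hybrid search in this blog post.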
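Example: index documents, vector search, and LLM integration
The example below indexes documents from a file into Azure AI Search, runs a hybrid search query, and finally uses a chain to answer a natural-language question based on the retrieved documents.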
import {
  AzureAISearchVectorStore,
  AzureAISearchQueryType,
} from "@langchain/community/vectorstores/azure_aisearch";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { createRetrievalChain } from "langchain/chains/retrieval";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
// Load documents from file
const loader = new TextLoader("./state_of_the_union.txt");
const rawDocuments = await loader.load();
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 0,
});
const documents = await splitter.splitDocuments(rawDocuments);
// Create Azure AI Search vector store
const store = await AzureAISearchVectorStore.fromDocuments(
  documents,
  new OpenAIEmbeddings(),
  {
    search: {
      type: AzureAISearchQueryType.SimilarityHybrid,
    },
  }
);
// The first time you run this, the index will be created.
// You may need to wait a bit for the index to be created before you can perform
// a search, or you can create the index manually beforehand.
// Performs a similarity search
const resultDocuments = await store.similaritySearch(
"What did the president say about Ketanji Brown Jackson?"
);
console.log("Similarity search results:");
console.log(resultDocuments[0].pageContent);
/*
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
*/
// Use the store as part of a chain
const model = new ChatOpenAI({ model: "gpt-3.5-turbo-1106" });
const questionAnsweringPrompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    "Answer the user's questions based on the below context:\n\n{context}",
  ],
  ["human", "{input}"],
]);
const combineDocsChain = await createStuffDocumentsChain({
  llm: model,
  prompt: questionAnsweringPrompt,
});
const chain = await createRetrievalChain({
  retriever: store.asRetriever(),
  combineDocsChain,
});
const response = await chain.invoke({
  input: "What is the president's top priority regarding prices?",
});
console.log("Chain response:");
console.log(response.answer);
/*
The president's top priority is getting prices under control.
*/
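The comment in the example notes that a freshly created index may not be searchable right away. One possible way to handle this, sketched below, is to retry the search a few times with a short pause in between. The helper `retrySearch` and its default values are hypothetical and are not part of LangChain or the Azure SDK.

```typescript
// Hypothetical helper, not part of LangChain or the Azure SDK.
// Retries `search` until it returns at least one result, waiting
// `delayMs` milliseconds between attempts.
async function retrySearch<T>(
  search: () => Promise<T[]>,
  maxAttempts = 5,
  delayMs = 2000
): Promise<T[]> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const results = await search();
      if (results.length > 0) return results;
      lastError = undefined;
    } catch (error) {
      // Remember the failure; it is rethrown if no attempt succeeds.
      lastError = error;
    }
    if (attempt < maxAttempts) {
      // Pause to give the index time to become searchable.
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  if (lastError !== undefined) throw lastError;
  return [];
}
```

With the store from the example above, this could be used as `await retrySearch(() => store.similaritySearch("..."))`. Creating the index manually beforehand, as the example comment suggests, avoids the need for retries altogether.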
API Reference:
- AzureAISearchVectorStore from @langchain/community/vectorstores/azure_aisearch
- AzureAISearchQueryType from @langchain/community/vectorstores/azure_aisearch
- ChatPromptTemplate from @langchain/core/prompts
- ChatOpenAI from @langchain/openai
- OpenAIEmbeddings from @langchain/openai
- createStuffDocumentsChain from langchain/chains/combine_documents
- createRetrievalChain from langchain/chains/retrieval
- TextLoader from langchain/document_loaders/fs/text
- RecursiveCharacterTextSplitter from @langchain/textsplitters
Related
- Vector store conceptual guide
- Vector store how-to guides