Xata
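Xata is a serverless data platform based on PostgreSQL. It provides a type-safe TypeScript/JavaScript SDK for interacting with your database and a UI for managing your data.
Xata has a native vector type that can be added to any table and supports similarity search. LangChain can insert vectors directly into Xata and run nearest-neighbor queries against them, so you can use all of the LangChain Embeddings integrations with Xata.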
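To make "nearest-neighbor query" concrete, here is a minimal in-memory sketch of the idea. It assumes cosine similarity as the metric, which this page does not confirm for Xata, and the names `StoredVector`, `cosineSimilarity`, and `nearestNeighbors` are hypothetical helpers, not part of the Xata or LangChain APIs.

```typescript
// Illustrative only: brute-force nearest-neighbor search over in-memory vectors.
// A real vector store runs this kind of query server-side.
interface StoredVector {
  id: string;
  embedding: number[];
}

// Cosine similarity: 1 for identical directions, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored vectors most similar to the query, best match first.
function nearestNeighbors(
  query: number[],
  rows: StoredVector[],
  k: number
): StoredVector[] {
  return rows
    .map((row) => ({ row, score: cosineSimilarity(query, row.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ row }) => row);
}

const rows: StoredVector[] = [
  { id: "a", embedding: [1, 0] },
  { id: "b", embedding: [0, 1] },
  { id: "c", embedding: [0.9, 0.1] },
];
const nearest = nearestNeighbors([1, 0], rows, 2);
console.log(nearest.map((r) => r.id));
```

The dimension check mirrors the requirement that the column dimension matches the embedding model: vectors of different lengths cannot be compared.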
Setup
Install the Xata CLI
npm install @xata.io/cli -g
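Create a database to be used as a vector store
In the Xata UI, create a new database. You can name it whatever you want, but for this example we'll use langchain.
Create a table. Again, you can name it anything, but we will use vectors. Add the following columns via the UI:
- content of type "Text". This is used to store the Document.pageContent values.
- embedding of type "Vector". Use the dimension of the model you plan to use (for example, 1536 for OpenAI's text-embedding-ada-002).
- Any other columns you want to use as metadata. They are populated from the Document.metadata object. For example, if the Document.metadata object has a title property, you can create a title column in the table and it will be populated automatically.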
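The column mapping described above can be pictured with a small sketch. The `DocLike` and `VectorRow` types and the `toRow` helper are hypothetical and exist only for illustration; they are not the types exported by LangChain or Xata, and the exact way the integration builds rows is assumed here from the description above.

```typescript
// Hypothetical shapes for illustration only.
interface DocLike {
  pageContent: string;
  metadata: Record<string, unknown>;
}

type VectorRow = Record<string, unknown> & {
  content: string;
  embedding: number[];
};

// Assumption: each metadata key is written to the table column of the same name.
// In this sketch, metadata is spread first so it cannot overwrite content or embedding.
function toRow(doc: DocLike, embedding: number[]): VectorRow {
  return { ...doc.metadata, content: doc.pageContent, embedding };
}

const vectorRow = toRow(
  {
    pageContent: "Xata includes similarity search",
    metadata: { title: "Intro" },
  },
  [0.1, 0.2, 0.3]
);
console.log(vectorRow);
```

A metadata key with no matching column has nowhere to be stored, which is why the columns need to be created in the UI beforehand.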
Initialize the project
In your project, run:
xata init
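Then choose the database you created above. This also generates a xata.ts or xata.js file that defines the client you can use to interact with the database. See the Xata getting started docs for more details on using the Xata JavaScript/TypeScript SDK.
Usage
:::tip See this section for general instructions on installing integration packages. :::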
- npm: npm install @langchain/openai @langchain/community @langchain/core
- Yarn: yarn add @langchain/openai @langchain/community @langchain/core
- pnpm: pnpm add @langchain/openai @langchain/community @langchain/core
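Example: Q&A chatbot using OpenAI and Xata as vector store
This example uses the VectorDBQAChain to search the documents stored in Xata and then pass them as context to the OpenAI model, in order to answer the question asked by the user.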
import { XataVectorSearch } from "@langchain/community/vectorstores/xata";
import { OpenAIEmbeddings, OpenAI } from "@langchain/openai";
import { BaseClient } from "@xata.io/client";
import { VectorDBQAChain } from "langchain/chains";
import { Document } from "@langchain/core/documents";
// First, follow set-up instructions at
// https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/xata
// if you use the generated client, you don't need this function.
// Just import getXataClient from the generated xata.ts instead.
const getXataClient = () => {
  if (!process.env.XATA_API_KEY) {
    throw new Error("XATA_API_KEY not set");
  }
  if (!process.env.XATA_DB_URL) {
    throw new Error("XATA_DB_URL not set");
  }
  const xata = new BaseClient({
    databaseURL: process.env.XATA_DB_URL,
    apiKey: process.env.XATA_API_KEY,
    branch: process.env.XATA_BRANCH || "main",
  });
  return xata;
};

export async function run() {
  const client = getXataClient();
  const table = "vectors";
  const embeddings = new OpenAIEmbeddings();
  const store = new XataVectorSearch(embeddings, { client, table });

  // Add documents
  const docs = [
    new Document({
      pageContent: "Xata is a Serverless Data platform based on PostgreSQL",
    }),
    new Document({
      pageContent:
        "Xata offers a built-in vector type that can be used to store and query vectors",
    }),
    new Document({
      pageContent: "Xata includes similarity search",
    }),
  ];
  const ids = await store.addDocuments(docs);

  // Wait briefly before querying the newly added documents
  // eslint-disable-next-line no-promise-executor-return
  await new Promise((r) => setTimeout(r, 2000));

  const model = new OpenAI();
  const chain = VectorDBQAChain.fromLLM(model, store, {
    k: 1,
    returnSourceDocuments: true,
  });
  const response = await chain.invoke({ query: "What is Xata?" });
  console.log(JSON.stringify(response, null, 2));

  // Clean up the documents added by this example
  await store.delete({ ids });
}
API Reference:
- XataVectorSearch from @langchain/community/vectorstores/xata
- OpenAIEmbeddings from @langchain/openai
- OpenAI from @langchain/openai
- VectorDBQAChain from langchain/chains
- Document from @langchain/core/documents
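The retrieve-then-answer flow in the example above can be pictured roughly as follows. This is a simplified sketch, not the chain's actual implementation: the `RetrievedDoc` type, the `buildPrompt` helper, and the prompt wording are all made up for illustration.

```typescript
// Hypothetical minimal document shape for this sketch.
interface RetrievedDoc {
  pageContent: string;
}

// Concatenate the retrieved documents into a context section, then append the question.
// The wording is invented; VectorDBQAChain uses its own prompt template.
function buildPrompt(question: string, docs: RetrievedDoc[]): string {
  const context = docs.map((d) => d.pageContent).join("\n\n");
  return [
    "Use the following context to answer the question.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
    "Answer:",
  ].join("\n");
}

// With k: 1, as in the example, only the single closest document becomes context.
const qaPrompt = buildPrompt("What is Xata?", [
  { pageContent: "Xata is a Serverless Data platform based on PostgreSQL" },
]);
console.log(qaPrompt);
```

A larger k gives the model more context to draw on, at the cost of a longer prompt.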
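Example: Similarity search with a metadata filter
This example shows how to implement semantic search using LangChain.js and Xata. Before running it, make sure to add an author column of type String to the vectors table in Xata.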
import { XataVectorSearch } from "@langchain/community/vectorstores/xata";
import { OpenAIEmbeddings } from "@langchain/openai";
import { BaseClient } from "@xata.io/client";
import { Document } from "@langchain/core/documents";
// First, follow set-up instructions at
// https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/xata
// Also, add a column named "author" to the "vectors" table.
// if you use the generated client, you don't need this function.
// Just import getXataClient from the generated xata.ts instead.
const getXataClient = () => {
  if (!process.env.XATA_API_KEY) {
    throw new Error("XATA_API_KEY not set");
  }
  if (!process.env.XATA_DB_URL) {
    throw new Error("XATA_DB_URL not set");
  }
  const xata = new BaseClient({
    databaseURL: process.env.XATA_DB_URL,
    apiKey: process.env.XATA_API_KEY,
    branch: process.env.XATA_BRANCH || "main",
  });
  return xata;
};

export async function run() {
  const client = getXataClient();
  const table = "vectors";
  const embeddings = new OpenAIEmbeddings();
  const store = new XataVectorSearch(embeddings, { client, table });

  // Add documents
  const docs = [
    new Document({
      pageContent: "Xata works great with Langchain.js",
      metadata: { author: "Xata" },
    }),
    new Document({
      pageContent: "Xata works great with Langchain",
      metadata: { author: "Langchain" },
    }),
    new Document({
      pageContent: "Xata includes similarity search",
      metadata: { author: "Xata" },
    }),
  ];
  const ids = await store.addDocuments(docs);

  // Wait briefly before querying the newly added documents
  // eslint-disable-next-line no-promise-executor-return
  await new Promise((r) => setTimeout(r, 2000));

  // author is applied as pre-filter to the similarity search
  const results = await store.similaritySearchWithScore("xata works great", 6, {
    author: "Langchain",
  });
  console.log(JSON.stringify(results, null, 2));

  // Clean up the documents added by this example
  await store.delete({ ids });
}
API Reference:
- XataVectorSearch from @langchain/community/vectorstores/xata
- OpenAIEmbeddings from @langchain/openai
- Document from @langchain/core/documents
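The code comment in the example above says the author filter is applied as a pre-filter. The sketch below illustrates why that matters by contrasting it with filtering after the top-k cut. Everything here is hypothetical: the `ScoredRow` type, the helper functions, and the similarity scores are invented for illustration and do not come from Xata.

```typescript
// Hypothetical row with a precomputed similarity score (higher is closer).
interface ScoredRow {
  content: string;
  author: string;
  score: number; // invented values, for illustration only
}

// Take the k highest-scoring rows without mutating the input.
function topK(rows: ScoredRow[], k: number): ScoredRow[] {
  return [...rows].sort((a, b) => b.score - a.score).slice(0, k);
}

// Pre-filter: restrict to matching rows first, then take the top k.
function preFilter(rows: ScoredRow[], author: string, k: number): ScoredRow[] {
  return topK(rows.filter((r) => r.author === author), k);
}

// Post-filter: take the top k first, then drop rows that do not match.
function postFilter(rows: ScoredRow[], author: string, k: number): ScoredRow[] {
  return topK(rows, k).filter((r) => r.author === author);
}

const scored: ScoredRow[] = [
  { content: "Xata works great with Langchain.js", author: "Xata", score: 0.95 },
  { content: "Xata works great with Langchain", author: "Langchain", score: 0.93 },
  { content: "Xata includes similarity search", author: "Xata", score: 0.7 },
];

const pre = preFilter(scored, "Langchain", 1);
const post = postFilter(scored, "Langchain", 1);
console.log(pre.length, post.length);
```

With k = 1, the post-filter variant returns nothing because the single best match has a different author, while the pre-filter variant still finds the matching document.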
Related
- Vector store conceptual guide
- Vector store how-to guides