构建一个基于SQL数据的问答系统

预备知识

本指南假设您熟悉以下概念：

使 LLM 系统能够查询结构化数据，与非结构化文本数据相比，可能会有质的不同。在后者中，通常会生成可以搜索向量数据库的文本；而对于结构化数据，LLM 通常会在 DSL（如 SQL）中编写和执行查询。在本指南中，我们将介绍在数据库表格数据上创建问答系统的基本方法。我们将涵盖使用链和代理的实现。这些系统将允许我们对数据库中的数据提出问题，并获得自然语言的答案。两者的主要区别在于，我们的代理可以循环多次查询数据库，以回答问题。

⚠️ 安全提示 ⚠️

构建 SQL 数据库的问答系统需要执行模型生成的 SQL 查询。这样做存在固有的风险。请务必始终将数据库连接权限的范围限制为链/代理需求所需的最小权限。这将缓解但无法完全消除构建模型驱动系统带来的风险。有关通用安全最佳实践的更多信息，请参见此处。

架构

从高层次来看，这些系统的步骤如下：

将问题转换为 SQL 查询：模型将用户输入转换为 SQL 查询。
执行 SQL 查询：执行该查询。
回答问题：模型使用查询结果回应用户输入。

环境准备

首先，安装所需的包并设置环境变量： bash npm2yarn npm i langchain @langchain/community @langchain/langgraph

# 如需使用LangSmith，请取消以下行的注释。这不是必需的，但推荐用于调试和可观察性。
# export LANGSMITH_API_KEY=<your key>
# export LANGSMITH_TRACING=true

# 如果您不在无服务器环境中，取消注释以下行以减少跟踪延迟
# export LANGCHAIN_CALLBACKS_BACKGROUND=true

示例数据

下面的示例将使用 SQLite 连接 Chinook 数据库，这是一个代表数字媒体商店的示例数据库。请按照这些安装步骤在与本笔记本相同的目录中创建 Chinook.db 文件。您还可以通过命令行下载并构建该数据库：

curl -s https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sql | sqlite3 Chinook.db

现在，Chinook.db 已经在我们的目录中，我们可以使用 SqlDatabase 类与其进行交互：

import { SqlDatabase } from "langchain/sql_db";
import { DataSource } from "typeorm";

const datasource = new DataSource({
  type: "sqlite",
  database: "Chinook.db",
});
const db = await SqlDatabase.fromDataSourceParams({
  appDataSource: datasource,
});

await db.run("SELECT * FROM Artist LIMIT 10;");

[{"ArtistId":1,"Name":"AC/DC"},{"ArtistId":2,"Name":"Accept"},{"ArtistId":3,"Name":"Aerosmith"},{"ArtistId":4,"Name":"Alanis Morissette"},{"ArtistId":5,"Name":"Alice In Chains"},{"ArtistId":6,"Name":"Antônio Carlos Jobim"},{"ArtistId":7,"Name":"Apocalyptica"},{"ArtistId":8,"Name":"Audioslave"},{"ArtistId":9,"Name":"BackBeat"},{"ArtistId":10,"Name":"Billy Cobham"}]

很好！我们有一个可以查询的 SQL 数据库。现在让我们尝试将其连接到一个 LLM。

链式操作

链式操作是由可预测步骤组合而成的。在 LangGraph 中，我们可以通过简单的节点序列来表示一个链式操作。接下来，我们将创建一个步骤序列，针对提出的问题，完成以下操作： - 将问题转换为 SQL 查询； - 执行该查询； - 利用查询结果回答原始问题。

这种安排并不适用于所有场景。例如，该系统会对任何用户输入都执行 SQL 查询——即使是“你好”。重要的是，正如我们将在下面看到的，有些问题需要不止一次查询才能回答。我们将在“代理”部分中解决这些场景。

应用状态

LangGraph 的状态控制着输入到应用程序的数据、在步骤之间传递的数据以及由应用程序输出的数据。

对于本应用，我们只需跟踪输入的问题、生成的查询、查询结果以及生成的答案：

import { Annotation } from "@langchain/langgraph";

const InputStateAnnotation = Annotation.Root({
  question: Annotation<string>,
});

const StateAnnotation = Annotation.Root({
  question: Annotation<string>,
  query: Annotation<string>,
  result: Annotation<string>,
  answer: Annotation<string>,
});

现在我们只需要操作此状态并填充其内容的函数。

将问题转换为 SQL 查询

第一步是获取用户输入并将其转换为 SQL 查询。为了可靠地获取 SQL 查询（不包括 Markdown 格式、解释或澄清内容），我们将使用 LangChain 的结构化输出抽象功能。

让我们为应用程序选择一个聊天模型：

Pick your chat model:

Install dependencies

tip

See this section for general instructions on installing integration packages.

npm
yarn
pnpm

npm i @langchain/groq

yarn add @langchain/groq 

pnpm add @langchain/groq 

Add environment variables

GROQ_API_KEY=your-api-key

Instantiate the model

import { ChatGroq } from "@langchain/groq";

const llm = new ChatGroq({
  model: "llama-3.3-70b-versatile",
  temperature: 0
});

Install dependencies

tip

See this section for general instructions on installing integration packages.

npm
yarn
pnpm

npm i @langchain/openai

yarn add @langchain/openai 

pnpm add @langchain/openai 

Add environment variables

OPENAI_API_KEY=your-api-key

Instantiate the model

import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0
});

Install dependencies

tip

See this section for general instructions on installing integration packages.

npm
yarn
pnpm

npm i @langchain/anthropic

yarn add @langchain/anthropic 

pnpm add @langchain/anthropic 

Add environment variables

ANTHROPIC_API_KEY=your-api-key

Instantiate the model

import { ChatAnthropic } from "@langchain/anthropic";

const llm = new ChatAnthropic({
  model: "claude-3-5-sonnet-20240620",
  temperature: 0
});

Install dependencies

tip

See this section for general instructions on installing integration packages.

npm
yarn
pnpm

npm i @langchain/google-genai

yarn add @langchain/google-genai 

pnpm add @langchain/google-genai 

Add environment variables

GOOGLE_API_KEY=your-api-key

Instantiate the model

import { ChatGoogleGenerativeAI } from "@langchain/google-genai";

const llm = new ChatGoogleGenerativeAI({
  model: "gemini-2.0-flash",
  temperature: 0
});

Install dependencies

tip

See this section for general instructions on installing integration packages.

npm
yarn
pnpm

npm i @langchain/community

yarn add @langchain/community 

pnpm add @langchain/community 

Add environment variables

FIREWORKS_API_KEY=your-api-key

Instantiate the model

import { ChatFireworks } from "@langchain/community/chat_models/fireworks";

const llm = new ChatFireworks({
  model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
  temperature: 0
});

Install dependencies

tip

See this section for general instructions on installing integration packages.

npm
yarn
pnpm

npm i @langchain/mistralai

yarn add @langchain/mistralai 

pnpm add @langchain/mistralai 

Add environment variables

MISTRAL_API_KEY=your-api-key

Instantiate the model

import { ChatMistralAI } from "@langchain/mistralai";

const llm = new ChatMistralAI({
  model: "mistral-large-latest",
  temperature: 0
});

Install dependencies

tip

See this section for general instructions on installing integration packages.

npm
yarn
pnpm

npm i @langchain/google-vertexai

yarn add @langchain/google-vertexai 

pnpm add @langchain/google-vertexai 

Add environment variables

GOOGLE_APPLICATION_CREDENTIALS=credentials.json

Instantiate the model

import { ChatVertexAI } from "@langchain/google-vertexai";

const llm = new ChatVertexAI({
  model: "gemini-1.5-flash",
  temperature: 0
});

我们将从 Prompt Hub 中提取一个提示，以指导模型。

import { pull } from "langchain/hub";
import { ChatPromptTemplate } from "@langchain/core/prompts";

const queryPromptTemplate = await pull<ChatPromptTemplate>(
  "langchain-ai/sql-query-system-prompt"
);

queryPromptTemplate.promptMessages.forEach((message) => {
  console.log(message.lc_kwargs.prompt.template);
});

Given an input question, create a syntactically correct {dialect} query to run to help find the answer. Unless the user specifies in his question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.

Never query for all the columns from a specific table, only ask for a the few relevant columns given the question.

Pay attention to use only the column names that you can see in the schema description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.

Only use the following tables:
{table_info}
Question: {input}

提示中包含多个我们需要填充的参数，例如 SQL 方言和表结构。LangChain 的SqlDatabase对象包含了一些方法来帮助完成此操作。我们的writeQuery步骤将仅填充这些参数，并提示模型生成 SQL 查询：

import { z } from "zod";

const queryOutput = z.object({
  query: z.string().describe("Syntactically valid SQL query."),
});

const structuredLlm = llm.withStructuredOutput(queryOutput);

const writeQuery = async (state: typeof InputStateAnnotation.State) => {
  const promptValue = await queryPromptTemplate.invoke({
    dialect: db.appDataSourceOptions.type,
    top_k: 10,
    table_info: await db.getTableInfo(),
    input: state.question,
  });
  const result = await structuredLlm.invoke(promptValue);
  return { query: result.query };
};

让我们来测试一下：

await writeQuery({ question: "How many Employees are there?" });

{ query: 'SELECT COUNT(*) AS EmployeeCount FROM Employee;' }

执行查询

这是创建 SQL 链中最危险的部分。 请仔细考虑是否可以对您的数据运行自动化查询。尽可能地降低数据库连接的权限。在执行查询前，考虑在链中添加一个人工审批步骤（见下文）。

要执行查询，我们将从langchain-community加载一个工具。我们的executeQuery节点将仅封装这个工具：

import { QuerySqlTool } from "langchain/tools/sql";

const executeQuery = async (state: typeof StateAnnotation.State) => {
  const executeQueryTool = new QuerySqlTool(db);
  return { result: await executeQueryTool.invoke(state.query) };
};

测试此步骤：

await executeQuery({
  question: "",
  query: "SELECT COUNT(*) AS EmployeeCount FROM Employee;",
  result: "",
  answer: "",
});

{ result: '[{"EmployeeCount":8}]' }

生成答案

最后一步，我们将根据从数据库中提取的信息生成对问题的答案：

const generateAnswer = async (state: typeof StateAnnotation.State) => {
  const promptValue =
    "Given the following user question, corresponding SQL query, " +
    "and SQL result, answer the user question.\n\n" +
    `Question: ${state.question}\n` +
    `SQL Query: ${state.query}\n` +
    `SQL Result: ${state.result}\n`;
  const response = await llm.invoke(promptValue);
  return { answer: response.content };
};

使用 LangGraph 编排

最后，我们将应用程序编译成一个单独的 graph 对象。在这种情况下，我们只是将三个步骤连接成一个单一的序列。

import { StateGraph } from "@langchain/langgraph";

const graphBuilder = new StateGraph({
  stateSchema: StateAnnotation,
})
  .addNode("writeQuery", writeQuery)
  .addNode("executeQuery", executeQuery)
  .addNode("generateAnswer", generateAnswer)
  .addEdge("__start__", "writeQuery")
  .addEdge("writeQuery", "executeQuery")
  .addEdge("executeQuery", "generateAnswer")
  .addEdge("generateAnswer", "__end__");

const graph = graphBuilder.compile();

LangGraph 还提供内置工具，用于可视化应用程序的控制流程：

// 注意：tslab 只能在 Jupyter Notebook 内部运行。请勿担心自己运行此代码！
import * as tslab from "tslab";

const image = await graph.getGraph().drawMermaidPng();
const arrayBuffer = await image.arrayBuffer();

await tslab.display.png(new Uint8Array(arrayBuffer));

graph_img_sql_qa

让我们测试我们的应用程序！请注意，我们可以流式传输各个步骤的结果：

let inputs = { question: "How many employees are there?" };

console.log(inputs);
console.log("\n====\n");
for await (const step of await graph.stream(inputs, {
  streamMode: "updates",
})) {
  console.log(step);
  console.log("\n====\n");
}

{ question: 'How many employees are there?' }

====

{
  writeQuery: { query: 'SELECT COUNT(*) AS EmployeeCount FROM Employee;' }
}

====

{ executeQuery: { result: '[{"EmployeeCount":8}]' } }

====

{ generateAnswer: { answer: 'There are 8 employees.' } }

====

查看 LangSmith 跟踪。

人工参与（Human-in-the-loop）

LangGraph 支持许多对这一工作流程有用的特性。其中之一是人工参与（human-in-the-loop）：我们可以在敏感步骤（例如执行 SQL 查询）之前中断应用程序，以进行人工审核。这一功能由 LangGraph 的持久化（persistence）层支持，它可以将运行进度保存到你选择的存储中。以下代码指定了内存中的存储方式：

import { MemorySaver } from "@langchain/langgraph";

const checkpointer = new MemorySaver();
const graphWithInterrupt = graphBuilder.compile({
  checkpointer: checkpointer,
  interruptBefore: ["executeQuery"],
});

// Now that we're using persistence, we need to specify a thread ID
// so that we can continue the run after review.
const threadConfig = {
  configurable: { thread_id: "1" },
  streamMode: "updates" as const,
};

const image = await graphWithInterrupt.getGraph().drawMermaidPng();
const arrayBuffer = await image.arrayBuffer();

await tslab.display.png(new Uint8Array(arrayBuffer));

graph_img_sql_qa_interrupt

让我们重复相同的运行，添加一个简单的“是”/“否”审批步骤：

console.log(inputs);
console.log("\n====\n");
for await (const step of await graphWithInterrupt.stream(
  inputs,
  threadConfig
)) {
  console.log(step);
  console.log("\n====\n");
}

// Will log when the graph is interrupted, after `executeQuery`.
console.log("---GRAPH INTERRUPTED---");

{ question: 'How many employees are there?' }

====

{
  writeQuery: { query: 'SELECT COUNT(*) AS EmployeeCount FROM Employee;' }
}

====

---GRAPH INTERRUPTED---

运行在查询执行之前中断。此时，我们的应用程序可以处理与用户的交互，由用户审阅查询。

如果获得批准，使用 null 输入运行相同的线程将从我们中断的位置继续执行。这是通过 LangGraph 的持久化层实现的。

for await (const step of await graphWithInterrupt.stream(null, threadConfig)) {
  console.log(step);
  console.log("\n====\n");
}

{ executeQuery: { result: '[{"EmployeeCount":8}]' } }

====

{ generateAnswer: { answer: 'There are 8 employees.' } }

====

详见此指南以了解更多细节和示例。

下一步

对于更复杂的查询生成，我们可能需要创建少量样本提示或添加查询检查步骤。如需了解此类高级技术，请参阅：

提示策略：高级提示工程技巧。
查询检查：添加查询验证和错误处理。
大型数据库：处理大型数据库的技术。

代理

代理利用 LLM 的推理能力在执行过程中进行决策。使用代理可以将查询生成和执行过程中的更多权限进行下放。尽管它们的行为比上述“链”更难预测，但它们也具有一些优势：

它们可以根据需要多次查询数据库以回答用户问题。
它们可以通过运行生成的查询、捕获错误跟踪并正确重新生成来从错误中恢复。
它们可以根据数据库的模式以及数据库的内容回答问题（例如描述特定表）。

下面我们将构建一个最简化的 SQL 代理。我们将使用 LangChain 的 SqlToolkit 为其配备一组工具。通过 LangGraph 的预构建的 ReAct 代理构造器，我们可以在一行代码中完成此操作。

SqlToolkit 包含可以以下功能的工具：

创建和执行查询
检查查询语法
获取表描述
… 以及更多

import { SqlToolkit } from "langchain/agents/toolkits/sql";

const toolkit = new SqlToolkit(db, llm);

const tools = toolkit.getTools();

console.log(
  tools.map((tool) => ({
    name: tool.name,
    description: tool.description,
  }))
);

[
  {
    name: 'query-sql',
    description: 'Input to this tool is a detailed and correct SQL query, output is a result from the database.\n' +
      '  If the query is not correct, an error message will be returned.\n' +
      '  If an error is returned, rewrite the query, check the query, and try again.'
  },
  {
    name: 'info-sql',
    description: 'Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables.\n' +
      '    Be sure that the tables actually exist by calling list-tables-sql first!\n' +
      '\n' +
      '    Example Input: "table1, table2, table3.'
  },
  {
    name: 'list-tables-sql',
    description: 'Input is an empty string, output is a comma-separated list of tables in the database.'
  },
  {
    name: 'query-checker',
    description: 'Use this tool to double check if your query is correct before executing it.\n' +
      '    Always use this tool before executing a query with query-sql!'
  }
]

系统提示

我们还需要为代理加载一个系统提示。这将包含有关如何行为的指令。

import { pull } from "langchain/hub";
import { ChatPromptTemplate } from "@langchain/core/prompts";

const systemPromptTemplate = await pull<ChatPromptTemplate>(
  "langchain-ai/sql-agent-system-prompt"
);

console.log(systemPromptTemplate.promptMessages[0].lc_kwargs.prompt.template);

You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.
You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for the relevant columns given the question.
You have access to tools for interacting with the database.
Only use the below tools. Only use the information returned by the below tools to construct your final answer.
You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.

To start you should ALWAYS look at the tables in the database to see what you can query.
Do NOT skip this step.
Then you should query the schema of the most relevant tables.

让我们填写提示中突出显示的参数：

const systemMessage = await systemPromptTemplate.format({
  dialect: "SQLite",
  top_k: 5,
});

初始化智能体

我们将使用预构建的 LangGraph 智能体来构建我们的智能体

import { createReactAgent } from "@langchain/langgraph/prebuilt";

const agent = createReactAgent({
  llm: llm,
  tools: tools,
  prompt: systemMessage,
});

考虑智能体如何回应以下问题：

展开以查看 `prettyPrint` 代码。

import { AIMessage, BaseMessage, isAIMessage } from "@langchain/core/messages";

const prettyPrint = (message: BaseMessage) => {
  let txt = `[${message._getType()}]: ${message.content}`;
  if ((isAIMessage(message) && message.tool_calls?.length) || 0 > 0) {
    const tool_calls = (message as AIMessage)?.tool_calls
      ?.map((tc) => `- ${tc.name}(${JSON.stringify(tc.args)})`)
      .join("\n");
    txt += ` \nTools: \n${tool_calls}`;
  }
  console.log(txt);
};

let inputs2 = {
  messages: [
    { role: "user", content: "Which country's customers spent the most?" },
  ],
};

for await (const step of await agent.stream(inputs2, {
  streamMode: "values",
})) {
  const lastMessage = step.messages[step.messages.length - 1];
  prettyPrint(lastMessage);
  console.log("-----\n");
}

[human]: Which country's customers spent the most?
-----

[ai]:
Tools:
- list-tables-sql({"input":""})
-----

[tool]: Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track
-----

[ai]:
Tools:
- info-sql({"input":"Customer, Invoice, InvoiceLine"})
- info-sql({"input":"Invoice"})
-----

[tool]: CREATE TABLE Invoice (
InvoiceId INTEGER NOT NULL, CustomerId INTEGER NOT NULL, InvoiceDate DATETIME NOT NULL, BillingAddress NVARCHAR(70) , BillingCity NVARCHAR(40) , BillingState NVARCHAR(40) , BillingCountry NVARCHAR(40) , BillingPostalCode NVARCHAR(10) , Total NUMERIC(10,2) NOT NULL)
SELECT * FROM "Invoice" LIMIT 3;
 InvoiceId CustomerId InvoiceDate BillingAddress BillingCity BillingState BillingCountry BillingPostalCode Total
 1 2 2021-01-01 00:00:00 Theodor-Heuss-Straße 34 Stuttgart null Germany 70174 1.98
 2 4 2021-01-02 00:00:00 Ullevålsveien 14 Oslo null Norway 0171 3.96
 3 8 2021-01-03 00:00:00 Grétrystraat 63 Brussels null Belgium 1000 5.94

-----

[ai]:
Tools:
- query-checker({"input":"SELECT c.Country, SUM(i.Total) AS TotalSpent \nFROM Customer c \nJOIN Invoice i ON c.CustomerId = i.CustomerId \nGROUP BY c.Country \nORDER BY TotalSpent DESC \nLIMIT 5;"})
-----

[tool]: The SQL query you provided appears to be correct and does not contain any of the common mistakes listed. It properly joins the `Customer` and `Invoice` tables, groups the results by country, and orders the total spending in descending order while limiting the results to the top 5 countries.

Here is the original query reproduced:

```sql
SELECT c.Country, SUM(i.Total) AS TotalSpent
FROM Customer c
JOIN Invoice i ON c.CustomerId = i.CustomerId
GROUP BY c.Country
ORDER BY TotalSpent DESC
LIMIT 5;
```

No changes are necessary.
-----

[ai]:
Tools:
- query-sql({"input":"SELECT c.Country, SUM(i.Total) AS TotalSpent \nFROM Customer c \nJOIN Invoice i ON c.CustomerId = i.CustomerId \nGROUP BY c.Country \nORDER BY TotalSpent DESC \nLIMIT 5;"})
-----

[tool]: [{"Country":"USA","TotalSpent":523.0600000000003},{"Country":"Canada","TotalSpent":303.9599999999999},{"Country":"France","TotalSpent":195.09999999999994},{"Country":"Brazil","TotalSpent":190.09999999999997},{"Country":"Germany","TotalSpent":156.48}]
-----

[ai]: The countries whose customers spent the most are:

1. **USA** - $523.06
2. **Canada** - $303.96
3. **France** - $195.10
4. **Brazil** - $190.10
5. **Germany** - $156.48
-----

你还可以使用 LangSmith 追踪来可视化这些步骤及相关元数据。

请注意，智能体将执行多个查询，直到获得所需的信息： 1. 列出可用的表格； 2. 获取三个表格的模式； 3. 通过连接操作查询多个表格。

然后，智能体能够使用最终查询的结果来生成对原始问题的答案。

该智能体同样可以处理定性问题：

let inputs3 = {
  messages: [{ role: "user", content: "Describe the playlisttrack table" }],
};

for await (const step of await agent.stream(inputs3, {
  streamMode: "values",
})) {
  const lastMessage = step.messages[step.messages.length - 1];
  prettyPrint(lastMessage);
  console.log("-----\n");
}

[human]: Describe the playlisttrack table
-----

[ai]:
Tools:
- list-tables-sql({"input":""})
-----

[tool]: Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track
-----

[ai]:
Tools:
- info-sql({"input":"PlaylistTrack"})
-----

[tool]: CREATE TABLE PlaylistTrack (
PlaylistId INTEGER NOT NULL, TrackId INTEGER NOT NULL)
SELECT * FROM "PlaylistTrack" LIMIT 3;
 PlaylistId TrackId
 1 3402
 1 3389
 1 3390

-----

[ai]: The `PlaylistTrack` table has the following schema:

- **PlaylistId**: INTEGER (NOT NULL)
- **TrackId**: INTEGER (NOT NULL)

This table is used to associate tracks with playlists. Here are some sample rows from the table:

| PlaylistId | TrackId |
|------------|---------|
| 1          | 3402    |
| 1          | 3389    |
| 1          | 3390    |
-----

处理高基数列

为了过滤包含专有名词（如地址、歌曲名称或艺术家）的列，我们首先需要检查拼写，以便正确过滤数据。

我们可以通过创建一个包含数据库中所有不同专有名词的向量存储来实现这一点。然后，每当用户在问题中使用专有名词时，可以让代理查询该向量存储，以找到该单词的正确拼写。通过这种方式，代理可以在构建目标查询之前确保自己理解用户所指的是哪一个实体。

首先，我们需要获取每个我们想要的实体的唯一值，为此我们定义一个函数，将结果解析为元素列表：

async function queryAsList(
  database: SqlDatabase,
  query: string
): Promise<string[]> {
  const res: Array<{ [key: string]: string }> = JSON.parse(
    await database.run(query)
  )
    .flat()
    .filter((el: { [key: string]: string } | null) => el != null);
  const justValues: Array<string> = res.map((item) =>
    Object.values(item)[0]
      .replace(/\b\d+\b/g, "")
      .trim()
  );
  return justValues;
}

// Gather entities into a list
let artists: string[] = await queryAsList(db, "SELECT Name FROM Artist");
let albums: string[] = await queryAsList(db, "SELECT Title FROM Album");
let properNouns = artists.concat(albums);

console.log(`Total: ${properNouns.length}\n`);
console.log(`Sample: ${properNouns.slice(0, 5)}...`);

Total: 622

Sample: AC/DC,Accept,Aerosmith,Alanis Morissette,Alice In Chains...

使用这个函数，我们可以创建一个检索工具，代理可以根据需要执行该工具。

让我们为这一步选择一个嵌入模型和向量存储：

Pick your embedding model:

Install dependencies

npm
yarn
pnpm

npm i @langchain/openai

yarn add @langchain/openai

pnpm add @langchain/openai

OPENAI_API_KEY=your-api-key

import { OpenAIEmbeddings } from "@langchain/openai";

const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-large"
});

Install dependencies

npm
yarn
pnpm

npm i @langchain/openai

yarn add @langchain/openai

pnpm add @langchain/openai

AZURE_OPENAI_API_INSTANCE_NAME=<YOUR_INSTANCE_NAME>
AZURE_OPENAI_API_KEY=<YOUR_KEY>
AZURE_OPENAI_API_VERSION="2024-02-01"

import { AzureOpenAIEmbeddings } from "@langchain/openai";

const embeddings = new AzureOpenAIEmbeddings({
  azureOpenAIApiEmbeddingsDeploymentName: "text-embedding-ada-002"
});

Install dependencies

npm
yarn
pnpm

npm i @langchain/aws

yarn add @langchain/aws

pnpm add @langchain/aws

BEDROCK_AWS_REGION=your-region

import { BedrockEmbeddings } from "@langchain/aws";

const embeddings = new BedrockEmbeddings({
  model: "amazon.titan-embed-text-v1"
});

Install dependencies

npm
yarn
pnpm

npm i @langchain/google-vertexai

yarn add @langchain/google-vertexai

pnpm add @langchain/google-vertexai

GOOGLE_APPLICATION_CREDENTIALS=credentials.json

import { VertexAIEmbeddings } from "@langchain/google-vertexai";

const embeddings = new VertexAIEmbeddings({
  model: "text-embedding-004"
});

Install dependencies

npm
yarn
pnpm

npm i @langchain/mistralai

yarn add @langchain/mistralai

pnpm add @langchain/mistralai

MISTRAL_API_KEY=your-api-key

import { MistralAIEmbeddings } from "@langchain/mistralai";

const embeddings = new MistralAIEmbeddings({
  model: "mistral-embed"
});

Install dependencies

npm
yarn
pnpm

npm i @langchain/cohere

yarn add @langchain/cohere

pnpm add @langchain/cohere

COHERE_API_KEY=your-api-key

import { CohereEmbeddings } from "@langchain/cohere";

const embeddings = new CohereEmbeddings({
  model: "embed-english-v3.0"
});

Pick your vector store:

Install dependencies

npm
yarn
pnpm

npm i langchain

yarn add langchain

pnpm add langchain

import { MemoryVectorStore } from "langchain/vectorstores/memory";

const vectorStore = new MemoryVectorStore(embeddings);

Install dependencies

npm
yarn
pnpm

npm i @langchain/community

yarn add @langchain/community

pnpm add @langchain/community

import { Chroma } from "@langchain/community/vectorstores/chroma";

const vectorStore = new Chroma(embeddings, {
  collectionName: "a-test-collection",
});

Install dependencies

npm
yarn
pnpm

npm i @langchain/community

yarn add @langchain/community

pnpm add @langchain/community

import { FaissStore } from "@langchain/community/vectorstores/faiss";

const vectorStore = new FaissStore(embeddings, {});

Install dependencies

npm
yarn
pnpm

npm i @langchain/mongodb

yarn add @langchain/mongodb

pnpm add @langchain/mongodb

import { MongoDBAtlasVectorSearch } from "@langchain/mongodb"
import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGODB_ATLAS_URI || "");
const collection = client
  .db(process.env.MONGODB_ATLAS_DB_NAME)
  .collection(process.env.MONGODB_ATLAS_COLLECTION_NAME);

const vectorStore = new MongoDBAtlasVectorSearch(embeddings, {
  collection: collection,
  indexName: "vector_index",
  textKey: "text",
  embeddingKey: "embedding",
});

Install dependencies

npm
yarn
pnpm

npm i @langchain/community

yarn add @langchain/community

pnpm add @langchain/community

import { PGVectorStore } from "@langchain/community/vectorstores/pgvector";

const vectorStore = await PGVectorStore.initialize(embeddings, {})

Install dependencies

npm
yarn
pnpm

npm i @langchain/pinecone

yarn add @langchain/pinecone

pnpm add @langchain/pinecone

import { PineconeStore } from "@langchain/pinecone";
import { Pinecone as PineconeClient } from "@pinecone-database/pinecone";

const pinecone = new PineconeClient();
const vectorStore = new PineconeStore(embeddings, {
  pineconeIndex,
  maxConcurrency: 5,
});

Install dependencies

npm
yarn
pnpm

npm i @langchain/qdrant

yarn add @langchain/qdrant

pnpm add @langchain/qdrant

import { QdrantVectorStore } from "@langchain/qdrant";

const vectorStore = await QdrantVectorStore.fromExistingCollection(embeddings, {
  url: process.env.QDRANT_URL,
  collectionName: "langchainjs-testing",
});

我们现在可以构建一个检索工具，用于在数据库中检索相关的专有名词：

import { createRetrieverTool } from "langchain/tools/retriever";
import { Document } from "@langchain/core/documents";

const documents = properNouns.map(
  (text) => new Document({ pageContent: text })
);
await vectorStore.addDocuments(documents);

const retriever = vectorStore.asRetriever(5);

const retrieverTool = createRetrieverTool(retriever, {
  name: "searchProperNouns",
  description:
    "Use to look up values to filter on. Input is an approximate spelling " +
    "of the proper noun, output is valid proper nouns. Use the noun most " +
    "similar to the search.",
});

让我们试一下：

console.log(await retrieverTool.invoke({ query: "Alice Chains" }));

Alice In Chains

Alanis Morissette

Jagged Little Pill

Angel Dust

Amy Winehouse

这样，如果智能体确定需要根据类似“Alice Chains”的艺术家编写一个过滤器，它可以首先使用检索器工具来查看某一列的相关值。

将这些整合在一起：

// Add to system message
let suffix =
  "If you need to filter on a proper noun like a Name, you must ALWAYS first look up " +
  "the filter value using the 'search_proper_nouns' tool! Do not try to " +
  "guess at the proper name - use this function to find similar ones.";

const system = systemMessage + suffix;

const updatedTools = tools.concat(retrieverTool);

const agent2 = createReactAgent({
  llm: llm,
  tools: updatedTools,
  prompt: system,
});

let inputs4 = {
  messages: [
    { role: "user", content: "How many albums does alis in chain have?" },
  ],
};

for await (const step of await agent2.stream(inputs4, {
  streamMode: "values",
})) {
  const lastMessage = step.messages[step.messages.length - 1];
  prettyPrint(lastMessage);
  console.log("-----\n");
}

[human]: How many albums does alis in chain have?
-----

[ai]:
Tools:
- searchProperNouns({"query":"alis in chain"})
-----

[tool]: Alice In Chains

Alanis Morissette

Up An' Atom

A-Sides

Jagged Little Pill
-----

[ai]:
Tools:
- query-sql({"input":"SELECT COUNT(*) FROM albums WHERE artist_name = 'Alice In Chains'"})
-----

[tool]: QueryFailedError: SQLITE_ERROR: no such table: albums
-----

[ai]:
Tools:
- list-tables-sql({"input":""})
-----

[tool]: Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track
-----

[ai]:
Tools:
- info-sql({"input":"Album"})
- info-sql({"input":"Artist"})
-----

[tool]: CREATE TABLE Artist (
ArtistId INTEGER NOT NULL, Name NVARCHAR(120) )
SELECT * FROM "Artist" LIMIT 3;
 ArtistId Name
 1 AC/DC
 2 Accept
 3 Aerosmith

-----

[ai]:
Tools:
- query-sql({"input":"SELECT COUNT(*) FROM Album WHERE ArtistId = (SELECT ArtistId FROM Artist WHERE Name = 'Alice In Chains')"})
-----

[tool]: [{"COUNT(*)":1}]
-----

[ai]: Alice In Chains has released 1 album.
-----

正如我们所见，无论是在流式步骤中还是在LangSmith 追踪中，智能代理都使用了 searchProperNouns 工具，以确认如何针对该特定艺术家正确地向数据库发起查询。

⚠️ 安全提示 ⚠️​

架构​

环境准备​

示例数据​

链式操作​

应用状态​

将问题转换为 SQL 查询​

Pick your chat model:

Install dependencies

Add environment variables

Instantiate the model

Install dependencies

Add environment variables

Instantiate the model

Install dependencies

Add environment variables

Instantiate the model

Install dependencies

Add environment variables

Instantiate the model

Install dependencies

Add environment variables

Instantiate the model

Install dependencies

Add environment variables

Instantiate the model

Install dependencies

Add environment variables

Instantiate the model

执行查询​

生成答案​

使用 LangGraph 编排​

人工参与（Human-in-the-loop）​

下一步​

代理​

系统提示​

初始化智能体​

处理高基数列​

Pick your embedding model:

Install dependencies

Install dependencies

Install dependencies

Install dependencies

Install dependencies

Install dependencies

Pick your vector store:

Install dependencies

Install dependencies

Install dependencies

Install dependencies

Install dependencies

Install dependencies

Install dependencies

Was this page helpful?

You can also leave detailed feedback on GitHub.

⚠️ 安全提示 ⚠️

架构

环境准备

示例数据

链式操作

应用状态

将问题转换为 SQL 查询

执行查询

生成答案

使用 LangGraph 编排

人工参与（Human-in-the-loop）

下一步

代理

系统提示

初始化智能体

处理高基数列