How to construct knowledge graphs
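In this guide we'll go over the basic ways of constructing a knowledge graph from unstructured text. The constructed graph can then be used as a knowledge base in a RAG application. At a high level, the steps for constructing a knowledge graph from text are:
- Extracting structured information from text: a model is used to extract structured graph information from text.
- Storing into a graph database: the extracted structured graph information is stored in a graph database to enable downstream RAG applications.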
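To make "structured graph information" concrete, here is a minimal sketch of what an extracted graph might look like: a set of typed nodes plus typed, directed relationships between them. The interface names (`SketchNode`, `SketchRelationship`, `SketchGraphDocument`) and the sample sentence are assumptions made for illustration; they are not types imported from any library.

```typescript
// Illustrative shapes only: these names are assumptions made for this sketch,
// not types imported from a library.
interface SketchNode {
  id: string; // e.g. the entity name
  type: string; // e.g. "Person"
}

interface SketchRelationship {
  source: SketchNode;
  target: SketchNode;
  type: string; // e.g. "SPOUSE"
}

interface SketchGraphDocument {
  nodes: SketchNode[];
  relationships: SketchRelationship[];
}

// Hand-written example for the sentence
// "Pierre Curie was Marie Curie's husband." (not model output).
const marie: SketchNode = { id: "Marie Curie", type: "Person" };
const pierre: SketchNode = { id: "Pierre Curie", type: "Person" };

const sketchDoc: SketchGraphDocument = {
  nodes: [marie, pierre],
  relationships: [{ source: pierre, target: marie, type: "SPOUSE" }],
};

console.log(JSON.stringify(sketchDoc, null, 2));
```

The first step of the pipeline produces data of roughly this shape, and the second step writes it to the database as nodes and edges.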
Setup
Install dependencies
:::tip See the section with general instructions on installing integration packages. :::
- npm
npm i langchain @langchain/community @langchain/openai @langchain/core neo4j-driver zod
- yarn
yarn add langchain @langchain/community @langchain/openai @langchain/core neo4j-driver zod
- pnpm
pnpm add langchain @langchain/community @langchain/openai @langchain/core neo4j-driver zod
Set environment variables
We'll use OpenAI in this example:
OPENAI_API_KEY=your-api-key
# Optional, use LangSmith for best-in-class observability
LANGSMITH_API_KEY=your-api-key
LANGSMITH_TRACING=true
# Reduce tracing latency if you are not in a serverless environment
# LANGCHAIN_CALLBACKS_BACKGROUND=true
Next, we need to define Neo4j credentials. Follow these installation steps to set up a Neo4j database.
NEO4J_URI="bolt://localhost:7687"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD="password"
The example below creates a connection to a Neo4j database.
import "neo4j-driver";
import { Neo4jGraph } from "@langchain/community/graphs/neo4j_graph";
const url = process.env.NEO4J_URI;
// Read the same variable name that was set above (NEO4J_USERNAME).
const username = process.env.NEO4J_USERNAME;
const password = process.env.NEO4J_PASSWORD;
const graph = await Neo4jGraph.initialize({ url, username, password });
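A missing or misspelled environment variable only shows up later as a connection error, so it can help to validate the settings first. The following is a minimal sketch, assuming the three variable names set above; `readNeo4jConfig` and `Neo4jConfig` are hypothetical names introduced here, not part of any library.

```typescript
// Hypothetical helper: reads the three Neo4j settings from an env-like object
// and fails fast with a clear message if any of them is missing or empty.
interface Neo4jConfig {
  url: string;
  username: string;
  password: string;
}

function readNeo4jConfig(
  env: Record<string, string | undefined>
): Neo4jConfig {
  const required = ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    url: env["NEO4J_URI"] as string,
    username: env["NEO4J_USERNAME"] as string,
    password: env["NEO4J_PASSWORD"] as string,
  };
}

// Usage in Node.js (shown as a comment so the sketch stays self-contained):
// const config = readNeo4jConfig(process.env);
```

The returned object has the `url`, `username`, and `password` fields used in the connection call above, so it could be passed to `Neo4jGraph.initialize` in place of the three separate constants.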
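LLM Graph Transformer
Extracting graph data from text turns unstructured information into a structured format, which supports deeper analysis and more efficient navigation of complex relationships and patterns. The LLMGraphTransformer converts text documents into structured graph documents by using an LLM to parse and categorize entities and their relationships. The choice of LLM significantly influences the output, since it determines the accuracy and nuance of the extracted graph data.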
import { ChatOpenAI } from "@langchain/openai";
import { LLMGraphTransformer } from "@langchain/community/experimental/graph_transformers/llm";
const model = new ChatOpenAI({
  temperature: 0,
  model: "gpt-4o-mini",
});
const llmGraphTransformer = new LLMGraphTransformer({
  llm: model,
});
Now we can pass in example text and examine the results.
import { Document } from "@langchain/core/documents";
let text = `
Marie Curie, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.
`;
const result = await llmGraphTransformer.convertToGraphDocuments([
  new Document({ pageContent: text }),
]);
console.log(`Nodes: ${result[0].nodes.length}`);
console.log(`Relationships:${result[0].relationships.length}`);
Nodes: 8
Relationships:7
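Counts alone say little about what was actually extracted. The following is a minimal sketch for printing each relationship as a readable triple. It assumes a graph document exposes `nodes` and `relationships` arrays whose entries carry `id`, `type`, `source`, and `target` fields; the interface names and the sample data are made up for illustration and are not actual model output.

```typescript
// Minimal structural types for this sketch. They are assumed to mirror the
// fields of the graph documents produced above; adjust if your version differs.
interface TripleNode {
  id: string | number;
  type: string;
}

interface TripleRelationship {
  source: TripleNode;
  target: TripleNode;
  type: string;
}

interface TripleGraph {
  nodes: TripleNode[];
  relationships: TripleRelationship[];
}

// Render every relationship as "(source:Type)-[REL]->(target:Type)".
function formatTriples(graphDoc: TripleGraph): string[] {
  return graphDoc.relationships.map(
    (rel) =>
      `(${rel.source.id}:${rel.source.type})-[${rel.type}]->` +
      `(${rel.target.id}:${rel.target.type})`
  );
}

// Hand-written sample data, not actual model output.
const sampleMarie: TripleNode = { id: "Marie Curie", type: "Person" };
const samplePierre: TripleNode = { id: "Pierre Curie", type: "Person" };
const sampleGraph: TripleGraph = {
  nodes: [sampleMarie, samplePierre],
  relationships: [
    { source: samplePierre, target: sampleMarie, type: "SPOUSE" },
  ],
};

formatTriples(sampleGraph).forEach((line) => console.log(line));
// prints "(Pierre Curie:Person)-[SPOUSE]->(Marie Curie:Person)"
```

If the real result has this shape, the call would be `formatTriples(result[0])`.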
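Note that graph construction is non-deterministic because it relies on an LLM, so each run may produce slightly different results. Examine the following image to better grasp the structure of the generated knowledge graph.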

Additionally, you can define the specific types of nodes and relationships to extract, according to your requirements.
const llmGraphTransformerFiltered = new LLMGraphTransformer({
  llm: model,
  allowedNodes: ["PERSON", "COUNTRY", "ORGANIZATION"],
  allowedRelationships: ["NATIONALITY", "LOCATED_IN", "WORKED_AT", "SPOUSE"],
  strictMode: false,
});
const result_filtered =
  await llmGraphTransformerFiltered.convertToGraphDocuments([
    new Document({ pageContent: text }),
  ]);
console.log(`Nodes: ${result_filtered[0].nodes.length}`);
console.log(`Relationships:${result_filtered[0].relationships.length}`);
Nodes: 6
Relationships:4
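Since the example sets `strictMode: false`, it may be worth checking whether every extracted type actually falls within the allowed lists. Below is a minimal sketch of such a check. It assumes the same node and relationship fields as the graph documents above, and it compares types case-insensitively on the assumption that the transformer may normalize casing (for example `Person` for `PERSON`). `findDisallowedTypes` and the sample data are hypothetical.

```typescript
// Minimal structural types for this sketch; assumed to match the fields of
// the graph documents produced above.
interface CheckedNode {
  id: string | number;
  type: string;
}

interface CheckedRelationship {
  source: CheckedNode;
  target: CheckedNode;
  type: string;
}

interface CheckedGraph {
  nodes: CheckedNode[];
  relationships: CheckedRelationship[];
}

// Return the distinct types from `found` that are not in `allowed`,
// ignoring letter case.
function outsideAllowed(found: string[], allowed: string[]): string[] {
  const allowedUpper = allowed.map((t) => t.toUpperCase());
  return found
    .filter((t) => allowedUpper.indexOf(t.toUpperCase()) === -1)
    .filter((t, i, all) => all.indexOf(t) === i); // de-duplicate
}

function findDisallowedTypes(
  graphDoc: CheckedGraph,
  allowedNodes: string[],
  allowedRelationships: string[]
): { nodeTypes: string[]; relationshipTypes: string[] } {
  return {
    nodeTypes: outsideAllowed(
      graphDoc.nodes.map((n) => n.type),
      allowedNodes
    ),
    relationshipTypes: outsideAllowed(
      graphDoc.relationships.map((r) => r.type),
      allowedRelationships
    ),
  };
}

// Hand-written sample data, not actual model output.
const checkedMarie: CheckedNode = { id: "Marie Curie", type: "Person" };
const checkedParis: CheckedNode = { id: "University of Paris", type: "Organization" };
const checkedPrize: CheckedNode = { id: "Nobel Prize", type: "Award" };
const checkedSample: CheckedGraph = {
  nodes: [checkedMarie, checkedParis, checkedPrize],
  relationships: [
    { source: checkedMarie, target: checkedParis, type: "WORKED_AT" },
    { source: checkedMarie, target: checkedPrize, type: "WON" },
  ],
};

const disallowed = findDisallowedTypes(
  checkedSample,
  ["PERSON", "COUNTRY", "ORGANIZATION"],
  ["NATIONALITY", "LOCATED_IN", "WORKED_AT", "SPOUSE"]
);
console.log(disallowed);
```

An empty result for both lists means the extraction stayed within the schema; anything else points at types you may want to drop or add to the allowed lists before storing the graph.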
To better understand the generated graph, we can visualize it again.

Storing to graph database
The generated graph documents can be stored in a graph database using the addGraphDocuments method.
await graph.addGraphDocuments(result_filtered);