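I replaced my AI agent's flat fact store with a graph database
3 points • by grawl_dorgiers • 30 days ago
# I Replaced My AI Agent's Flat Fact Store with a Graph Database and It Runs in 85MB
I've been building LocalClaw, a local-model-first AI agent framework that runs on personal hardware through Ollama. No cloud, no API costs. A few weeks ago I posted about the router/specialist architecture. A lot of people asked about the memory system, so here it is.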
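## The Problem
I started with a JSONL fact store and embedding-similarity retrieval. Simple enough until it wasn't. After a few weeks of real use I had 14 near-duplicate facts about the same topics from different sessions. I layered dedup on top of dedup and it still wasn't clean.
The bigger problem was relationships. "Peter works at DevMesh" and "DevMesh is building an outreach platform" were two separate embeddings. You could retrieve each one, but you couldn't traverse from one to the other. No multi-hop queries. No fact evolution. Old facts and new facts coexisted with no signal about which was current.
Four iterations on the flat store later, I accepted I was patching the wrong thing.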
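## Why FalkorDB
I looked at Neo4j (the Community Edition is feature-limited), Memgraph (no native vector search), and FalkorDB.
FalkorDB runs in Docker, speaks the Redis wire protocol, has native HNSW vector search, and the entire thing sits at 85MB of memory at my current scale. Graph traversal, vector similarity, and hybrid keyword search all live in one container. No separate Qdrant, no sync issues between two stores.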
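## What the Graph Enables
Every fact connects to the entities it references via ABOUT edges. Multi-hop traversal becomes natural: find everything connected to a project, or find all entities mentioned alongside a technology.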
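A minimal sketch of what following ABOUT edges buys you, using a plain in-memory dictionary as a stand-in for the graph. This is not LocalClaw's code or FalkorDB's API; the `ABOUT` mapping and the `facts_within_hops` helper are assumptions made for illustration, and only the two example facts come from the post.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the graph: fact text -> entities it is ABOUT.
ABOUT = {
    "Peter works at DevMesh": {"Peter", "DevMesh"},
    "DevMesh is building an outreach platform": {"DevMesh"},
}

# Reverse index: entity -> facts that reference it.
facts_by_entity = defaultdict(set)
for fact, entities in ABOUT.items():
    for entity in entities:
        facts_by_entity[entity].add(fact)


def facts_within_hops(start_entity, hops):
    """Collect facts reachable from start_entity by following ABOUT edges."""
    seen_entities = {start_entity}
    frontier = {start_entity}
    found = set()
    for _ in range(hops):
        new_facts = set()
        for entity in frontier:
            new_facts |= facts_by_entity[entity]
        new_facts -= found
        found |= new_facts
        # Entities mentioned by the newly found facts form the next frontier.
        next_entities = set()
        for fact in new_facts:
            next_entities |= ABOUT[fact]
        frontier = next_entities - seen_entities
        seen_entities |= frontier
    return found
```

Starting from "Peter", one hop returns only the employment fact; a second hop reaches the DevMesh platform fact through the shared entity, which is the traversal a flat embedding store cannot express.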
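When a fact changes, the new fact gets a SUPERSEDES edge pointing to the old one. Both persist with timestamps, so temporal queries now work. "What did the system know about this last month?" is a real query.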
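The "as of" logic can be sketched in a few lines of plain Python. The `Fact` dataclass, the `known_as_of` helper, and the two sample facts with their dates are all hypothetical; in the real system this would presumably be a graph query over SUPERSEDES edges rather than a list scan.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Fact:
    fact_id: str
    text: str
    created_at: datetime
    supersedes: Optional[str] = None  # fact_id of the fact this one replaces


def known_as_of(facts, when):
    """Return the facts that were current at `when`."""
    visible = [f for f in facts if f.created_at <= when]
    # A fact is stale once a visible newer fact points at it via SUPERSEDES.
    superseded = {f.supersedes for f in visible if f.supersedes is not None}
    return [f for f in visible if f.fact_id not in superseded]


# Invented sample data: the second fact replaces the first.
old = Fact("f1", "The agent runs on a laptop", datetime(2025, 1, 10))
new = Fact("f2", "The agent runs on a Mac Mini", datetime(2025, 3, 1), supersedes="f1")
facts = [old, new]
```

Asking for `known_as_of(facts, datetime(2025, 2, 1))` yields only the old fact, while any date after the second fact's timestamp yields only the new one. Nothing is deleted, which is what keeps the history queryable.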
The vector index runs inside FalkorDB on 4096-dimensional embeddings from qwen3-embedding:8b. HNSW search scales roughly as O(log n). No external database.
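## The Part That Surprised Me
Entity extraction by a small local model is unreliable when it runs blind. phi4-mini classified DGX Spark as software and created separate nodes for the singular and plural forms of the same entity.
The fix: before extracting entities from a new fact, query the existing typed entities from the graph and inject them into the NER prompt as reference context. Now phi4-mini sees "DGX Spark → hardware, FalkorDB → software" before it classifies anything new. Each correctly typed entity makes future extractions more consistent. The graph teaches the model over time without any additional training.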
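A rough sketch of that prompt injection, assuming the typed entities have already been fetched from the graph into a dictionary. The `build_ner_prompt` function and the prompt wording are hypothetical, not the prompt LocalClaw actually uses.

```python
def build_ner_prompt(known_entities, fact_text):
    """Prepend already-typed entities so the model reuses names and types."""
    reference = "\n".join(
        f"{name} → {etype}" for name, etype in sorted(known_entities.items())
    )
    return (
        "Known entities (reuse these exact names and types):\n"
        f"{reference}\n\n"
        "Extract the entities from the fact below, one per line as 'name → type'.\n"
        f"Fact: {fact_text}"
    )


# Entity names and types taken from the example in the post.
known = {"FalkorDB": "software", "DGX Spark": "hardware"}
prompt = build_ner_prompt(known, "The DGX Sparks arrived today")
```

Sorting the reference list keeps the prompt stable between calls, so a change in the model's output is more likely to come from the new fact than from reordered context.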
## Scoring
Pure vector similarity surfaces whatever is semantically closest, regardless of whether it matters. The scoring formula:
```
score = similarity × 0.5 + recency × 0.2 + importance × 0.3
```
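Importance uses a 1-5 tier (critical health/family = 5, job/identity = 4, preference = 3, context = 2, ephemeral = 1). A moderately relevant but critical fact scores higher than a highly relevant but ephemeral one. Your wife's health condition surfaces above yesterday's weather.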
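To make the trade-off concrete, here is a small worked version of the formula. The weights are the ones given above; the post does not say how importance and recency are scaled, so dividing the tier by 5 and using a 30-day half-life decay are both assumptions.

```python
# Weights from the scoring formula in the post.
W_SIMILARITY, W_RECENCY, W_IMPORTANCE = 0.5, 0.2, 0.3


def recency(age_days, half_life_days=30.0):
    """Assumed decay: 1.0 for a brand-new fact, halving every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def score(similarity, age_days, tier):
    """Combine the three signals; tier is the 1-5 importance level."""
    if not 1 <= tier <= 5:
        raise ValueError("tier must be between 1 and 5")
    importance = tier / 5.0  # assumed normalization to (0, 1]
    return (
        W_SIMILARITY * similarity
        + W_RECENCY * recency(age_days)
        + W_IMPORTANCE * importance
    )


critical = score(similarity=0.6, age_days=30, tier=5)
ephemeral = score(similarity=0.9, age_days=30, tier=1)
```

Under these assumptions the moderately similar tier-5 fact scores about 0.70 and the highly similar tier-1 fact about 0.61, so the critical fact ranks first even though its embedding match is weaker.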
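## What I Learned
The model computes nothing. Code handles which facts changed, which are duplicates, and what the scores are. The model handles what it all means. The moment you let a model do arithmetic or hash-based dedup, you get failures you can't explain.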
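As one illustration of keeping that work in code, here is a sketch of deterministic duplicate detection by hashing a normalized form of each fact. The `fact_key` and `drop_exact_duplicates` helpers are hypothetical, and this only catches duplicates that match after normalization; near-duplicates with different wording would still need a similarity check.

```python
import hashlib
import re


def fact_key(text):
    """Hash a normalized form so case and whitespace differences collide."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def drop_exact_duplicates(facts):
    """Keep the first occurrence of each fact, preserving order."""
    seen = set()
    unique = []
    for text in facts:
        key = fact_key(text)
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique
```

Because the result depends only on the input strings, the same facts always dedup the same way, which makes any failure reproducible.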
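Importance tiers need concrete examples in the extraction prompt. phi4:14b defaulted everything to tier 2 until I added few-shot examples with emotional weight. Abstract instructions don't calibrate a model.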
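A sketch of what such a few-shot prompt might look like, paired with a code-side check on the reply. The tier meanings follow the scale described in the post; the example facts, the prompt wording, and both function names are invented for illustration.

```python
import re

# Tier meanings follow the post; the example facts are invented for illustration.
FEW_SHOT_EXAMPLES = [
    ("My wife was diagnosed with a chronic illness", 5),
    ("I work as a platform engineer", 4),
    ("I prefer dark roast coffee", 3),
    ("We were comparing graph databases in this chat", 2),
    ("It rained yesterday", 1),
]


def build_importance_prompt(fact_text):
    """Show one concrete example per tier before asking for a rating."""
    examples = "\n".join(
        f'Fact: "{text}" -> tier {tier}' for text, tier in FEW_SHOT_EXAMPLES
    )
    return (
        "Rate the importance of a fact on a 1-5 tier.\n"
        f"{examples}\n"
        f'Fact: "{fact_text}" -> tier'
    )


def parse_tier(reply):
    """Validate the model's answer in code; return None if no valid tier is found."""
    match = re.search(r"\b([1-5])\b", reply)
    return int(match.group(1)) if match else None
```

Returning `None` for an unparseable reply lets the caller decide what to do, instead of silently falling back to a default tier.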
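The graph beats flat storage the moment you need relationship reasoning. The SUPERSEDES chain alone justified the migration.
Runs entirely on a Mac Mini. 85MB for the graph. Everything local.
GitHub: https://github.com/PeterGreenAppliedAI/LocalClaw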