Why do we accept silent data corruption in vector search? (x86 vs. ARM)
1 point • by varshith17 • 6 months ago
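I spent the last week chasing a "ghost" in a RAG pipeline, and I think I've found something the industry is collectively ignoring.
We assume that if we generate an embedding and store it, the "memory" is stable. But I found that f32 distance calculations (the backbone of FAISS, Chroma, etc.) act as a "Forking Path."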
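Part of the mechanism is that f32 addition is not associative: accumulating the same terms in a different order (for instance, with a different SIMD lane width or reduction tree) can round differently. A minimal Rust sketch, where the hypothetical helpers `sum_forward` and `sum_backward` stand in for two reduction orders; this is an illustration, not code taken from FAISS or Chroma:

```rust
/// Sum a slice front-to-back, rounding to f32 after every addition.
fn sum_forward(xs: &[f32]) -> f32 {
    xs.iter().fold(0.0f32, |acc, &x| acc + x)
}

/// Sum the same slice back-to-front: identical inputs, different association.
fn sum_backward(xs: &[f32]) -> f32 {
    xs.iter().rev().fold(0.0f32, |acc, &x| acc + x)
}
```

With `[1.0e8, -1.0e8, 1.0]`, the forward sum is `1.0` and the backward sum is `0.0`, because `-1.0e8 + 1.0` rounds back to `-1.0e8` in f32 (representable values near 1e8 are 8 apart).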
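If you run the exact same insertion sequence on an x86 server (AVX-512) and an ARM MacBook (NEON), the memory states diverge at the bit level. It's not just "floating-point noise"; it's a deterministic drift caused by differences in FMA (fused multiply-add) instruction usage.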
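A fused multiply-add rounds once, while a separate multiply and add round twice, so the two can disagree in the last bits. The sketch below shows only that mechanism, using hypothetical helper names; whether a particular SIMD path in a vector library actually emits FMA depends on the target, the compiler flags, and the library, none of which this example captures.

```rust
/// Multiply, round, add, round: round(round(a * b) + c).
/// Rust does not fuse this into an FMA instruction by default.
fn mul_then_add(a: f32, b: f32, c: f32) -> f32 {
    a * b + c
}

/// Fused multiply-add: round(a * b + c) with a single rounding step.
fn fused(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}
```

For `a = 1 + 2^-12` and `c = -(1 + 2^-11)`, `mul_then_add(a, a, c)` returns `0.0` while `fused(a, a, c)` returns `2^-24` (about `5.96e-8`): the exact product `1 + 2^-11 + 2^-24` loses its last term when it is rounded to f32 before the addition.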
I wrote a script to inspect the raw bits of a sentence-transformers vector on my M3 Max and on a Xeon instance. Semantic similarity was 0.9999, but the raw stored bits were different.
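A check of this kind can be sketched as comparing each component's raw IEEE-754 bit pattern with `f32::to_bits`, alongside a cosine similarity. The helpers `differing_bits` and `cosine` are hypothetical and are not the script described above:

```rust
/// Count components whose raw IEEE-754 bit patterns differ between two embeddings.
fn differing_bits(a: &[f32], b: &[f32]) -> usize {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    a.iter()
        .zip(b)
        .filter(|(x, y)| x.to_bits() != y.to_bits())
        .count()
}

/// Cosine similarity, accumulated in f64 so the comparison adds little rounding
/// of its own. Returns NaN if either vector is all zeros.
fn cosine(a: &[f32], b: &[f32]) -> f64 {
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    dot / (na.sqrt() * nb.sqrt())
}
```

Nudging a single component of `[0.1, 0.2, 0.3]` by one ULP gives `differing_bits == 1` while the cosine similarity stays above `0.9999`: semantically identical, bitwise different.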
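For a regulated AI agent (finance/healthcare), this is a nightmare. It means your audit trail is technically hallucinating, depending on which server processed the query. You cannot have "Write Once, Run Anywhere" index portability.
The Fix (going no_std): I got so frustrated that I bypassed the standard libraries and wrote a custom kernel (Valori) in Rust using Q16.16 fixed-point arithmetic. By strictly enforcing integer associativity, I got 100% bit-identical snapshots across x86, ARM, and WASM.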
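The post does not show Valori's kernel, so the following is only a minimal sketch of the general technique it names. Each value is quantized to Q16.16 (an `i32` with 16 fractional bits), products are formed in an `i64`, and the results are accumulated with wrapping integer addition. Because that addition is associative and commutative, any summation order yields identical bits. The names `to_q16`, `dot_q16`, and `dot_q16_rev` are hypothetical, not Valori's API, and the sketch assumes values fit Q16.16's range (roughly ±32768) at a resolution of 2^-16.

```rust
/// Q16.16 scale factor: 2^16 (16 integer bits, 16 fractional bits).
const Q16_ONE: f32 = 65536.0;

/// Quantize an f32 to Q16.16. Scaling by a power of two is exact unless it
/// overflows, `round` is deterministic, and `as i32` saturates out-of-range
/// values (NaN becomes 0).
fn to_q16(x: f32) -> i32 {
    (x * Q16_ONE).round() as i32
}

/// Dot product of two Q16.16 vectors. Each i32 x i32 product fits in an i64
/// as a Q32.32 value; wrapping addition is associative and commutative, so
/// the result has the same bits regardless of summation order.
fn dot_q16(a: &[i32], b: &[i32]) -> i64 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| i64::from(x) * i64::from(y))
        .fold(0i64, |acc, p| acc.wrapping_add(p))
}

/// The same dot product accumulated back-to-front, for comparison.
fn dot_q16_rev(a: &[i32], b: &[i32]) -> i64 {
    a.iter()
        .zip(b)
        .rev()
        .map(|(&x, &y)| i64::from(x) * i64::from(y))
        .fold(0i64, |acc, p| acc.wrapping_add(p))
}
```

For `a = [0.5, -0.25, 0.125]` and `b = [0.5, 0.5, 1.0]`, both orders return `1 << 30`, which is `0.25` in Q32.32; divide by `2^32` to read it back as a real number. If the accumulator ever overflowed, wrapping would still give the same bits on every platform, but the value would no longer be meaningful, so a production kernel would also need to bound its inputs.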
Recall loss: negligible (99.8% Recall@10 vs. standard f32).
Performance: under 500 µs latency (comparable to unoptimized f32).
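Recall@10 here presumably means the fraction of the exact f32 top-10 neighbors that the fixed-point index also returns, averaged over a query set; the post does not define it further. A minimal sketch for one query, with the hypothetical helper `recall_at_k` taking neighbor IDs as ranked by each index:

```rust
use std::collections::HashSet;

/// Recall@k for a single query: the fraction of the exact (f32) top-k IDs
/// that also appear in the approximate (e.g. fixed-point) top-k list.
fn recall_at_k(exact: &[u64], approx: &[u64], k: usize) -> f64 {
    let truth: HashSet<u64> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        // No ground truth for this query: treat it as trivially recalled.
        return 1.0;
    }
    // Deduplicate the approximate list so a repeated ID is not counted twice.
    let found: HashSet<u64> = approx.iter().take(k).copied().collect();
    truth.intersection(&found).count() as f64 / truth.len() as f64
}
```

Averaging this value over all evaluation queries gives the kind of figure quoted above; with nine of ten IDs in common, a single query scores `0.9`.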
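The Ask / Paper: I've written a formal preprint analyzing this "Forking Path" problem and the Q16.16 proofs. I'm currently trying to submit it to arXiv (Distributed Computing / cs.DC), but I'm stuck in the endorsement queue.
If you want to tear apart my Rust code: https://github.com/varshith-Git/Valori-Kernel
If you're an arXiv endorser for cs.DC (or cs.DB) and want to see the draft, I'd love to send it to you.
Am I the only one worried about building "reliable" agents on such shaky numerical foundations?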