Why do we accept silent data corruption in vector search? (x86 vs. ARM)

1 point by varshith17 6 months ago
I spent the last week chasing a "ghost" in a RAG pipeline, and I think I've found something that the industry is collectively ignoring.

We assume that if we generate an embedding and store it, the "memory" is stable. But I found that f32 distance calculations (the backbone of FAISS, Chroma, etc.) act as a "Forking Path."

If you run the exact same insertion sequence on an x86 server (AVX-512) and an ARM MacBook (NEON), the memory states diverge at the bit level. It's not just "floating point noise"; it's a deterministic drift caused by FMA (Fused Multiply-Add) instruction differences.

I wrote a script to inspect the raw bits of a sentence-transformers vector across my M3 Max and a Xeon instance. Semantic similarity was 0.9999, but the raw storage was different.

For a regulated AI agent (Finance/Healthcare), this is a nightmare. It means your audit trail is technically hallucinating depending on which server processed the query. You cannot have "Write Once, Run Anywhere" index portability.

The Fix (Going no_std): I got so frustrated that I bypassed the standard libraries and wrote a custom kernel (Valori) in Rust using Q16.16 Fixed-Point Arithmetic. By strictly enforcing integer associativity, I got 100% bit-identical snapshots across x86, ARM, and WASM.

Recall Loss: Negligible (99.8% Recall@10 vs standard f32).

Performance: < 500µs latency (comparable to unoptimized f32).

The Ask / Paper: I've written a formal preprint analyzing this "Forking Path" problem and the Q16.16 proofs. I am currently trying to submit it to arXiv (Distributed Computing / cs.DC), but I'm stuck in the endorsement queue.

If you want to tear apart my Rust code: https://github.com/varshith-Git/Valori-Kernel

If you are an arXiv endorser for cs.DC (or cs.DB) and want to see the draft, I'd love to send it to you.

Am I the only one worried about building "reliable" agents on such shaky numerical foundations?
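
For readers who want to see the effect in isolation, here is a minimal Rust sketch of the two ideas in the post: a fused and an unfused f32 multiply-add can return different bits for the same inputs, while a Q16.16 dot product accumulated in integers returns the same bits in any summation order. None of this is taken from the Valori repository. Every function name (`to_q16`, `dot_q16`, `mul_then_add`, `fused_mul_add`, `f32_bits_demo`) and the hand-picked input are assumptions made for illustration, and the post does not say whether Valori accumulates in `i64` the way this sketch does.

```rust
// Illustrative sketch only: every name below is hypothetical and is not taken
// from the Valori kernel.

/// Number of fractional bits in Q16.16 (16 integer bits, 16 fractional bits).
const FRAC_BITS: u32 = 16;

/// Quantize an f32 to Q16.16 by scaling by 2^16 and rounding to nearest.
/// Assumes the input lies within the Q16.16 range, roughly [-32768, 32768).
fn to_q16(x: f32) -> i32 {
    (x * (1u32 << FRAC_BITS) as f32).round() as i32
}

/// Dot product of two Q16.16 vectors, accumulated in an i64.
/// Each product is a Q32.32 value. Integer addition never rounds, so any
/// summation order yields the same bits as long as the sum does not overflow.
fn dot_q16(a: &[i32], b: &[i32]) -> i64 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| i64::from(x) * i64::from(y))
        .sum()
}

/// f32 multiply-accumulate with two roundings: one after the multiply,
/// one after the add.
fn mul_then_add(x: f32, y: f32, acc: f32) -> f32 {
    x * y + acc
}

/// f32 fused multiply-add: the exact product is added to `acc` and the
/// result is rounded once.
fn fused_mul_add(x: f32, y: f32, acc: f32) -> f32 {
    x.mul_add(y, acc)
}

/// Raw bits of the unfused and fused results for one hand-picked input.
/// x = 1 + 2^-12, so x * x = 1 + 2^-11 + 2^-24, which f32 cannot represent.
fn f32_bits_demo() -> (u32, u32) {
    let x = 1.0f32 + 1.0 / 4096.0;
    let acc = -(1.0f32 + 1.0 / 2048.0);
    (
        mul_then_add(x, x, acc).to_bits(),
        fused_mul_add(x, x, acc).to_bits(),
    )
}
```

Calling `f32_bits_demo()` gives two different bit patterns: the unfused path rounds the product to 1 + 2^-11 and cancels to 0.0, while the fused path keeps the 2^-24 term. By contrast, `dot_q16` over vectors built with `to_q16` returns the same `i64` whether the elements are visited forward or in reverse. Whether a given FAISS or Chroma build actually takes the fused or the unfused path depends on compiler flags and library code, so treat this as a demonstration of the mechanism rather than a reproduction of the x86 vs. ARM result above.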