Ask HN: Are we pretending RAG is ready, when it's barely out of demo phase?

3 points | by TXTOS | 10 months ago
Been watching the RAG (Retrieval-Augmented Generation) wave crash into production for over a year now.

But something keeps bugging me: most setups still feel like glorified notebooks stitched together with hope and vector search.

Yeah, it "works", until you actually need it to. Suddenly: irrelevant chunks, hallucinations, shallow query rewriting, no memory loop, and a retrieval stack that breaks if you breathe on it wrong.

We've got:
• pipelines that don't align with what users *actually* want to ask,
• retrieval that acts more like a search engine than a reasoning aid,
• brittle evals (because "correct context" ≠ "correct answer"),
• and no one's sure where grounding ends and illusion begins.

Sure, you *can* make it work, if you're okay duct-taping every component and babysitting the system 24/7.

So I gotta ask: Is RAG just stuck in prototype land pretending to be production? Or has someone here actually built a setup that survives user chaos and edge cases?

Would love to hear what's worked, what hasn't, and what you had to throw away.

Not pushing anything, just been knee-deep in this and looking to sanity check with folks who've actually shipped stuff.
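
To make the "correct context" ≠ "correct answer" point concrete, here's a rough sketch of scoring retrieval and answer quality separately instead of folding them into one number. Everything in it is hypothetical: the `EvalCase` fields, the toy cases, and the crude substring check for correctness are assumptions for illustration, not anyone's real eval harness.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class EvalCase:
    """One hypothetical eval record; all field names are made up for this sketch."""
    question: str
    gold_chunk_id: str        # chunk that actually contains the answer
    gold_answer: str          # short reference answer
    retrieved_ids: List[str]  # chunk ids the retriever returned
    model_answer: str         # what the generator produced


def retrieval_hit(case: EvalCase) -> bool:
    # "Correct context": the gold chunk is somewhere in the retrieved set.
    return case.gold_chunk_id in case.retrieved_ids


def answer_correct(case: EvalCase) -> bool:
    # "Correct answer": crude case-insensitive substring match.
    # A real eval would need something much stronger than this.
    return case.gold_answer.strip().lower() in case.model_answer.lower()


def confusion_table(cases: List[EvalCase]) -> Dict[Tuple[bool, bool], int]:
    # Keys are (retrieval_hit, answer_correct) pairs.
    counts = {(hit, ok): 0 for hit in (True, False) for ok in (True, False)}
    for case in cases:
        counts[(retrieval_hit(case), answer_correct(case))] += 1
    return counts


# Toy data, invented for illustration only.
cases = [
    # Right context, right answer.
    EvalCase("Refund window?", "c1", "30 days", ["c1", "c7"],
             "Refunds are accepted within 30 days."),
    # Right context, wrong answer: retrieval metrics alone would look fine.
    EvalCase("Refund window?", "c1", "30 days", ["c1", "c7"],
             "Refunds are accepted within 14 days."),
    # Wrong context, right answer: the model got there without grounding.
    EvalCase("Refund window?", "c1", "30 days", ["c4", "c9"],
             "You have 30 days to request a refund."),
]

counts = confusion_table(cases)
for (hit, ok), n in sorted(counts.items(), reverse=True):
    print(f"retrieval_hit={hit!s:5} answer_correct={ok!s:5} count={n}")
```

The off-diagonal cells are the interesting ones: (hit, wrong) means generation failed on good context, and (miss, right) means the answer can't be traced back to what was retrieved.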