Ask HN: What's stopping your AI agent from going from proof of concept to real-world use?
1 point • by ns-148 • 10 months ago
We’ve been working on decision automation tech that’s mostly been used in enterprise for building systems that behave like domain experts. Think models based on structured logic and knowledge, which can be queried to provide decisions that are auditable and explainable.
Recently, we’ve started wondering whether this could help with a different kind of problem: getting LLM-based agents into production.

From what we’ve seen (and experienced ourselves), it’s relatively easy to get an agent prototype working with tools like LangChain, AutoGen, or CrewAI, but much harder to move that into something reliable and trustworthy enough for real use.

Some of the issues we’ve felt:

- Agents making different decisions from the same input
- Opaque reasoning that’s hard to debug or trust
- Tool use that works in demos but fails under edge cases
- Hallucinated or incomplete decisions that don’t stand up in production
- Limited ability to gather missing info before acting

It’s got us thinking: if an agent could collate data, then call a tool (our system) with a bespoke symbolic model (that you created) that could reason, ask follow-up questions (for an AI agent or human to answer), and provide results that are deterministic, explainable, and repeatable, would that help bridge the gap to production? Would this be more trustworthy?

We’re trying to understand whether this kind of approach would actually be useful in real-world agent implementations, and if so, for what kinds of decisions or workflows.

Would really appreciate hearing from anyone who’s been working on agent-based systems:

- What have you built?
- Have you shipped anything to production?
- What’s been hardest about that process?
- Where do you think determinism, consistency, or explainability would matter most?

Not selling anything, as we’d have lots of work to do to make the product more developer-friendly anyway; we just want to know whether the idea has legs and to learn from people building agents.

Thanks in advance to anyone willing to share.