Ask HN: What are your worst experiences putting agent apps into production?

2 points by yaoke259, about 1 month ago
For a bit of context, I'm currently building a team of AI agents at work. It generates reports by fanning out to a large number of subagents, which process a large volume of transcript data. Sometimes the analysis fails midway because an individual step breaks, for example an API call returns an error or the machine runs out of memory. When that happens, it creates cascading errors that break the entire generation, with almost no visibility. I've spent the past month rewriting the individual jobs as durable execution jobs on DBOS. I'm wondering whether there are better solutions out there and whether others have run into similar issues. Then there's the problem of reflecting progress back to users, which, honestly, I've just been coding ad hoc…

When an agent fails at step 9 of 12, how do you handle that?

Roughly how many engineer-weeks have you sunk into agent infrastructure (durability, monitoring, human-in-the-loop, live UI) vs. the actual agent logic? I'm curious whether my ratio is normal.

For those who built this stuff in-house: was it ever a build-vs-buy conversation? What would a tool have had to do for you to buy instead of build?

Do you currently pay for anything in your agent stack (LangSmith, Temporal, Braintrust, etc.)? What made that one worth a line item when others weren't, and should I look into it too?
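The "fails at step 9 of 12" question is the core problem durable execution addresses. Here is a minimal sketch of the underlying idea, assuming a plain SQLite table as the checkpoint store. This is not the DBOS API. The `StepCheckpointer` class, its `run` method, the `on_progress` hook, and the `report-42` workflow ID are all hypothetical names for illustration. Each completed step's output is persisted, so re-running the same workflow ID after a failure replays steps 1 to 8 from the store and resumes execution at step 9.

```python
import json
import sqlite3
from typing import Any, Callable


class StepCheckpointer:
    """Hypothetical minimal step checkpointer (not the DBOS API).

    Each completed step's output is stored under (workflow_id, step), so
    re-running the same workflow skips finished steps and resumes at the
    first step that has no stored result.
    """

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS steps ("
            "workflow_id TEXT, step INTEGER, output TEXT, "
            "PRIMARY KEY (workflow_id, step))"
        )

    def run(
        self,
        workflow_id: str,
        steps: list[Callable[[Any], Any]],
        initial: Any,
        on_progress: Callable[[int, int], None] = lambda done, total: None,
    ) -> Any:
        value = initial
        total = len(steps)
        for i, step in enumerate(steps, start=1):
            row = self.conn.execute(
                "SELECT output FROM steps WHERE workflow_id = ? AND step = ?",
                (workflow_id, i),
            ).fetchone()
            if row is not None:
                # Completed in an earlier attempt: replay the stored output.
                value = json.loads(row[0])
            else:
                # May raise; outputs of earlier steps stay persisted.
                value = step(value)
                with self.conn:  # commits the insert on success
                    self.conn.execute(
                        "INSERT INTO steps VALUES (?, ?, ?)",
                        (workflow_id, i, json.dumps(value)),
                    )
            on_progress(i, total)  # hypothetical hook for pushing progress to a UI
        return value


# Demo: 12 steps where step 9 fails once (simulating a transient API error).
calls: list[int] = []
fail_once = {"step9": True}


def make_step(n: int) -> Callable[[int], int]:
    def step(x: int) -> int:
        calls.append(n)
        if n == 9 and fail_once["step9"]:
            fail_once["step9"] = False
            raise RuntimeError("simulated API error at step 9")
        return x + 1
    return step


steps = [make_step(n) for n in range(1, 13)]
cp = StepCheckpointer()

try:
    cp.run("report-42", steps, 0)
except RuntimeError:
    pass  # first attempt dies at step 9

progress: list[tuple[int, int]] = []
result = cp.run("report-42", steps, 0, on_progress=lambda d, t: progress.append((d, t)))
print(result)  # 12
print(calls)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 10, 11, 12]
```

Only step 9 runs twice. Steps 1 to 8 are not re-executed on the second attempt. Production durable-execution systems do much more than this, including automatic retries, concurrency control, and recovery of in-flight workflows after a process crash. The sketch also assumes two things. First, step outputs must be JSON-serializable. Second, steps must be safe to re-execute, because a crash between running a step and recording its output would cause that step to run again.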