Show HN: Lessons learned from running Claude Code swarms at scale
6 points • by sermakarevich • 27 days ago
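Some time ago I built a simple app to run swarms of coding agents, which I call fleet (<a href="https://news.ycombinator.com/item?id=48256389">https://news.ycombinator.com/item?id=48256389</a>). It is built on centralized beads with a Python orchestrator and can run any coder (Claude, agy, Codex). Recently I added a UI to manage the whole agent lifecycle: adding new tasks, monitoring running ones, and a chat interface built on MCP with a centralized SQLite DB. From the UI I can spawn agents in any directory, define dependencies on other tasks, and specify which coder/model should do the job. Today I can run 10-15 agents concurrently. At that scale you burn through usage limits very fast, so I spent some time investigating where those limits go and how to maximize efficiency. Here are the lessons learned after a few weeks of running the fleet: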
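* CLAUDE.md is a terrible abstraction. These files load unconditionally, often contain descriptions irrelevant to the task at hand, and stack from your working directory upward. The result is wasted tokens and confusion from irrelevant instructions injected into the session.
* Skills are bad, but not as bad as CLAUDE.md. They use progressive disclosure: only the skill description goes into the session, and Claude loads the full skill text with a tool when it needs it. That's one level better, but it still doesn't scale: you can't create 10K skills, because the descriptions alone would eat your entire usable context. Claude recently introduced a skills budget that silently drops less frequently used skills from the session entirely. You can still invoke them in an interactive session, but the model can't invoke them in a background session.
* Some plugins may be installed more than once. During cleanup I found a few of mine installed in multiple locations, spending double the tokens on duplicated instructions.
* Attaching plugins to every session is a bad idea at scale. Be precise about which plugins are actually useful and attach them per task.
* Use a hierarchical knowledge base instead of CLAUDE.md / skills / plugins. It gives you real progressive disclosure: keep your instructions and tool descriptions in it and let Claude navigate it quickly and cheaply.
* System tools consume ~15K tokens (7% of the session). You can't manage this: they're just attached, and disabling tools doesn't remove them from the context.
* AskUserQuestion isn't available in background sessions. You need to implement your own tool, MCP- or CLI-based, to give `claude -p` the ability to talk to you.
* You become selective about which model handles each task. Decompose work into harder and simpler subtasks so you can route the simpler ones to weaker, cheaper models and save tokens.
* Your context-switching skill improves over time.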
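To make the hierarchical knowledge base point concrete, here is a minimal Python sketch. The layout (one `index.md` per directory holding a one-line summary), the demo file names, and the helpers `list_children` and `read_node` are all assumptions for illustration; the post doesn't describe how fleet stores its knowledge base. The idea is that the agent reads only short summaries at each level and opens a full document only after deciding it is relevant.

```python
import tempfile
from pathlib import Path


def summary(path: Path) -> str:
    """Return the first non-empty line of a file, without Markdown heading marks."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.strip():
            return line.strip().lstrip("#").strip()
    return ""


def list_children(node: Path) -> dict[str, str]:
    """Cheap step: names plus one-line summaries, never full document bodies."""
    out = {}
    for child in sorted(node.iterdir()):
        if child.is_dir():
            index = child / "index.md"
            out[child.name + "/"] = summary(index) if index.exists() else ""
        elif child.name != "index.md":
            out[child.name] = summary(child)
    return out


def read_node(path: Path) -> str:
    """Expensive step: the full text, loaded only for the leaf the agent picked."""
    return path.read_text(encoding="utf-8")


def build_demo_kb(root: Path) -> None:
    """Create a tiny hypothetical knowledge base so the sketch is runnable."""
    files = {
        "index.md": "# Team knowledge base\n",
        "deploy/index.md": "# Deployment runbooks\n",
        "deploy/rollback.md": "# How to roll back a release\nRevert the tag, then redeploy.\n",
        "testing/index.md": "# Testing conventions\n",
        "testing/fixtures.md": "# Shared test fixtures\nFixtures live next to the tests.\n",
    }
    for rel, text in files.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_demo_kb(root)
        print(list_children(root))             # top level: summaries only
        print(list_children(root / "deploy"))  # one level down
        print(read_node(root / "deploy" / "rollback.md"))  # full text, only now
```

Exposed as two tools (or two CLI commands), this costs a few short lines per navigation step rather than loading every description up front, which is the property the skills mechanism loses as the number of skills grows.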
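For the AskUserQuestion point, one way to sketch a CLI-based replacement is a small question queue in SQLite, which fits the centralized SQLite DB mentioned above. Everything here is an assumption: the `questions` table, the function names, and the polling loop are hypothetical, and the post doesn't say how fleet's own tool works. The agent side would call `post_question` and `wait_for_answer` (from a CLI command or wrapped as an MCP tool), while the UI side would call `pending` and `answer`.

```python
import sqlite3
import time
from typing import Optional

# Hypothetical schema: one row per question, answer stays NULL until a human replies.
SCHEMA = """
CREATE TABLE IF NOT EXISTS questions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    task_id TEXT NOT NULL,
    question TEXT NOT NULL,
    answer TEXT
)
"""


def connect(db_path: str) -> sqlite3.Connection:
    """Open the shared database and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def post_question(conn: sqlite3.Connection, task_id: str, question: str) -> int:
    """Agent side: record a question and return its id."""
    cur = conn.execute(
        "INSERT INTO questions (task_id, question) VALUES (?, ?)",
        (task_id, question),
    )
    conn.commit()
    return cur.lastrowid


def pending(conn: sqlite3.Connection) -> list:
    """UI side: list unanswered questions, oldest first."""
    return conn.execute(
        "SELECT id, task_id, question FROM questions "
        "WHERE answer IS NULL ORDER BY id"
    ).fetchall()


def answer(conn: sqlite3.Connection, qid: int, text: str) -> None:
    """UI side: store the human's reply."""
    conn.execute("UPDATE questions SET answer = ? WHERE id = ?", (text, qid))
    conn.commit()


def wait_for_answer(conn: sqlite3.Connection, qid: int,
                    timeout_s: float = 300.0, poll_s: float = 1.0) -> Optional[str]:
    """Agent side: poll until the question is answered; None on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        row = conn.execute(
            "SELECT answer FROM questions WHERE id = ?", (qid,)
        ).fetchone()
        if row and row[0] is not None:
            return row[0]
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_s)
```

Polling keeps the sketch simple and works across processes because each side opens its own connection to the same file. Returning `None` on timeout lets a background agent fall back to a default instead of blocking forever.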
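The model-routing point can be sketched as a lookup from an estimated difficulty to a model tier. The tier labels (`small`, `medium`, `large`), the `Task` fields, and the 1-5 difficulty scale are hypothetical placeholders; the post doesn't say how the routing decision is made, and real model names would replace the labels.

```python
from dataclasses import dataclass


@dataclass
class Task:
    title: str
    difficulty: int  # assumed scale: 1 (trivial) to 5 (hard), set when decomposing work


# (highest difficulty this tier handles, placeholder tier label), cheapest first
TIERS = [(2, "small"), (4, "medium"), (5, "large")]


def pick_model(task: Task) -> str:
    """Return the cheapest tier whose ceiling covers the task's difficulty."""
    for ceiling, tier in TIERS:
        if task.difficulty <= ceiling:
            return tier
    return TIERS[-1][1]  # anything off the scale goes to the strongest tier


if __name__ == "__main__":
    for task in [Task("rename a variable", 1), Task("redesign the scheduler", 5)]:
        print(task.title, "->", pick_model(task))
```

Keeping the table ordered cheapest-first means a task only reaches an expensive model when no cheaper tier claims it, which is where the token savings come from.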
Fleet repo: <a href="https://github.com/sermakarevich/fleet" rel="nofollow">https://github.com/sermakarevich/fleet</a>
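As an illustration of the task dependencies and the 10-15 agent concurrency mentioned at the top, here is a minimal sketch of dependency-ordered dispatch with a parallelism cap, using Python's standard `graphlib`. The function name `dispatch_batches`, the task names, and the wave-by-wave batching are assumptions, not fleet's actual scheduler; a real orchestrator would more likely refill a slot as soon as any agent finishes.

```python
from graphlib import TopologicalSorter


def dispatch_batches(deps: dict, max_parallel: int) -> list:
    """Group tasks into batches of at most max_parallel.

    deps maps each task to the set of tasks it depends on. A task is only
    placed in a batch after all of its dependencies ran in earlier batches.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    ready = []
    while sorter.is_active():
        ready.extend(sorter.get_ready())  # tasks newly unblocked
        ready.sort()                      # deterministic order for the demo
        batch, ready = ready[:max_parallel], ready[max_parallel:]
        batches.append(batch)
        sorter.done(*batch)               # pretend the whole batch finished
    return batches


if __name__ == "__main__":
    # Hypothetical task graph
    deps = {
        "deploy": {"build", "test"},
        "test": {"build"},
        "build": set(),
        "docs": set(),
    }
    for i, batch in enumerate(dispatch_batches(deps, max_parallel=2), start=1):
        print(i, batch)
```

With the hypothetical graph above and `max_parallel=2`, `build` and `docs` run first, then `test`, then `deploy`.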