Show HN: I have been running 3 coding agents non-stop for the past three days. Here's how

2 points by sermakarevich, 11 days ago
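1. Headless mode

Headless mode lets you use the AI as a command-line utility for automation and scripting. In Claude Code you run it with the `-p` flag (`claude -p`), in Codex with `codex exec`, and in opencode with `opencode run`.

2. Ask human

The usual communication channel with the operator does not work in headless mode, so we need to implement a dedicated tool. Here is an example of how this can be done: https://github.com/sermakarevich/claude/tree/main/mcp/ask_human

3. Task queue

Beads is a lightweight distributed graph issue tracker for AI agents, powered by Dolt. You can create tasks, define dependencies between them, and set status, priority, and hierarchy. Beads helps prevent a task from being claimed by more than one worker.

4. Worker artifacts

We want to be able to monitor how a worker is doing, see what stage it is at, and resume it after a restart. For every task we can create a dedicated folder named after the beads task id and put whatever we need into it. I put there:
- plan and status md
- knowledge md
- events.jsonl
- stderr

The worker's prompt instructs it to check whether artifacts exist, which lets it continue from where the job left off.

5. Worker isolation

To run multiple workers we need to isolate them. Git worktrees can be used here. I am testing this approach:
- a worker gets the task and implements it
- the next worker, spawned automatically, validates that the task is done, tests it, merges the worktree, closes the ticket, and creates another ticket for a fix if required

6. Multiple workers

To run multiple workers we need a simple orchestrator: an infinite loop that constantly checks beads / config and triggers new workers when required.

7. Coder agnostic

A worker can be basically any coder. I started with Claude, then added Codex and Agy, and last added Opencode.

8. Subscription limits

3 coding agents can burn through the Claude $200 subscription limit in 30 minutes, even if you switch to Sonnet 4.6. API tokens cost 40x as much as tokens in the subscription, which is too expensive. The idea I am testing is:
- use the strongest model possible to analyse/design and add tasks
- use a local model as a worker
- use a stronger model to validate workers and add new tasks to fix potential misimplementations

I am using the qwen3.6:36B local model with Ollama, deployed on 2 GPU cards with 36GB in total and a 256K context window. This is slower, but it is free of charge. Surprisingly, it worked, and worked far better than I expected. Fable 5 was extremely good at creating clear and simple tickets, until it wasn't.

Another approach I was considering is Qwen on Bedrock, paying per token, or renting a 96GB GPU for $1400 per month.

I found that it's optimal to run 3 workers concurrently even though Ollama processes 1 request at a time. The reason is the ask_human tool: if a worker asks me something at night, it has to wait until morning doing nothing. Running three more or less guarantees GPU load at 100%.

9. Nice integrations

UI - to observe tasks / beads / config / chat / analytics

It's easy to miss when a model asks a question. It is visible in the UI as a green circle near the chat, but that's it. So I added a Telegram integration: now I receive questions from workers on Telegram and can reply there, get the status of tasks, create new tasks, etc.

I am doing this for my PoC projects, of course:
- improving fleet
- building a data collection and analysis app

What I am seeing is that 24x7 coders are closer than I thought. Even weaker models can deliver good results when the task is simple and well defined. All the components for building these systems are there.

Repo: https://github.com/sermakarevich/fleet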
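To make the worker-artifacts idea from point 4 concrete, here is a minimal sketch of a per-task folder that a restarted worker can read back. It is not taken from the fleet repo: the file names (`plan.md`, `status.md`, `knowledge.md`, `events.jsonl`) follow the list in the post, but the exact names, the helper functions, and the layout under a root folder are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Dict, List

# Hypothetical file names, modelled on the artifact list in the post.
RESUME_FILES = ("plan.md", "status.md", "knowledge.md")
EVENTS_FILE = "events.jsonl"


def task_dir(root: Path, task_id: str) -> Path:
    """Create (or reuse) the dedicated folder for one task id."""
    directory = root / task_id
    directory.mkdir(parents=True, exist_ok=True)
    return directory


def append_event(directory: Path, event: Dict) -> None:
    """Append one JSON object per line so the log survives restarts."""
    with (directory / EVENTS_FILE).open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def load_events(directory: Path) -> List[Dict]:
    """Read back all events; an absent log means a fresh task."""
    path = directory / EVENTS_FILE
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def resume_context(directory: Path) -> str:
    """Concatenate whichever markdown artifacts exist into a prompt prefix.

    An empty string means there is nothing to resume from.
    """
    parts = []
    for name in RESUME_FILES:
        path = directory / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```

A worker wrapper could call `resume_context` before starting the coder and prepend the result to the prompt, so the model sees its earlier plan and status instead of starting from scratch.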
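The orchestrator from point 6, combined with the headless commands from point 1, could look roughly like the sketch below. This is an illustration under stated assumptions, not the fleet implementation: `next_task` and `spawn` are hypothetical callables (the first stands in for whatever claims a ready task from beads, the second for whatever starts a worker process), and `max_iterations` exists only so the loop can be stopped in a test. The three command prefixes are the ones named in the post.

```python
import time
from typing import Any, Callable, Dict, List, Optional

# Headless invocations named in the post; the prompt is appended as one argument.
HEADLESS_COMMANDS: Dict[str, List[str]] = {
    "claude": ["claude", "-p"],
    "codex": ["codex", "exec"],
    "opencode": ["opencode", "run"],
}


def build_command(coder: str, prompt: str) -> List[str]:
    """Return the argv for running the given coder headlessly."""
    return HEADLESS_COMMANDS[coder] + [prompt]


def orchestrate(
    next_task: Callable[[], Optional[str]],  # hypothetical: claims a ready task id, or None
    spawn: Callable[[str], Any],             # hypothetical: starts a worker, returns a Popen-like object
    max_workers: int = 3,
    poll_seconds: float = 5.0,
    max_iterations: Optional[int] = None,    # None means loop forever
) -> None:
    """Keep up to max_workers workers busy by polling the task queue."""
    running: Dict[str, Any] = {}
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        # Forget workers whose process has exited (poll() returns an exit code).
        for task_id in [t for t, proc in running.items() if proc.poll() is not None]:
            del running[task_id]
        # Fill the free slots with newly claimed tasks.
        while len(running) < max_workers:
            task_id = next_task()
            if task_id is None:
                break
            running[task_id] = spawn(task_id)
        iteration += 1
        time.sleep(poll_seconds)
```

In real use, `spawn` might wrap something like `subprocess.Popen(build_command("claude", prompt), cwd=worktree_path)`, with the worktree path and prompt built per task; those details are left out here because the post does not specify them.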