Hi everyone, Jerry and Wyatt here from Halluminate (https://halluminate.ai/). We help AI labs train computer use agents with high-quality data and RL environments.

Training AI agents to use computers, browsers, and software is one of the highest-potential opportunities for AI. To date, however, this capability is still unreliable. The emerging method to improve this is called Reinforcement Learning with Verifiable Rewards (RLVR). However, researchers are currently bottlenecked by a lack of high-quality simulators and tasks + verifiers.

To solve this problem, we’re building Westworld, a fully simulated internet made up of synthetic versions of the most common consumer and enterprise apps. Agents use Westworld to learn how to do economically valuable tasks.

For example, AI agents can practice planning vacations on a simulated flight booking site (https://flights.halluminate.ai/), learn how to reorganize outdated information in your sales platform, or train to do financial modeling directly in a spreadsheet.

Here’s a demo showing our flight booking simulation: https://www.loom.com/share/74a3b28067e24c1b886054ba90a90aa5

How it works: AI agents access our environment and are given a task + verifier. A task is basically an objective for the agent to achieve, for example “Book me a flight from SF to NYC on this date with x, y, z filters.” A verifier is a programmatic way to determine if the task was successfully completed. For example, in this case it might be a JSON file used to check whether the final flight data matches expectations. These signals can then be used to calculate a reward in RL.

The more simulators we build, the more AI labs can improve on capabilities that computer use agents are currently weak at. One of our customers saw a ~20% improvement in date-picking performance when training on our flight booking simulator.

Two things make this hard so far:

(1) The simulations have to be realistic. You can’t get away with a vibe-coded “80% solution” because even small divergences impact performance. Generating simulated data is even harder. For example, massaging flight data to look realistic took a lot of trial and experimentation.

(2) The tasks you train agents on have to be well-chosen. They are only valuable if they reflect work that people actually want solved. We need a lot of feedback from domain experts to get this right.

That said, we find this work incredibly interesting and are excited to tackle these issues. A few things we are pumped to ship in the near term:

- Ability to train on long-horizon tasks by stringing multiple simulators together for extended workflows;

- Procedural data generation. Instead of synthetically generating all the data upfront, how can we model data generation so that our simulators are populated procedurally as agents explore (think Minecraft);

- Open source! We plan to release our environments to the public so developers/researchers can hack them for their own experimentation.

RL simulators are just one part of our business. The other part is around human data creation (think Scale AI but for computer use). We produce off-the-shelf pre-training/fine-tuning datasets, expert human evaluation/error analysis, or any other data needs for our customers. There are also a lot of exciting overlaps between the two - for example, using human experts to help create our simulators/tasks. Happy to go into more detail, but we thought that simulators would make for the more interesting HackerNews post :)

Finally, about us: Wyatt and I met while studying CS at Cornell and have been living and working together for over 7 years. I previously led product/research at Capital One Labs, where I launched one of the first AI agents in banking. Wyatt was previously a Cornell Milstein scholar and did large-scale data engineering for two early-stage startups in NYC. We left our jobs last year and faced these problems first-hand while building evals for our customers, who were browser/computer use agent companies.

If anyone has any questions, feedback, or thoughts, please let us know! Looking forward to your comments.
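
To make the task + verifier idea from the "How it works" paragraph a bit more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the actual Westworld implementation: the JSON field names (`origin`, `destination`, `date`, `filters`), the example values, the functions `verify_booking` and `reward`, and the binary 0/1 reward are all hypothetical.

```python
import json

# Hypothetical expected-state spec for a task like
# "Book me a flight from SF to NYC on this date with x, y, z filters."
# Every field name and value below is an illustrative assumption.
EXPECTED_JSON = """
{
  "origin": "SFO",
  "destination": "JFK",
  "date": "2025-09-01",
  "filters": {"nonstop": true, "cabin": "economy"}
}
"""


def verify_booking(final_state: dict, expected: dict) -> bool:
    """Return True if every expected field matches the final booking state.

    Nested dicts are compared recursively. Extra fields in final_state
    (for example a price) are ignored.
    """
    for key, want in expected.items():
        got = final_state.get(key)
        if isinstance(want, dict):
            if not isinstance(got, dict) or not verify_booking(got, want):
                return False
        elif got != want:
            return False
    return True


def reward(final_state: dict, expected: dict) -> float:
    """Binary verifiable reward: 1.0 if the verifier passes, else 0.0."""
    return 1.0 if verify_booking(final_state, expected) else 0.0


if __name__ == "__main__":
    expected = json.loads(EXPECTED_JSON)
    # Simulated final state of the environment after the agent's episode.
    good = {
        "origin": "SFO",
        "destination": "JFK",
        "date": "2025-09-01",
        "filters": {"nonstop": True, "cabin": "economy"},
        "price_usd": 214,
    }
    bad = dict(good, date="2025-09-02")  # agent picked the wrong date
    print(reward(good, expected))  # 1.0
    print(reward(bad, expected))   # 0.0
```

A pass/fail check like this is the simplest possible choice. A real verifier could just as well give partial credit per field, or inspect intermediate steps instead of only the final state.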

Launch HN: Halluminate (YC S25) – Simulating the internet to train computer use