Launch HN: Cyberdesk (YC S25) – Automate Windows legacy desktop apps

11 points by mahmoud-almadi, 9 months ago
Hi HN, we’re Mahmoud and Alan, building Cyberdesk (https://www.cyberdesk.io/), a deterministic computer use agent for automating Windows desktop applications. Developers use us to automate repetitive tasks in legacy software in healthcare, accounting, construction, and more, by executing clicks and keystrokes directly on the desktop.

Here are a few demos of Cyberdesk’s computer use agent:

Completing a lightning-fast file import automation into a legacy desktop app: https://youtu.be/H_lRzrCCN0E

Working on a monster of a Windows monolith called OpenDental (also showcases the agent’s learning process): https://youtu.be/nXiJDebOJD0

Filing a W-2 tax form: https://youtu.be/6VNEzHdc8mc

Many industries are stuck with legacy Windows desktop applications, and their staff are plagued by repetitive, incredibly time-consuming tasks. Vendors offering automations for these applications end up writing brittle Robotic Process Automation (RPA) scripts or hiring offshore teams to execute tasks manually. RPA often breaks due to inevitable UI changes or unexpected popups like a Windows update or a random in-app notification. Offshore teams are often unreliable and costlier than software, and they’re not always an option for regulated industries.

I (Mahmoud) previously built RPA scripts impacting 20K+ employees at a Fortune 100 company, where I experienced RPA’s brittleness and inflexibility firsthand. It was obvious to me that this was a band-aid solution to an unsolved problem.
Alan was building a computer use agent for his previous startup and realized its huge potential to automate a ton of manual computer tasks across many industries, so we started working on Cyberdesk.

Computer use models can struggle with abstract, long-horizon tasks, but they excel at making context-aware decisions on a screen-by-screen basis, so they’re a good fit for automating these desktop apps.

The key to reliability is crafting prompts that are highly specific and well thought out. Much like with ChatGPT, vague or ambiguous prompts won’t get you the results you want. This is especially true in computer use because the model is processing nearly an entire desktop screen’s worth of extra visual information; without precise instructions, it doesn’t know which details to focus on or how to act.

Unlike RPA, Cyberdesk’s agents don’t blindly replay clicks. They read the screen state before every action and self-correct when flows drift (popups, latency, UI changes). Unlike off-the-shelf computer use AIs, Cyberdesk runs deterministically in production: the agent primarily follows the steps it has learned and only falls back to reasoning when anomalies occur. Cyberdesk learns workflows from natural-language instructions, capturing nuance and handling dynamic tasks, far beyond what a simple screen recording of a few runs can encode.

This approach is good for both reliability and cost: reliability, because we fall back to a computer use model in unexpected situations; and cost, because computer use models are expensive and we only use them when we need to. Otherwise, we use faster, more affordable visual LLMs to check the screen state step by step during deterministic runs.
Our agents are also equipped with tools like failsafes, data extraction, and screen evaluation to handle dynamic and sensitive situations.

How it works: you install our open source driver on any Windows machine (https://github.com/cyberdesk-hq/cyberdriver). It communicates with our backend to receive commands (click, type, scroll, screenshot) and sends back data (screenshots, API responses, etc.). You give our computer use agent a detailed natural-language description of the process for a given task, just like an SOP for an employee learning a new task for the first time. The agent then uses computer use AI models to learn the steps and memorizes them by saving each screenshot alongside its action (click on these coordinates, type XYZ, wait for the page to load, etc.).

The agent then replays these steps deterministically, which keeps runs fast and predictable. To account for popups and UI changes, our agent checks the live screen state against the memorized state to determine whether it’s safe to proceed with the memorized step. If no major changes prevent safe execution of the memorized step, it proceeds; otherwise, it falls back to a computer use model with context on past actions and the remaining task.

Customers are currently using us for manual tasks like importing and exporting files from legacy desktop applications, booking appointments for patients on a desktop PMS, and data entry, such as filling out patient profiles and similar forms in an EMR.

We don’t have a self-serve option yet, but we’d love to onboard you manually. Book a demo here to learn more: https://www.cyberdesk.io/. If you’d rather wait for the self-serve option a little later down the line, please submit your email here (https://forms.gle/HfQLxMXKcv9Eh8Gs8) so you can be notified as soon as it’s ready. You can also check out our docs here: https://docs.cyberdesk.io/.

We’d absolutely love to hear your thoughts on our approach and on desktop automation for legacy industries!
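
For readers who want the replay-and-fallback flow from "How it works" in concrete form, here is a minimal Python sketch. It is an illustration under assumptions, not Cyberdesk’s actual implementation: `MemorizedStep`, `screens_match`, and `run_workflow` are hypothetical names, screens are plain strings rather than screenshots, a text-similarity ratio stands in for the visual LLM check, and the fallback is a stub that is assumed to handle the current step before the run resumes.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List


@dataclass
class MemorizedStep:
    expected_screen: str  # stand-in for the screenshot saved during learning
    action: str           # e.g. "click(40, 12)" or "type('patients.csv')"


def screens_match(live: str, expected: str, threshold: float = 0.8) -> bool:
    # Assumption: the post describes a visual LLM doing this check; a plain
    # text-similarity ratio is used here so the sketch runs without a model.
    return SequenceMatcher(None, live, expected).ratio() >= threshold


def run_workflow(
    steps: List[MemorizedStep],
    observe: Callable[[], str],
    execute: Callable[[str], None],
    fallback: Callable[[str, List[str], List[MemorizedStep]], None],
) -> List[str]:
    """Replay memorized steps; defer to `fallback` when the screen has drifted."""
    log: List[str] = []
    done: List[str] = []
    for i, step in enumerate(steps):
        live = observe()
        if screens_match(live, step.expected_screen):
            # Screen looks as memorized, so replay the stored action as-is.
            execute(step.action)
            log.append(f"replayed: {step.action}")
        else:
            # Screen drifted (popup, UI change): hand over the live screen,
            # the past actions, and the remaining steps as context.
            fallback(live, list(done), steps[i:])
            log.append(f"fallback: {step.action}")
        done.append(step.action)
    return log


# Usage: the second live screen is an unexpected popup, so step 2 falls back.
steps = [
    MemorizedStep("File menu open", "click(40, 12)"),
    MemorizedStep("Import dialog visible", "type('patients.csv')"),
]
live_screens = iter(["File menu open", "Windows Update popup"])
executed: List[str] = []
log = run_workflow(
    steps,
    observe=lambda: next(live_screens),
    execute=executed.append,
    fallback=lambda live, done, remaining: None,  # stub for the model call
)
print(log)
```

In this sketch the cheap check runs on every step and the expensive path only runs on a mismatch, which mirrors the cost argument in the post.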