Launch HN: BrowserBook (YC F24) – IDE for deterministic browser automation
14 points • by cschlaepfer • 19 days ago
Hey HN! We're Chris, Jorrie, and Evan of BrowserBook, an IDE for writing and debugging Playwright-based web automations. You can download the Mac app at https://browserbook.com, and there's a demo video at https://www.youtube.com/watch?v=ODGJBCNqGUI.
Why we built this: when we went through YC, we were a company automating back-office healthcare workflows. Because the interoperability ecosystem in healthcare is so fragmented, we started using browser agents to automate EMRs, practice management software, and payment portals directly through the web. When we did, we ran into a ton of problems:
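Speed: LLM calls have high latency compared to a scripted approach
Cost: We burned through tokens, because the automations needed a lot of context to be reasonably accurate
Reliability: Even with detailed instructions, context, and tools, agents tended to drift in unpredictable ways on multi-step tasks
Debuggability: When drift did occur, we were essentially playing whack-a-mole with our prompt and rerunning the whole automation to debug issues (see above: the speed and cost issues made this quite painful)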
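More and more, we were just handing our agent scripts to execute. Eventually we concluded that scripting is the better approach to web automation for these sorts of use cases. But scripting was also too painful, so we set out to solve those problems with BrowserBook.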
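Under the hood, it runs a standalone TypeScript REPL wired directly into an inline browser instance, with built-in tooling to make script development quick and easy. This includes:
- A fully interactive browser window directly in the IDE, so you can run your code without context switching
- A Jupyter-notebook-style environment: you can write portions of your automation in individual cells and run them individually (and quickly reset state manually in the browser), instead of rerunning the whole thing every time
- An AI coding assistant that uses the DOM context of the current page to write automation logic, which helps you avoid digging around for selectors
- Helper functions for taking screenshots, extracting data, and managing authentication for auth-required workflows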
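To make the cell-based workflow above concrete, here is a minimal sketch in TypeScript. It is not BrowserBook's implementation: `Notebook`, `Cell`, and `Context` are hypothetical names chosen for illustration, and the shared `ctx` object stands in for whatever state (for example a live browser page) the real REPL keeps between cells.

```typescript
// Hypothetical sketch, not BrowserBook's actual code.
// Cells share one mutable context, so any cell can be rerun on its own
// without repeating the cells that came before it.

type Context = Record<string, unknown>;
type Cell = (ctx: Context) => Promise<void> | void;

class Notebook {
  private cells = new Map<string, Cell>();
  private order: string[] = [];
  readonly ctx: Context = {};

  // Register a cell; re-adding a name replaces the code but keeps its position.
  addCell(name: string, cell: Cell): void {
    if (!this.cells.has(name)) this.order.push(name);
    this.cells.set(name, cell);
  }

  // Run exactly one cell against the shared context.
  async run(name: string): Promise<void> {
    const cell = this.cells.get(name);
    if (!cell) throw new Error(`Unknown cell: ${name}`);
    await cell(this.ctx);
  }

  // Run every cell in the order it was first added, like a full script run.
  async runAll(): Promise<void> {
    for (const name of this.order) await this.run(name);
  }
}
```

With this shape, an expensive step such as logging in can run once, and a later cell can be edited and rerun many times against the state it left behind. The trade-off is the same as in any notebook: the shared context can go stale, which is why a manual reset is still useful.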
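Once you've created your automation, you can run it directly in the application or in our hosted environment via API, so you can use it in external apps or agentic workflows.
At its core, BrowserBook is an Electron app, so we can run a Chrome instance directly in the app without the need for cloud-hosted browsers. For API runs, we use hosted browser infra via Kernel (which is a fantastic product, btw), relying on their bot anti-detection capabilities (stealth mode, proxies, etc.).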
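Scripted automation can be unpopular because scripts are inherently brittle; unlike "traditional" software development, your code is deployed in an environment you don't control: someone else's website. With BrowserBook, we're trying to "embrace the suck" and acknowledge this "offensive programming" environment.
We've designed it from the ground up to assume scripts will break, and we aim to provide the tools that make building and maintaining them easier. In the future, our plan is to leverage AI where it has already shown its strength, writing code, to minimize downtime and quickly repair broken scripts as the deployed environment changes.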
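One common way to write a script that assumes it will break is to try several candidate selectors and fail loudly when none match. The sketch below illustrates that idea only; it is not taken from BrowserBook. `ElementCounter` and `firstMatching` are hypothetical names, and the interface is kept deliberately small so the example runs without a browser. With Playwright, an adapter could implement `count` as `page.locator(selector).count()`.

```typescript
// Hypothetical sketch of a "fail loudly" selector fallback.
// ElementCounter is an assumed minimal interface, not a Playwright type.
interface ElementCounter {
  count(selector: string): Promise<number>;
}

// Return the first candidate selector that matches at least one element.
// If none match, throw an error listing everything that was tried, so the
// person (or tool) repairing the script knows where to start.
async function firstMatching(
  page: ElementCounter,
  candidates: string[],
): Promise<string> {
  const tried: string[] = [];
  for (const selector of candidates) {
    if ((await page.count(selector)) > 0) return selector;
    tried.push(selector);
  }
  throw new Error(`No selector matched. Tried: ${tried.join(", ")}`);
}
```

Ordering the candidates from most to least specific keeps the script on the preferred selector while the site is unchanged, and the error message turns a silent misclick into an explicit, debuggable failure.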
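Browser agents promised to solve the brittleness problem by handing the reins to an LLM that can handle inconsistency and ambiguity. While we think there are some applications where browser agents can be genuinely helpful, tasks that need to be done reliably and repeatedly are not among them.
We'd love for you to try it out! You can download BrowserBook from our website: https://browserbook.com (only available for Mac so far, sorry!). And of course, we'd appreciate any feedback and comments you have!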