I built a screen-aware desktop assistant; now it can write for you and operate your computer

2 points, by luthiraabeykoon, 6 months ago
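I posted Julie here a few days ago as a weekend prototype: an open-source desktop assistant that lives as a tiny overlay and uses your screen as context (instead of copy/paste, tab switching, etc.).

Update: I just shipped Julie v1.0, and the big change is that it no longer only answers questions about your screen. It can now run agents (writing/coding) and a computer-use mode via a CUA toolkit. (https://tryjulie.vercel.app/)

What that means in practice:

- General AI assistant: it hears what you hear, sees what you see, and answers your questions in real time.
- Writing agent: drafts/rewrites in your voice, then iterates with you while staying in the overlay (no new workspace).
- Coding agent: helps you implement/refactor with multi-step edits, while your editor stays the "source of truth."
- Computer-use agent: when you want, it can take the "next step" (click/type/navigate) instead of just telling you what to do.

The goal is still the same: don't break my flow. I want the assistant to feel like a tiny utility that helps for 20 seconds and disappears, not a second life you have to manage.

A few implementation notes/constraints (calling these out because I'm sure people will ask):

- Permissions are opt-in (screen + accessibility/automation), and it is meant to be used with you watching, not running silently.
- The UI is intentionally minimal; I'm trying hard not to turn it into a full chat app with tabs/settings/feeds.

Repo + installers are here: https://github.com/Luthiraa/julie

Would love feedback on two things:

1. If you've built/used computer-use agents: what safety/UX patterns actually feel acceptable day-to-day?
2. What's the one workflow you'd want this to do end-to-end without context switching?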
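For readers thinking about the first question, here is a minimal sketch of what "opt-in permissions, used with you watching" could look like for a single computer-use step. It is illustrative only and not taken from the Julie codebase: the names `Action`, `GatedExecutor`, `REQUIRED_PERMISSION`, and the permission strings are all hypothetical, and the real click/type/navigate call is left as a callback supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical mapping from action kind to the permission it would need.
REQUIRED_PERMISSION: Dict[str, str] = {
    "click": "accessibility",
    "type": "accessibility",
    "navigate": "automation",
}


@dataclass(frozen=True)
class Action:
    kind: str    # "click", "type", or "navigate"
    target: str  # human-readable description shown to the user


@dataclass
class GatedExecutor:
    granted: Set[str]                  # permissions the user has opted into
    confirm: Callable[[Action], bool]  # asks the user; True means "go ahead"
    perform: Callable[[Action], None]  # carries out the real action
    log: List[str] = field(default_factory=list)

    def step(self, action: Action) -> bool:
        """Run one action only if it is permitted AND the user approves it."""
        needed = REQUIRED_PERMISSION.get(action.kind)
        if needed is None or needed not in self.granted:
            self.log.append(f"blocked: {action.kind} (permission not granted)")
            return False
        if not self.confirm(action):
            self.log.append(f"declined: {action.kind} {action.target}")
            return False
        self.perform(action)
        self.log.append(f"done: {action.kind} {action.target}")
        return True


if __name__ == "__main__":
    # Stand-ins for a real confirmation prompt and a real input driver.
    executor = GatedExecutor(
        granted={"accessibility"},
        confirm=lambda a: True,
        perform=lambda a: None,
    )
    executor.step(Action("click", "Save button"))
    executor.step(Action("navigate", "https://example.com"))
    print(executor.log)
```

The design choice in this sketch is that nothing happens without both a standing grant and a per-step approval, and every outcome (done, declined, blocked) is logged so the user can see what the agent did or tried to do.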