I built a one-click inline AI rewriting tool (and the problems it ran into)
1 point • by AzeniqTech • 6 months ago
I’ve been dogfooding a small writing helper I built called Rephrazo, and I thought it might be useful to share some implementation details and mistakes so far.

The idea is simple:

* highlight text where you’re writing
* press a hotkey
* get an AI paraphrase in a small popup
* insert it back with one click

The goal is to remove the “copy → open AI tool → paste → rewrite → paste back” loop for small edits.

This post is about how I wired it up, what worked technically, and what didn’t.

### Constraints I designed for

From the beginning I tried to design under a few constraints:

* One hotkey → one main action
* Stay inside the current app (no browser, no big side panel)
* Minimal UI: single suggestion, one click to insert
* Latency “feels instant” or it doesn’t get used

Whenever I broke these constraints (added extra choices, prompts, etc.), usage dropped in dogfooding.

### High-level architecture

Rough breakdown:

* Desktop client that:
  * listens for a global hotkey
  * grabs the current text selection
  * sends it to an API
  * displays the returned paraphrase in a small overlay near the selection
* Backend API that:
  * accepts the selected text + some minimal context
  * calls an LLM
  * applies a fixed prompt (“make this clearer, keep tone/voice as much as possible”)
  * returns a single suggestion (no multi-choice for now)
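The backend half of this breakdown is small enough to sketch. The Python below is a minimal illustration, not Rephrazo’s code: it assumes the LLM client can be any `complete(prompt) -> str` callable, and `RewriteRequest`, `build_prompt`, `handle_rewrite` and the exact prompt wording are hypothetical. It follows the three backend steps listed above: take the selection plus optional context, wrap it in one fixed prompt, and return exactly one suggestion.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical wording, paraphrasing the fixed prompt described in the post.
FIXED_INSTRUCTION = (
    "Make this clearer. Keep the tone and voice as much as possible. "
    "Reply with the rewritten text only."
)


@dataclass
class RewriteRequest:
    selected_text: str
    context: Optional[str] = None  # "some minimal context"; its contents are an assumption


@dataclass
class RewriteResponse:
    suggestion: str  # exactly one suggestion, no multi-choice


def build_prompt(req: RewriteRequest) -> str:
    """Wrap the selection (and optional context) in the single fixed instruction."""
    parts = [FIXED_INSTRUCTION]
    if req.context:
        parts.append(f"Context: {req.context}")
    parts.append(f"Text:\n{req.selected_text}")
    return "\n\n".join(parts)


def handle_rewrite(req: RewriteRequest, complete: Callable[[str], str]) -> RewriteResponse:
    """One request in, one LLM call, one suggestion out."""
    if not req.selected_text.strip():
        raise ValueError("empty selection")
    suggestion = complete(build_prompt(req)).strip()
    # Assumption: if the model returns nothing usable, echo the original text back.
    return RewriteResponse(suggestion=suggestion or req.selected_text)


# Usage with a stub standing in for a real LLM client.
def fake_llm(prompt: str) -> str:
    return "  The meeting moved to Friday.  "


print(handle_rewrite(RewriteRequest("meeting is friday now i think"), fake_llm).suggestion)
# The meeting moved to Friday.
```

Passing the LLM call in as a plain function keeps the handler testable with a stub like `fake_llm`; swapping in a real client only changes that one argument.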
No fancy infra yet; I’m just trying to keep the path from “key press” to “returned text” as short as possible.

### Text capture and insertion

The surprisingly tricky part wasn’t the LLM; it was:

* reliably capturing the selected text
* not messing up the user’s clipboard
* inserting the rewritten text back without breaking formatting

The first version literally abused the clipboard:

* save clipboard
* copy selection (simulated copy keystroke)
* send to backend
* replace selection by pasting the result (simulated paste keystroke)
* restore clipboard

This worked… until it didn’t:

* some apps ignore simulated keypresses
* sometimes the clipboard got overwritten by other things in between
* it felt fragile and “hacky”

I’m slowly moving toward more app-aware integrations (where possible) while still keeping a generic fallback.

### Latency and UX

Latency matters more than I expected. Rough buckets:

* under 500 ms → feels instant, people are happy
* 1–2 seconds → acceptable if the suggestion is clearly better
* over 3 seconds → people regret pressing the hotkey and use it less

A few tiny UX things helped:

* show a small “loading” state immediately near the selection
* render the popup instantly (skeleton state), then fill it when the response arrives
* on failure, show a short, honest message instead of silently doing nothing

If you’re building AI tools, this won’t surprise you, but it’s different when you watch your own users hesitate after a few slow responses.

### Things that went wrong

* I overbuilt customization early:
  * tone dropdowns
  * multiple modes (“shorter”, “longer”, “more formal”)
  * extra toggles

  People ignored them, or got decision fatigue.
* I underestimated how many edge cases there are with selection/insertion across different apps.
* I didn’t log enough in the first builds, so I had to retrofit telemetry to understand actual usage.

If you’re curious, the current early version is here:
[https://rephrazo-ai.app/](https://rephrazo-ai.app/)
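On the selection/insertion edge cases: the first version’s clipboard round-trip (save, copy, send, paste, restore) can at least be made to fail loudly and always restore the user’s clipboard. The Python below is a minimal sketch under stated assumptions, not Rephrazo’s implementation. `Clipboard` is a hypothetical plain-text interface with a write counter; macOS’s `NSPasteboard.changeCount` and Windows’ `GetClipboardSequenceNumber` are real platform counterparts. `press_copy` and `press_paste` are hypothetical callables that send the simulated shortcuts. The sketch detects an app ignoring the simulated copy by waiting for the counter to change, and it restores the saved text in a `finally` block. It only narrows the window in which something else can overwrite the clipboard without closing it, and it does not preserve non-text clipboard contents.

```python
import time
from typing import Callable, Optional, Protocol


class Clipboard(Protocol):
    """Hypothetical plain-text clipboard interface (real ones are per-platform)."""

    def get_text(self) -> Optional[str]: ...
    def set_text(self, text: str) -> None: ...
    def change_count(self) -> int: ...  # increases on every write


class CaptureError(RuntimeError):
    """Raised instead of silently doing nothing, so the UI can show a short message."""


def _wait_for_change(cb: Clipboard, before: int, timeout_s: float, poll_s: float) -> bool:
    # The target app may update the clipboard asynchronously after the keypress.
    deadline = time.monotonic() + timeout_s
    while cb.change_count() == before:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
    return True


def capture_selection(cb: Clipboard, press_copy: Callable[[], None],
                      timeout_s: float = 0.5, poll_s: float = 0.01) -> str:
    """Save clipboard, simulate copy, read the selection, restore clipboard."""
    saved = cb.get_text()  # only plain text is saved; images/rich text are not
    before = cb.change_count()
    try:
        press_copy()
        if not _wait_for_change(cb, before, timeout_s, poll_s):
            raise CaptureError("copy was ignored or nothing is selected")
        text = cb.get_text()
        if not text:
            raise CaptureError("selection is empty")
        return text
    finally:
        if saved is not None:
            cb.set_text(saved)


def replace_selection(cb: Clipboard, press_paste: Callable[[], None], new_text: str,
                      settle_s: float = 0.0) -> None:
    """Put new_text on the clipboard, simulate paste, then restore the clipboard."""
    saved = cb.get_text()
    try:
        cb.set_text(new_text)
        press_paste()
        # Assumption: give the target app time to read the clipboard before restoring.
        time.sleep(settle_s)
    finally:
        if saved is not None:
            cb.set_text(saved)


# Usage with an in-memory fake clipboard and a fake "app".
class FakeClipboard:
    def __init__(self, text: Optional[str] = None) -> None:
        self._text, self._count = text, 0

    def get_text(self) -> Optional[str]:
        return self._text

    def set_text(self, text: str) -> None:
        self._text, self._count = text, self._count + 1

    def change_count(self) -> int:
        return self._count


cb = FakeClipboard("user's own clipboard")
app = {"selection": "teh draft"}
picked = capture_selection(cb, press_copy=lambda: cb.set_text(app["selection"]))
replace_selection(cb, press_paste=lambda: app.update(selection=cb.get_text()),
                  new_text="The draft.")
print(picked, "|", app["selection"], "|", cb.get_text())
# teh draft | The draft. | user's own clipboard
```

Raising `CaptureError` rather than returning nothing fits the “short, honest message instead of silently doing nothing” rule from the UX section.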