Ask HN: How do you search your personal data?

1 | by escapecharacter | 7 months ago
I have personal notes, correspondence, code and documentation from nearly 20 years of work. These are spread across multiple (cloud) services, and searching across these fiefdoms has been impractical.

The problem goes like: "Ah, I remember having a conversation with someone about [algorithm], then recording an important insight. Let's find that."

This isn't a problem solved by an LLM. The blocker is that there isn't a way to run search code on all this plain text.

Services:
* Email (Gmail, synced to my macOS disk with Apple Mail)
* Dropbox
* Notion
* Google Drive
* Obsidian
* GitHub
* Apple Notes
* Discord chats
* Trello
* My own blog

If I had everything synced to my Mac's disk, maybe I could do a plaintext search there. However, Spotlight's indexing is always incomplete and misses obvious files. My Dropbox is so large that I don't sync it all locally.

Some services I no longer use, like Evernote. When I archived that service, I exported everything and moved it into my Dropbox. So, if I search Dropbox, it also searches old notes from Evernote. There's no way I could do this for all the services I actively use.

The way I search now is to guess which service the result is most likely in and search there. When I find no results, I search the next most likely service, ad nauseam.

For my own blog, I used to use Google's site search, but I recently discovered this was incomplete: https://bsky.app/profile/dustinfreeman.bsky.social/post/3m5l5tto6pk27

I could imagine a solution where some third-party service has access keys to all my services. But, let's be real, that's a huge amount of trust. Also, my access to all these services is 2FA'd with expiry, so I'd be continually re-upping auth to this third-party service. At that point, it makes sense to just search the way I do now.
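
For illustration only, here is a rough sketch of what "run search code on all this plain text" could look like once exports from the services are sitting on local disk. It is an assumption-laden example, not something from the post: the function name `search_plaintext`, the extension list, and the folder paths in the usage note are all hypothetical, and it does nothing about the syncing and authentication problems described above.

```python
import os
from typing import Iterable, Iterator, Tuple

# Hypothetical set of file types treated as plain text; adjust to taste.
TEXT_EXTENSIONS = {".txt", ".md", ".html", ".json", ".csv"}


def search_plaintext(
    roots: Iterable[str],
    query: str,
    extensions: Iterable[str] = TEXT_EXTENSIONS,
) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for each line containing the query.

    Matching is a case-insensitive substring test, nothing smarter.
    """
    allowed = {ext.lower() for ext in extensions}
    needle = query.casefold()
    for root in roots:
        # os.walk visits every subdirectory under the root.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if os.path.splitext(name)[1].lower() not in allowed:
                    continue
                path = os.path.join(dirpath, name)
                try:
                    # errors="replace" keeps going on badly encoded files.
                    with open(path, encoding="utf-8", errors="replace") as fh:
                        for lineno, line in enumerate(fh, start=1):
                            if needle in line.casefold():
                                yield path, lineno, line.strip()
                except OSError:
                    # Unreadable file (permissions, broken link): skip it.
                    continue
```

Usage would be something like `for path, lineno, line in search_plaintext(["/path/to/dropbox", "/path/to/obsidian-vault"], "algorithm"): print(path, lineno, line)`, where both paths are placeholders. A scan like this reads every file on each query, so it only makes sense as a fallback when an index such as Spotlight's is missing files.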