Ask HN: How do you search your personal data?
1 point•by escapecharacter•7 months ago
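I have personal notes, correspondence, code, and documentation from nearly 20 years of work. They are spread across multiple (cloud) services, and searching across these fiefdoms has become impractical.
The problem goes like this: "Ah, I remember having a conversation with someone about [algorithm], then recording an important insight. Let's find that."
This isn't a problem that an LLM solves. The blocker is that there is no way to run search code over all of this plain text.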
Services:
* Email (Gmail, synced to my macOS disk with Apple Mail)
* Dropbox
* Notion
* Google Drive
* Obsidian
* GitHub
* Apple Notes
* Discord chats
* Trello
* My own blog
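If I had everything synced to my Mac's disk, maybe I could do a plaintext search there. However, Spotlight's indexing is always incomplete and misses obvious files. My Dropbox is also so large that I don't sync all of it locally.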
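A minimal sketch of what such a plaintext search could look like without relying on Spotlight's index, assuming the exports and synced files already live under one local folder. The function name `search_plaintext` and the list of extensions are illustrative assumptions, not something from an existing tool.

```python
import os
from pathlib import Path

# Hypothetical set of extensions treated as plain text; adjust to taste.
TEXT_SUFFIXES = {".txt", ".md", ".html", ".json", ".eml"}


def search_plaintext(root, needle, suffixes=TEXT_SUFFIXES):
    """Yield (path, line_number, line) for each case-insensitive match under root."""
    needle = needle.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() not in suffixes:
                continue
            try:
                # errors="replace" keeps the scan going on files with odd encodings.
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if needle in line.lower():
                            yield path, number, line.rstrip("\n")
            except OSError:
                # Unreadable file (permissions, placeholder not downloaded): skip it.
                continue
```

This reads every file on each query, so it is slow on a large tree, but it cannot silently miss a file the way an incomplete index can. It does nothing for files that are not synced locally.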
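Some services I no longer use, like Evernote. When I archived that service, I exported everything and moved it into my Dropbox. So if I search Dropbox, it also searches my old Evernote notes. There's no way I could do this for every service I actively use.
The way I search now is to guess which service the result is most likely in and search there. When I find nothing, I search the next most likely service, and so on, ad nauseam.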
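The manual routine above amounts to a sequential fallback, which can be sketched as follows. The function `search_in_order` and the per-service searchers are hypothetical stand-ins for "open service X and use its own search box"; none of them corresponds to a real API.

```python
def search_in_order(query, searchers):
    """Try each (name, searcher) pair in order; return the first service with results.

    `searchers` is a list ordered from most to least likely. Each searcher is a
    hypothetical callable that takes the query and returns a list of matches.
    """
    for name, searcher in searchers:
        results = searcher(query)
        if results:
            return name, results
    # Every service came back empty.
    return None, []
```

The cost is visible in the loop: each wrong guess is a full search in one service before the next one is tried, and a match in a less likely service is never seen once an earlier service returns anything.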
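For my own blog, I used to use Google's site search, but I recently discovered that it was incomplete: https://bsky.app/profile/dustinfreeman.bsky.social/post/3m5l5tto6pk27
I could imagine a solution where some third-party service holds access keys to all my services. But, let's be real, that requires a huge amount of trust. Also, my access to all of these services is protected by 2FA that expires, so I would be continually re-authorizing this third-party service. At that point, it makes more sense to just keep searching the way I do now.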