


One of you AI researchers with almost all the data and cheap compute has surely already thought of this. What would it look like to visualize all human knowledge?

1. Tag all content with a date (this is probably almost done). In long passages, estimate a date based on the publishing and writing rate.

2. Once every sentence or phrase has a date, create a few blockchains: one for short thoughts (128 tokens), and more chains for longer thoughts (3072+ tokens).

3. For each tokenized idea, run cosine similarity on LLM embeddings (or some better metric) with a threshold of, say, 75%, tokenizing by natural punctuation if nothing is found.

4. Only new-ish content gets stored in a block, and it is chained if novel. Again, both short and long thoughts are recorded. Coordinates come from the vectors.

5a. I remember reading once that working through a PhD is like pricking the inside of a balloon, and I love that visualization. Every novel idea pricks the inside of a growing sphere of knowledge over time.

5b. Has anyone ever tried to map human knowledge in three dimensions?

5c. Alternatively, a tree or root system growing over time, branching from the vectors.
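
To make steps 2 to 4 a bit more concrete, here is a rough Python sketch, not a real implementation. Everything in it is an assumption for illustration: `toy_embed` is a bag-of-words stand-in for an actual LLM embedding, the "blockchain" is just a SHA-256 hash chain in memory, `coords_3d` is a crude placeholder for a proper projection such as PCA, and the short/long chain split is left out. Only the 75% threshold comes from the post.

```python
import hashlib
import json
import math
import re
import zlib
from collections import Counter
from dataclasses import dataclass

NOVELTY_THRESHOLD = 0.75  # the "say 75%" threshold from step 3
GENESIS = "0" * 64        # hypothetical previous-hash value for the first block


def toy_embed(text):
    """Stand-in for an LLM embedding: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coords_3d(vector):
    """Crude placeholder projection: fold word counts onto three axes."""
    xyz = [0.0, 0.0, 0.0]
    for word, count in vector.items():
        xyz[zlib.crc32(word.encode()) % 3] += count
    return tuple(xyz)


@dataclass
class Block:
    date: str
    text: str
    vector: Counter
    prev_hash: str
    hash: str


def block_hash(date, text, prev_hash):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps([date, text, prev_hash]).encode()
    return hashlib.sha256(payload).hexdigest()


def build_chain(dated_ideas, threshold=NOVELTY_THRESHOLD):
    """Walk ideas in date order and chain only those unlike anything stored."""
    chain = []
    for date, text in sorted(dated_ideas):
        vec = toy_embed(text)
        if any(cosine(vec, block.vector) >= threshold for block in chain):
            continue  # too similar to an earlier idea, so not novel
        prev = chain[-1].hash if chain else GENESIS
        chain.append(Block(date, text, vec, prev, block_hash(date, text, prev)))
    return chain


def verify(chain):
    """Check that every block links to the one before it."""
    prev = GENESIS
    for block in chain:
        expected = block_hash(block.date, block.text, prev)
        if block.prev_hash != prev or block.hash != expected:
            return False
        prev = block.hash
    return True


if __name__ == "__main__":
    ideas = [
        ("1905-01-01", "energy equals mass times the speed of light squared"),
        ("1950-01-01", "Energy equals mass times the speed of light squared!"),
        ("1859-01-01", "species evolve through natural selection"),
    ]
    for block in build_chain(ideas):
        print(block.date, block.text, coords_3d(block.vector))
```

The linear scan over every stored block is the obvious bottleneck. At anything like "all human knowledge" scale, this would presumably need an approximate nearest-neighbor index instead, and the date ordering matters because whichever copy of an idea is seen first is the one that counts as novel.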

Ask HN: How to visualize all human thought