Hacker News (Chinese edition)

I've been thinking about recursive self-improvement, and especially about how likely it is to become important soon. "Soon" means with current LLMs, which might just give up, delete the test set, or otherwise detach from reality. Call the probability of that happening P. You can estimate P by looking at research tasks and how much help they need from humans to stay on task.

Gödel machines (which prove that the next step is better before taking it) try to use maths as an oracle, which relies on the mathematical foundations being true. Other oracles I've theorised could help are oracles from the future that can say whether a change detaches the system from reality or not: a sort of go/no-go signal for potentially detaching changes.

Are there any other classes of oracles?

There might be some complex quantum computation that I'm not thinking of, due to classical-computer bias.
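
To make the two ideas above a bit more concrete, here is a rough Python sketch, not a real system. It assumes that "needed at least one human intervention" is a usable proxy for a run detaching from reality, and it treats the oracle as an opaque go/no-go callable. The names `TaskRun`, `estimate_p`, and `gated_improve` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")  # whatever represents the self-improving system


@dataclass
class TaskRun:
    """One logged research task (hypothetical record format)."""
    name: str
    human_interventions: int  # times a human had to steer the run back on task


def estimate_p(runs: Iterable[TaskRun]) -> float:
    """Estimate P as the fraction of runs needing at least one intervention.

    Assumption: an intervention stands in for "the model gave up, deleted
    the test set, or otherwise detached from reality".
    """
    runs = list(runs)
    if not runs:
        raise ValueError("need at least one run to estimate P")
    detached = sum(1 for r in runs if r.human_interventions > 0)
    return detached / len(runs)


def gated_improve(
    system: S,
    propose: Callable[[S], S],
    oracle: Callable[[S, S], bool],
    steps: int,
) -> S:
    """Apply a proposed change only when the oracle gives a 'go'."""
    for _ in range(steps):
        candidate = propose(system)
        if oracle(system, candidate):  # go: accept the change
            system = candidate
        # no-go: keep the current system and try again
    return system


if __name__ == "__main__":
    runs = [
        TaskRun("ablation", 0),
        TaskRun("repro", 2),
        TaskRun("sweep", 0),
        TaskRun("eval", 0),
    ]
    print(estimate_p(runs))  # one of four runs needed help

    # Toy stand-ins: the "system" is an int, each proposal adds 1,
    # and the oracle vetoes anything above 3.
    print(gated_improve(0, lambda s: s + 1, lambda old, new: new <= 3, steps=5))
```

A proof checker, a held-out test set that the system cannot touch, or a human reviewer could each be plugged in as `oracle` here; the loop does not care where the go/no-go signal comes from, only that the system being improved cannot rewrite it.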

Oracles and recursive self-improvement