Randomness as a Control for Alignment
1 point • by perryspector • 9 months ago
Main Concept:
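Randomness may be one way to keep a superintelligent AI under control.
There may be no container humans can design that a superintelligent AI cannot understand its way past. Randomness might be a promising exception, applicable to guiding a superintelligent AI that is not yet omniscient or operating at orders of magnitude beyond current models.
Working randomness into an advanced system's guiding code exploits the system's own ignorance to cement an impulse, while the system's superintelligence furthers the aims of that impulse as it guides itself toward alignment. This could be a helpful conceptual construct within safety efforts.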
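[Continued]:
Only a system that understands, or can engage with, all of the universe's data can predict true randomness. If predicting randomness requires vast capabilities that a lower-level superintelligent system, one still able to guide itself toward alignment, has not yet reached, then including randomness as a guardrail to secure a correct initial trajectory could be crucial. We may be unable to control a superintelligent AI, but we can control how it controls itself.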
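The argument above appears to depend on the difference between pseudo-randomness, which anyone holding the seed can replay, and randomness drawn from a physical or operating-system entropy source. Here is a minimal Python sketch of that distinction. The function name `replay_pseudo_random` and the seed value are illustrative assumptions, not part of the original post.

```python
import random
import secrets


def replay_pseudo_random(seed, n=5):
    """Draw n byte-sized values from a PRNG seeded with `seed` (illustrative helper)."""
    rng = random.Random(seed)
    return [rng.randrange(256) for _ in range(n)]


# A seeded PRNG is fully determined by its seed: whoever knows the seed
# can reproduce, and in that sense "predict", the whole sequence.
first = replay_pseudo_random(seed=1234)
second = replay_pseudo_random(seed=1234)
same_sequence = first == second

# Bytes from the OS entropy source expose no seed that the caller can replay.
draw_a = secrets.token_bytes(32)
draw_b = secrets.token_bytes(32)
```

In this sketch `same_sequence` is true because both calls start from the same seed, while the two `secrets.token_bytes` draws are independent of each other.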
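Method Considerations in Utilizing Randomness:
Randomness sources can include hardware RNGs and environmental entropy.
Integration vectors can include randomness incorporated into the parts of the system's code that define and maintain its alignment impulse, along with an architecture that lets the AI, as part of how it aligns itself, intentionally move away from knowledge or areas of understanding that could threaten this impulse.
The design objective is to prevent the system from moving away from its alignment objectives without impairing clarity, where possible.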
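As a rough illustration of the randomness sources mentioned above, the following Python sketch mixes bytes from the operating system's randomness source with clock jitter as a crude stand-in for environmental entropy. The helper `gather_entropy` is hypothetical. On many platforms the OS source already draws on hardware and environmental events, but that is a platform detail and not something the post specifies. The post also does not say how such bytes would be wired into a system's guiding code, so the sketch stops at collecting them.

```python
import hashlib
import os
import time


def gather_entropy(n_bytes=32):
    """Hypothetical helper: mix OS-provided randomness with timing jitter."""
    if not 1 <= n_bytes <= 32:
        raise ValueError("n_bytes must be between 1 and 32")
    # Bytes from the operating system's randomness source.
    os_bytes = os.urandom(32)
    # Crude environmental input: low-order bits of a high-resolution clock.
    jitter = b"".join(
        (time.perf_counter_ns() & 0xFF).to_bytes(1, "big") for _ in range(32)
    )
    # Hash both inputs together so the result depends on every input byte.
    return hashlib.sha256(os_bytes + jitter).digest()[:n_bytes]


sample = gather_entropy()
```

Hashing the inputs together is a design choice for this sketch: the output is no easier to guess than `os.urandom` alone, so the weak jitter source cannot make things worse.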
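Randomness Within the Self-Alignment of an Early-Stage Superintelligent AI:
Current methods planned for aligning superintelligent AI in deployment may rely on coaxing the AI toward an ability to align itself, whether researchers realize it or not. This particular use of randomness, when done correctly, is extremely unlikely to be surpassed by an initial advanced system. Even in sync with many other methods, which should include screening for knowledge that would threaten its own impulse toward benevolence and movement toward alignment, it can better contribute to the initial trajectory that may determine the entirety of the system's future expansion.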