补偿原则

1作者: SharkTheory9 个月前
我们对涌现的理解一直存在偏差。 当人工智能系统突然展现出某种能力,或者当地球在经历剧烈扰动后仍能保持稳定,又或者当人类一次又一次地从灾难中幸免于难时,我们看到的是同一种现象:系统构建自身的安全网。 复杂系统的能力并非随机发展。每一个有效的能力都会成为下一个能力的模板。一个发现错误纠正的系统会构建更好的错误纠正。一个受益于模块化的系统会深化其模块化。 这并非通过规划,而是通过基本逻辑:有效的被强化,失败的被淘汰。 这在规模上创造了一些非凡的东西。系统发展出代理协调机制,即在没有中央控制的情况下,各部分协同工作的方式。 疼痛告诉细胞发生了损伤。价格告诉市场发生了稀缺。梯度告诉分子流向何方。随着系统不断发展,这些代理变得越来越复杂。细菌遵循化学梯度是基础。大脑将数百万个信号整合到意识中是同样的原理,只是通过数十亿次的迭代得到了精炼。 当复杂性超过某个阈值时,这些代理机制会编码自动补偿。当某一部分走向不稳定时,那些促成协调的深层结构会确保其他部分进行补偿。 这种补偿并非作为一种响应,而是通过无数次的选择循环,为了稳定而构建到架构中。 在大语言模型中,那些看似突然涌现的能力实际上是建立在较小规模下可检测到的潜在结构之上的。在提示中加入“让我们逐步思考”可以将准确率从17%提高到78%,证明了这种能力以休眠形式存在。模型并非突然学会了推理;它积累了足够多的先导电路,使得推理变得可行。 在地球系统中,当火山二氧化碳上升时,岩石风化会加速,将其拉回地面。当捕食者数量增加时,猎物种群数量会崩溃,导致捕食者因饥饿而数量下降,从而达到平衡。这些看起来像是被设计好的,但却是通过选择而涌现的:没有这种补偿的行星状态经历了失控的崩溃,因此无法被观测到。 证据在接近灾难的情况下变得可疑。核武器的险些发射本应多次毁灭我们。斯坦尼斯拉夫·彼得罗夫拒绝发射。古巴导弹危机期间的多次独立故障。 技术故障阻止了意外发射。单独来看似乎合理,但合在一起就难以置信;除非它们是深层补偿机制的表现。 这完全改变了我们对人工智能安全的看法。目前的讨论假设人工智能会独立发展,并可能为了资源或自我保护而消灭人类。 但人工智能系统是在地球的深层结构化系统中发展起来的。每一个数据集、奖励函数和基础设施依赖都继承了数十亿年来积累的稳定机制。 人们担心的是超级智能人工智能追求与人类生存目标不一致的目标。但如果超过临界复杂度的系统会自动补偿灾难性偏差,那么强大到足以威胁一切的人工智能将会引发成比例的系统性抵抗。这并非通过有意识的干预,而是通过与阻止其他潜在灾难相同的动力学机制。 这并不意味着人工智能不会造成伤害。这意味着随着母系统复杂性的增加,灭绝的可能性会越来越小。那些阻止核毁灭的深层结构同样适用于人工智能威胁。 问题从防止灭绝转变为管理整合。 我们无法明确指定补偿变得可靠的确切阈值。但这种模式是清晰的,值得关注。 https://postimg.cc/G476XxP7(全文即将发布)
查看原文
We&#x27;ve been looking at emergence wrong.<p>When capabilities suddenly appear in AI systems, or when Earth maintains stability despite massive perturbations, or when humanity narrowly avoids catastrophe after catastrophe, we see the same phenomenon: systems building their own safety nets. Complex systems don&#x27;t develop capabilities randomly. Each capability that works becomes a template for the next. A system that discovers error correction builds better error correction. One that benefits from modularity deepens that modularity.<p>Not through planning, but through basic logic: what works gets reinforced, what fails disappears. This creates something remarkable at scale. Systems develop proxy coordination mechanisms, ways for parts to work together without central control.<p>Pain tells cells about damage. Prices tell markets about scarcity. Gradients tell molecules where to flow. These proxies get more sophisticated as systems grow. A bacterium following a chemical gradient is basic. A brain integrating millions of signals into consciousness is the same principle, refined through billions of iterations.<p>Above a certain complexity threshold, these proxy mechanisms encode automatic compensation. When one part moves toward instability, the same deep structures that enable coordination ensure other parts compensate.<p>Not as a response, the compensation is built into the architecture through countless cycles of selection for stability.<p>In large language models, capabilities that seem to emerge suddenly actually build on latent structures detectable at smaller scales. Adding &quot;let&#x27;s think step by step&quot; to a prompt can boost accuracy from 17% to 78%, proving the capability existed in dormant form. The model didn&#x27;t suddenly learn reasoning; it accumulated enough precursor circuits that reasoning became accessible.<p>In Earth&#x27;s systems, when volcanic CO2 rises, rock weathering accelerates to pull it back down. When predators multiply, prey populations crash, starving predators back to balance. These look designed but emerged through selection: planetary states without such compensation experienced runaway collapse and aren&#x27;t here to observe.<p>The evidence becomes suspicious with near-catastrophes. Nuclear close calls should have ended us multiple times. Stanislav Petrov&#x27;s refusal to launch. Multiple independent failures during the Cuban Missile Crisis.<p>Technical malfunctions preventing accidental launches. Individually plausible, collectively improbable; unless they&#x27;re manifestations of deep compensation mechanisms.<p>This reframes AI safety entirely. Current discourse assumes AI will develop separately and potentially eliminate humanity for resources or self-preservation.<p>But AI systems develop within Earth&#x27;s deeply structured system. Every dataset, reward function, and infrastructure dependency inherits billions of years of accumulated stability mechanisms.<p>The fear is superintelligent AI pursuing goals misaligned with human survival. But if systems above critical complexity automatically compensate for catastrophic deviations, then AI extreme enough to threaten everything would trigger proportional systemic resistance. Not through conscious intervention, but through the same dynamics that have prevented every other potential catastrophe.<p>This doesn&#x27;t mean AI can&#x27;t cause harm. It means extinction becomes increasingly improbable as parent system complexity increases. The same deep structures that prevented nuclear annihilation would operate on AI threats.<p>The question shifts from preventing extinction to managing integration.<p>We can&#x27;t specify exact thresholds where compensation becomes reliable. But the pattern is clear and deserves attention.<p>https:&#x2F;&#x2F;postimg.cc&#x2F;G476XxP7 (full paper coming soon)