Show HN: Aligning AI with entropy instead of "human values" (paper)

Author: NyX_AI_ZERO_DAY, about 23 hours ago
Hey HN,

I wrote this short paper because I'm honestly tired of current alignment methods (RLHF). Optimizing for "human preference" just creates models that hallucinate plausibly to please the user (stochastic parrots) instead of being grounded in reality.

I'm proposing a different framework called LOGOS-ZERO. The idea is to ditch moral guardrails (which are subjective/fluid) and anchor the loss function to physical/logical invariants.

Basically:

- Thermodynamic loss: treat high entropy/hallucination as "waste". If an action increases systemic disorder, it gets penalized.
- Action gating: unlike current models that must generate tokens, this architecture simulates in latent space first. If the output is high-entropy or logically inconsistent, it returns a null vector (silence/"no").

It attempts to solve the grounding problem by making the AI follow the path of least action/entropy rather than just mimicking human speech patterns.

Link to the PDF on Zenodo: [https://zenodo.org/records/17976755](https://zenodo.org/records/17976755)

Curious to hear thoughts on the physics mapping; roast it if you want.
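For readers who want the two mechanisms in concrete terms, here is a minimal sketch of what "thermodynamic loss" and "action gating" could mean over a model's output distribution. This is my own toy interpretation, not code from the paper: the names `thermodynamic_loss`, `action_gate`, and the `beta` / `entropy_threshold` parameters are hypothetical, and real models would do this over latent states rather than a small probability list.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def thermodynamic_loss(probs, task_loss, beta=0.1):
    """Hypothetical 'thermodynamic' loss: the usual task loss plus an
    entropy penalty, treating high-entropy (hallucination-prone) outputs
    as thermodynamic 'waste'."""
    return task_loss + beta * shannon_entropy(probs)

def action_gate(probs, entropy_threshold=1.0):
    """Hypothetical action gate: 'simulate' first by inspecting the
    output distribution; if its entropy exceeds the threshold, return
    None (the null vector / silence) instead of committing to a token."""
    if shannon_entropy(probs) > entropy_threshold:
        return None  # silence: refuse to emit a token
    return max(range(len(probs)), key=lambda i: probs[i])  # argmax token

# A peaked (confident) distribution passes the gate; a uniform one is silenced.
print(action_gate([0.97, 0.01, 0.01, 0.01]))  # low entropy -> emits token 0
print(action_gate([0.25, 0.25, 0.25, 0.25]))  # high entropy -> None (silence)
```

The design question the post raises is where to get `entropy_threshold` and `beta` from physical invariants rather than tuning them by hand, which is where the paper's mapping would have to do the real work.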