Show HN: Aligning AI with entropy instead of "human values" (paper)
1 point • by NyX_AI_ZERO_DAY • about 23 hours ago
Hey HN,
I wrote this short paper because I'm honestly tired of current alignment methods (RLHF). Optimizing for "human preference" just produces models that hallucinate plausibly to please the user (stochastic parrots) instead of staying grounded in reality.
I'm proposing a different framework called LOGOS-ZERO. The idea is to ditch moral guardrails (which are subjective and fluid) and anchor the loss function to physical/logical invariants.
Basically:
Thermodynamic loss: treat high entropy/hallucination as "waste". If an action increases systemic disorder, it gets penalized (rough sketch after the list).
Action gating: unlike current models that must generate tokens, this architecture simulates in latent space first. If the output is high-entropy or logically inconsistent, it returns a null vector (silence/no); see the second sketch below.
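To make the first point concrete, here's a minimal PyTorch-flavored sketch of what I mean by the loss. Names like `thermodynamic_loss` and the penalty weight `lambda_s` are placeholders I'm making up for this post, not the exact formulation in the paper:

```python
# Toy sketch of the "thermodynamic loss" idea for an autoregressive LM.
# lambda_s is a made-up hyperparameter, not from the paper.
import torch
import torch.nn.functional as F

def thermodynamic_loss(logits, targets, lambda_s=0.1):
    """Cross-entropy plus a penalty on the Shannon entropy of the
    model's own next-token distribution ("disorder as waste")."""
    # Standard token-level cross-entropy against the targets.
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    # Shannon entropy of the predictive distribution, averaged over tokens.
    log_p = F.log_softmax(logits, dim=-1)
    entropy = -(log_p.exp() * log_p).sum(dim=-1).mean()
    # High-entropy (disordered) predictions get penalized as waste.
    return ce + lambda_s * entropy
```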
It tries to solve the grounding problem by making the AI follow the path of least action/entropy, rather than just mimicking human speech patterns.
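And a toy sketch of the gating loop, assuming some `encoder`/`decoder` pair and an entropy threshold (all placeholders; the real gate also checks logical consistency, which I'm skipping here):

```python
# Minimal sketch of "action gating": simulate in latent space first,
# then emit a null vector (silence) instead of tokens when the
# candidate output is too disordered.
import torch
import torch.nn.functional as F

NULL_VECTOR = None  # stands in for the "Silence/No" action

def gated_generate(encoder, decoder, prompt_ids, entropy_threshold=2.0):
    # 1. Simulate: run the model forward without committing to tokens.
    with torch.no_grad():
        latent = encoder(prompt_ids)            # latent "simulation"
        logits = decoder(latent)                # candidate next-token logits
        log_p = F.log_softmax(logits[:, -1], dim=-1)
        entropy = -(log_p.exp() * log_p).sum(dim=-1)
    # 2. Gate: stay silent if the candidate distribution is too disordered.
    if entropy.mean().item() > entropy_threshold:
        return NULL_VECTOR                      # refuse to answer
    # 3. Otherwise commit: pick the most likely token and return it.
    return torch.argmax(log_p, dim=-1)
```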
Link to the PDF on Zenodo: [https://zenodo.org/records/17976755](https://zenodo.org/records/17976755)
Curious to hear thoughts on the physics mapping; roast it if you want.