Ask HN: Why do we care about "long time horizons" and LLMs?

2 points | by ozozozd | 2 days ago
Is it more impressive to take longer to answer 2 + 2? It's not. The longer someone takes, the less intelligent we would rate that person.

Somehow, for AI agents, taking longer is getting praise, with the framing "maintaining attention over long time horizons."

Have we collectively gone down to room-temperature IQs with COVID?

Why would the time dimension matter for a tool that is limited by its context window? It doesn't matter whether you fill up the window in 1 second or 60 minutes. Also, it's super easy to game: insert random lags, reduce tokens/sec, and there you have a model that maintains attention over "long time horizons."

Maybe more importantly, how do people in this field buy into these easily gameable non-indicators so readily? How did they not develop the instinct to instantly call out metrics like lines of code, number of tokens burned, or time taken to process a task as BS the moment they hear them?

How do they benchmark their code? The longer-running the better? The more CPU cycles spent the better?
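The gaming argument above can be sketched in a few lines. This is a hypothetical illustration, not anyone's actual benchmark: a toy "model" is wrapped in an artificial delay, and a wall-clock "time on task" measurement doubles as the metric. The answers are identical; only the score changes.

```python
import time

def answer(question):
    # A trivially fast toy "model": instant, fixed output.
    return "4" if question == "2 + 2" else "?"

def answer_with_lag(question, lag_s=0.25):
    # The same model with a random-lag-style delay inserted,
    # per the post's point that time-based metrics are gameable.
    time.sleep(lag_s)
    return answer(question)

def time_on_task(fn, question):
    # Measure wall-clock time, the "long time horizon" non-metric.
    start = time.perf_counter()
    result = fn(question)
    return result, time.perf_counter() - start

fast = time_on_task(answer, "2 + 2")
slow = time_on_task(answer_with_lag, "2 + 2")

# Identical output, wildly different "time horizon" scores.
assert fast[0] == slow[0]
assert slow[1] > fast[1]
```

Nothing about the longer run reflects capability; the metric only measures how long the process chose to take.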