Ask HN: Why do we care about "long time horizons" and LLMs?

2 points | by ozozozd | 2 days ago
Is it more impressive to take longer to answer 2 + 2? It's not. The longer someone takes, the less intelligent we would rate that person.

Somehow, for AI agents, taking longer is getting praise, with the framing "maintaining attention over long time horizons."

Have we collectively gone down to room-temperature IQs with COVID?

Why would the time dimension matter for a tool that is limited by its context window? It doesn't matter whether you fill up the window in 1 second or 60 minutes. Also, it's super easy to game: insert random lags, reduce tokens/sec, and there you have a model that maintains attention over "long time horizons."

Maybe more importantly, how do people in this field buy into these easily gameable non-indicators so readily? How did they not develop the instinct to instantly call out metrics like lines of code, number of tokens burned, or time taken to process a task as BS the moment they hear them?

How do they benchmark their code? The longer-running the better? The more CPU cycles spent the better?
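The gaming argument above can be sketched in a few lines. This is a hypothetical illustration, not anyone's actual benchmark: a toy "model" is wrapped in an artificial delay, and a wall-clock "time on task" measurement doubles as the metric. The answers are identical; only the score changes.

```python
import time

def answer(question):
    # A trivially fast toy "model": instant, fixed output.
    return "4" if question == "2 + 2" else "?"

def answer_with_lag(question, lag_s=0.25):
    # The same model with a random-lag-style delay inserted,
    # per the post's point that time-based metrics are gameable.
    time.sleep(lag_s)
    return answer(question)

def time_on_task(fn, question):
    # Measure wall-clock time, the "long time horizon" non-metric.
    start = time.perf_counter()
    result = fn(question)
    return result, time.perf_counter() - start

fast = time_on_task(answer, "2 + 2")
slow = time_on_task(answer_with_lag, "2 + 2")

# Identical output, wildly different "time horizon" scores.
assert fast[0] == slow[0]
assert slow[1] > fast[1]
```

Nothing about the longer run reflects capability; the metric only measures how long the process chose to take.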