Ask HN: What are the main ways people measure AI progress?
1 point • by ericlamb89 • 6 months ago
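I'm interested in how AI progress is currently evaluated, and I'm trying to build a list of the major approaches people actually use.
I'm aware that all of these measures have limitations and that many are controversial or imperfect by design. I'm not assuming they're "good" or that they map cleanly to real-world capability.
I'd love to hear:
* What measures, benchmarks, or methodologies you think belong on this list
* What you see as their key strengths and failure modes
* How (or whether) you personally use them to interpret AI progress
My goal here is discovery and understanding, not to defend or attack any particular framework.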