I’m interested in how AI progress is currently evaluated, and I’m trying to build a list of the major approaches people actually use.

I’m aware that all of these measures have limitations and that many are controversial or imperfect by design. I’m not assuming they’re “good” or that they cleanly map to real-world capability.

I’d love to hear:

- What measures, benchmarks, or methodologies you think belong on this list
- What you see as their key strengths and failure modes
- How (or whether) you personally use them to interpret AI progress

My goal here is discovery and understanding, not to defend or attack any particular framework.

Ask HN: What are the main ways people measure AI progress?