HackerNews Chinese Edition



I've tried many different models, and without doubt the code coming out of them differs a lot when it comes to "quality". Some of that is subjective for sure, but there are objective sides to "good" code.

I wish this were a metric in the AI benchmarks so I could choose a model based on it, because honestly it's one of the things I care most about.

Problem: how can you measure such things, and what are the metrics?

...maybe there just isn't a way to do it, since that metric isn't in the charts.

Ask HN: Is there a metric for AI code quality?