Hacker News (Chinese edition)

I keep pushing the frontier models to their limits, and I have several projects they still can't solve that I use to benchmark new models. Every new model makes it easier to "solve" even harder problems, but I still have the feeling that they rely 99% on my ideas. They just don't get the ideas, and I have to hold their hand and help them.

Don't get me wrong: anything that is already close to done they excel at, and they can combine existing techniques. I'm talking about new ideas that models have never seen before.

For example, I have a hobby project that pushes what's possible with route optimization. Yes, it's close to SOTA and way more efficient than all (?) other solutions out there (punnerud.github.io/mpee/), but I have to hold the model's hand and brainstorm ideas with it on how to compress a matrix.

And it's not a one-time thing; it happens something like 40-50 times over a few days.

The 1% is this "new ideas" part. Why can I come up with all of these, and the model can't? It's a really hard eval to create. Right now this project is open; later I'm thinking about making a frontier project in the same way, keeping it away from the public and using it as a benchmark. Is that the best way to test for new ideas in models?

Am I the only one who feels Mythos/Fabel still has a 1% gap?