Am I the only one who feels Mythos/Fabel is still missing that last 1%?

1 point by punnerud 17 days ago
I keep pushing frontier models to their limits, and I have several projects they still can’t solve that I benchmark new models on. Every new model makes it easier to “solve” even harder problems, but I still have the feeling that they rely 99% on my ideas. They just don’t get the ideas, and I have to hold their hand and help them.

Don’t get me wrong: they excel at anything that is already close to done, and they can combine existing techniques. I’m talking about new ideas that models have never seen before.

For example, I have a hobby project that pushes what’s possible in route optimization. Yes, it’s close to SOTA (state of the art) and far more efficient than all (?) other solutions out there (punnerud.github.io/mpee/), but I have to hold the model’s hand and brainstorm ideas with it on how to compress a matrix.

And it’s not a one-time thing; it happens something like 40-50 times over a few days.

That 1% is the “new ideas” part. Why can I come up with all of these, and the model can’t? It’s a really hard eval to create. This project is public now, but later I’m thinking about making another frontier project the same way, keeping it private and using it as a benchmark. Is that the best way to test models for new ideas?