Opus 4.5 Review (Custom Plan)

1 point by tactics6655, 7 months ago
1. Severe hallucinations

It can't write a single document correctly. I asked it to write the API documentation for the interpreter I'm building. It invented functions that don't exist and got all the parameter lists wrong. Functions like `shouldClose`, `swapBuffers`, `pollEvents`, and `terminate` aren't even in my interpreter, yet it put them in the docs and got every parameter wrong.

2. It's great at creating duplicate functions

In C, duplicate function names won't compile, but it seems to create duplicates about 60% of the time. It even creates duplicate `case` labels in `switch` statements. It often produces identical code under different names.

3. It's excellent at producing bugs

It creates `struct`s, initializes them, and assigns values. Then, when asked to initialize and run, or to implement SSR (screen-space reflection), it sets the alpha to full and the entire GLB becomes transparent. When asked to make a glass material, it just makes it opaque (I tried 15 times and gave up). They claim it's at a professional engineer level. What, kindergarten-level engineers?

4. It never gives code that actually works

Even code to read a single file is a bug-ridden mess; it's faster to implement it yourself.

It does null checks here and there, but maybe it should check other things first instead of spending time on those.

5. Serious memory leaks

It generates code with memory leaks. Using Claude Opus 4.5 made me so frustrated that I wrote a lot of profanity into my prompts.

6. It adds things I didn't ask for

It adds noise to shadows, claiming it looks more natural, but it just looks dirty. I've even seen it "fix" reflections by placing a white plane on the floor and calling that light reflection.

7. Mozart's Dice-level composition

Good luck getting solid code: everything is a defective, patchy mess.

8. As long as it doesn't crash, that's all that matters

A function is supposed to return a float, but the result gets compared as an int? Then just change the function's return type to int! Claude Opus 4.5 is truly amazing! It'll probably reach AGI soon!
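
To make the last point concrete, here is a minimal C sketch of why switching a float-returning function to return int is not a harmless change. It is not code from the project described above: `hit_ratio`, `hit_ratio_as_int`, `passes_float`, and `passes_int` are hypothetical names invented for illustration, and the 0.5 threshold is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not from the post: fraction of hits, in [0, 1]. */
float hit_ratio(int hits, int total)
{
    assert(total > 0);                 /* precondition for this sketch */
    return (float)hits / (float)total;
}

/* The "fix" under discussion: same computation, return type changed to int.
 * Converting float to int truncates toward zero, so any ratio below 1
 * becomes 0. */
int hit_ratio_as_int(int hits, int total)
{
    return (int)hit_ratio(hits, total);
}

/* Threshold check against the float result. */
bool passes_float(int hits, int total)
{
    return hit_ratio(hits, total) >= 0.5f;
}

/* The same check after the return type change: the truncated value is
 * either 0 or 1, so the check only passes when hits == total. */
bool passes_int(int hits, int total)
{
    return hit_ratio_as_int(hits, total) >= 0.5f;
}
```

With 3 hits out of 4, `passes_float(3, 4)` is true because 0.75 clears the threshold, while `passes_int(3, 4)` is false because 0.75 has already been truncated to 0. Nothing crashes and nothing fails to compile, which is exactly why this kind of change is easy to miss.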