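GPT-5 is worse than 4.1-mini for text generation and worse than Sonnet 4 for coding
3 points • by hitradostava • 9 months ago
It seems that OpenAI has got the PR machine working amazingly well. The Cursor CEO said it's the best, as did Simon Willison (https://simonwillison.net/2025/Aug/7/gpt-5/).
But I've found it terrible. For coding (in Cursor), it's slow, often fails on tool calls (no MCP, just stock Cursor tools), and it stored some new application state in globalThis, something no model has ever attempted in over a year of very heavy Cursor / Claude Code use.
For a summarization/insights API that I work on, it was way worse than gpt-4.1-mini. I tried both the mini and full GPT-5 models, with different reasoning settings. It didn't follow instructions, and its output was worse across all my evals, even after heavy prompt adjustment. I did a lot of sampling and the results were consistently bad.
Am I the only one? Has anyone seen actual real-world benefits of GPT-5 vs. other models?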