The leading models all advertise tool use including code execution. So why is it still common to get a simple Python script that has a logical bug which would be immediately discoverable upon running a Python interpreter for 0.1 seconds?

Why do LLMs still not run code before giving it to you?