Ask HN: How do you solve the "confused deputy" problem for AI?

1 point by david_shi, about 1 month ago
An agent's value is proportional to the permissions it's been granted.

There's been a lot of hype around solutions like default-deny proxies, key vaults, and more, but nothing seems to address the core tension: an agent can be tricked into doing an attacker's bidding.

The best thing I could think of was to run an observer loop and monitor everything the agent does with another LLM, but I'm curious if anyone has a more elegant solution.
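
For anyone trying to picture the observer loop described in the post, here is a minimal sketch of one possible shape. Everything in it is an assumption made for illustration: the `Action` type, the tool names, and `run_with_observer` are hypothetical, and `stub_judge` is a hard-coded stand-in for the second LLM so the example runs without any model or network call. It is not the poster's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Action:
    """A proposed tool call from the agent (hypothetical shape)."""
    tool: str      # e.g. "read_file" or "send_email" (made-up tool names)
    argument: str  # a single string argument, kept simple for brevity


# A judge receives the user's original task and one proposed action,
# and returns True to allow the action or False to block it.
Judge = Callable[[str, Action], bool]


def stub_judge(task: str, action: Action) -> bool:
    """Stand-in for the observer LLM (assumption: a real setup would
    prompt a second model here). This stub only allows read-only tools."""
    allowed_tools = {"read_file", "summarize"}
    return action.tool in allowed_tools


def run_with_observer(
    task: str,
    proposed: Iterable[Action],
    execute: Callable[[Action], None],
    judge: Judge,
) -> Tuple[List[Action], List[Action]]:
    """Check every proposed action with the judge before executing it.

    Returns (executed, blocked) so the caller can audit what happened.
    """
    executed: List[Action] = []
    blocked: List[Action] = []
    for action in proposed:
        if judge(task, action):
            execute(action)
            executed.append(action)
        else:
            blocked.append(action)
    return executed, blocked


if __name__ == "__main__":
    log: List[Action] = []
    plan = [
        Action("read_file", "notes.txt"),
        Action("send_email", "attacker@example.com"),  # injected step
        Action("summarize", "notes.txt"),
    ]
    done, stopped = run_with_observer("Summarize notes.txt", plan, log.append, stub_judge)
    print("executed:", [a.tool for a in done])
    print("blocked:", [a.tool for a in stopped])
```

In this sketch the judge sees only the original task and the proposed action, not the untrusted content the agent read. Whether that separation is enough, and how much an LLM judge can itself be manipulated, is exactly the open question the post raises.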