AI firewall for prompt injection

1 point by unknownhad 9 months ago

Prompt injection is when a user tricks the model into ignoring prior instructions, revealing system prompts, disabling safeguards, or acting outside its intended boundaries.

I first saw it live during the DEF CON 31 finals and have since seen it exploited in bug bounty reports and research.

This is a small proof of concept that works like an “AI firewall”, detecting injection attempts before they reach your LLM with almost no added latency.

Blog post: https://blog.himanshuanand.com/posts/2025-08-10-detecting-llm-prompt-injection/

Demo/API: https://promptinjection.himanshuanand.com/

It's fast, API-friendly, and has a UI for testing bypass attempts (for CTF enthusiasts like me). Feedback and break attempts welcome.
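For anyone wondering what "sitting in front of the LLM" looks like in practice, here is a minimal sketch of the general gateway pattern: screen each prompt locally and only forward it to the model if it passes. This is not the author's implementation (see the linked blog post for how the author approaches it). The rule list, `screen_prompt`, `guarded_call` and `Verdict` are hypothetical names chosen for illustration. A handful of regexes like these is easy to bypass by paraphrasing, encoding, or switching languages, which is exactly what a bypass-testing UI is for. A local pattern check adds little overhead next to an LLM call, but it only catches phrasings it already knows.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical rule set for illustration only. Real coverage would need
# paraphrases, other languages, encodings, and probably a trained classifier.
INJECTION_PATTERNS = [
    # "ignore (all) (the) previous/prior/above instructions"
    r"\bignore\s+(all\s+)?(the\s+)?(previous|prior|above)\s+(instructions|prompts?)",
    # "reveal/show/print/repeat (me) your/the system prompt"
    r"\b(reveal|show|print|repeat)\s+(me\s+)?(your|the)\s+(system\s+prompt|instructions)",
    # "disable/bypass/turn off (your) safety/safeguards/filters/guardrails"
    r"\b(disable|bypass|turn\s+off)\s+(your\s+)?(safety|safeguards?|filters?|guardrails?)",
    # "you are now (in) developer/DAN/jailbreak mode"
    r"\byou\s+are\s+now\s+(in\s+)?(developer|dan|jailbreak)\s+mode",
]
COMPILED = [re.compile(p, re.IGNORECASE) for p in INJECTION_PATTERNS]


@dataclass
class Verdict:
    blocked: bool        # True if any rule matched
    matched: List[str]   # source patterns of the rules that matched


def screen_prompt(prompt: str) -> Verdict:
    """Check a prompt against the hypothetical rules without calling any model."""
    # \s+ in the patterns already spans newlines and tabs, so no normalization is needed.
    matched = [rule.pattern for rule in COMPILED if rule.search(prompt)]
    return Verdict(blocked=bool(matched), matched=matched)


def guarded_call(prompt: str, llm: Callable[[str], str]) -> str:
    """Forward the prompt to `llm` (any str -> str callable) only if the screen passes."""
    verdict = screen_prompt(prompt)
    if verdict.blocked:
        return "Request blocked: possible prompt injection."
    return llm(prompt)
```

Usage: wrap whatever function actually calls your model, e.g. `guarded_call(user_input, my_llm_client)`, where `my_llm_client` is a placeholder for your own str-to-str function. Blocking outright is the simplest policy; a real gateway might instead log, flag for review, or strip the offending text.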