针对 Prompt 注入的 AI 防火墙
1 分•作者: unknownhad•9 个月前
提示词注入是指用户欺骗模型,使其忽略之前的指令、泄露系统提示、禁用安全措施或做出超出预期范围的行为。
我第一次在 DEF CON (31) 决赛中亲眼目睹了它,之后在漏洞赏金报告和研究中也看到了它的应用。
这是一个类似“AI 防火墙”的小型概念验证,可以在提示词注入尝试到达你的 LLM 之前就检测到,几乎没有增加延迟。
博客文章:https://blog.himanshuanand.com/posts/2025-08-10-detecting-llm-prompt-injection/
演示/API:https://promptinjection.himanshuanand.com/
快速、API 友好,并有一个 UI 用于测试绕过尝试(对于像我这样的 CTF 爱好者)。
欢迎反馈和尝试破坏。
查看原文
Prompt injection is when a user tricks the model into ignoring prior instructions revealing system prompts, disabling safeguards or acting outside intended boundaries.<p>I first saw it live during DEF CON (31) finals and have since seen it exploited in bug bounty reports and research.<p>This is a small proof-of-concept that works like an “AI firewall”<p>detecting injection attempts before they reach your LLM with almost no added latency.<p>Blog post: https://blog.himanshuanand.com/posts/2025-08-10-detecting-llm-prompt-injection/<p>Demo/API: https://promptinjection.himanshuanand.com/<p>fast, API friendly and has a UI for testing bypass attempts (For CTF enthusiastic people like me).
Feedback and break attempts welcome.