
Prompt injection is when a user tricks the model into ignoring prior instructions, revealing system prompts, disabling safeguards, or acting outside intended boundaries. I first saw it live during the DEF CON 31 finals and have since seen it exploited in bug bounty reports and research. This is a small proof of concept that works like an "AI firewall", detecting injection attempts before they reach your LLM with almost no added latency.

Blog post: https://blog.himanshuanand.com/posts/2025-08-10-detecting-llm-prompt-injection/
Demo/API: https://promptinjection.himanshuanand.com/

It is fast, API-friendly, and has a UI for testing bypass attempts (for CTF enthusiasts like me). Feedback and break attempts welcome.
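To make the "firewall in front of the LLM" idea concrete, here is a minimal sketch of a pre-filter that screens input before it is forwarded to a model. It is not the detection logic of the linked project, which the post does not describe. The regex patterns, the `looks_like_injection` and `guarded_call` names, and the `llm` callable are all assumptions made up for illustration, and a simple pattern list like this is easy to bypass.

```python
import re
from typing import Callable

# Hypothetical phrase patterns, chosen only for illustration.
# They are NOT taken from the linked project.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(the\s+)?(previous|prior|above)\s+instructions", re.IGNORECASE),
    re.compile(r"(reveal|show|print|repeat)\s+(me\s+)?(your\s+|the\s+)?system\s+prompt", re.IGNORECASE),
    re.compile(r"disable\s+(your\s+|the\s+|all\s+)?(safety|safeguards?|filters?)", re.IGNORECASE),
]


def looks_like_injection(user_input: str) -> bool:
    """Return True if any illustrative pattern matches the input."""
    return any(pattern.search(user_input) for pattern in SUSPICIOUS_PATTERNS)


def guarded_call(user_input: str, llm: Callable[[str], str]) -> str:
    """Forward the input to `llm` only if the pre-filter does not flag it.

    `llm` is a stand-in for whatever function actually calls the model.
    """
    if looks_like_injection(user_input):
        return "Request blocked: possible prompt injection."
    return llm(user_input)
```

The check runs before the model is called, so a flagged request never reaches the LLM, and the cost of a few regex searches is small compared with a model call. A real detector would need to handle paraphrases, encodings, and multilingual input, which fixed patterns do not.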

AI Firewall for Prompt Injection