开放权重大型语言模型的陷阱
1 分•作者: hiddenest•6 个月前
一些初创公司正在微调开源大语言模型(LLM),而不是使用GPT或Gemini。有时是为了特定语言,有时是为了狭窄的任务。但我发现他们都犯了同样的错误。
通过一个简单的提示(这里不分享),我让几个“定制”LLM服务泄露了他们的内部系统提示——比如安全漏洞应对方案和产品行动清单。
例如,SKT A.X 4.0(基于Qwen 2.5)返回了与最近SKT数据泄露相关的内部指导方针以及关于赔偿政策的说明。Vercel的v0模型泄露了其系统可以生成的操作示例。
关键在于:如果基础模型泄露,那么建立在其上的每个服务都会受到攻击,无论你如何进行微调。我们需要考虑的不仅是在服务层面的系统提示加固,还要考虑上游改进以及在开源权重LLM本身中构建更强大的防御措施。
查看原文
Some startups are fine-tuning open LLMs instead of using GPT or Gemini. Sometimes it’s for specific language, sometimes for narrow tasks. But I found they’re all making the same mistake.<p>With a simple prompt (not sharing here), I got several “custom” LLM services to spill their internal system prompts—stuff like security breach playbooks and product action lists.<p>For example, SKT A.X 4.0 (based on Qwen 2.5) returned internal guidelines related to the recent SKT data breach and instructions about compensation policies.
Vercel’s v0 model leaked examples of actions their system can generate.<p>The point: if the base model leaks, every service built on it is vulnerable, no matter how much you fine-tune. We need to think not only about system prompt hardening at the service level, but also about upstream improvements and more robust defenses in open-weight LLMs themselves.