Verdic – Intent governance layer for AI systems

1 point by kundan_s__r, 6 months ago
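We built Verdic after repeatedly running into the same issue while deploying LLMs in production: most AI failures aren't about content safety, they're about intent drift.

As models become more agentic, outputs often shift quietly from descriptive to prescriptive behavior, with no explicit signal that the system is now effectively taking action. Keyword filters and rule-based guardrails break down quickly in these cases.

Verdic is an intent governance layer that sits between the model and the application. Instead of checking topics or keywords, it evaluates:

* whether an output collapses future choices into a specific course of action
* whether the response exerts normative pressure (directing behavior vs. explaining)

The goal isn't moderation but behavioral control: detecting when an AI system is operating outside the intent it was deployed for, especially in regulated or decision-critical workflows.

Verdic currently runs as an API with configurable allow / warn / block outcomes. We're testing it on agentic workflows and long-running chains, where intent drift is hardest to detect.

This is an early release. I'm mainly looking for feedback from people deploying LLMs in production, especially around:

* agentic systems
* AI governance
* risk & compliance
* failure modes we might be missing

Happy to answer questions or share more details about the approach.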
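To make the configurable allow / warn / block idea concrete, here is a minimal sketch of how two intent signals could be mapped onto an outcome. This is not Verdic's actual API: the names `IntentSignals`, `Policy`, and `decide`, the 0-to-1 score range, the default thresholds, and the choice to take the stronger of the two signals are all assumptions made for illustration. How the scores themselves are produced is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"


@dataclass(frozen=True)
class IntentSignals:
    """Hypothetical scores in [0, 1]; the scoring method is out of scope here."""
    choice_collapse: float     # how strongly the output narrows options to one action
    normative_pressure: float  # how strongly the output directs rather than explains


@dataclass(frozen=True)
class Policy:
    """Hypothetical per-deployment thresholds (assumed defaults, not real ones)."""
    warn_at: float = 0.5
    block_at: float = 0.8


def decide(signals: IntentSignals, policy: Policy = Policy()) -> Outcome:
    """Map the stronger of the two signals onto allow / warn / block."""
    score = max(signals.choice_collapse, signals.normative_pressure)
    if score >= policy.block_at:
        return Outcome.BLOCK
    if score >= policy.warn_at:
        return Outcome.WARN
    return Outcome.ALLOW


# Example: a mostly explanatory response vs. a strongly directive one.
print(decide(IntentSignals(choice_collapse=0.2, normative_pressure=0.3)).value)  # allow
print(decide(IntentSignals(choice_collapse=0.1, normative_pressure=0.9)).value)  # block
```

Keeping the thresholds in a separate `Policy` object is one way a regulated workflow could run stricter settings than an internal tool without changing the decision logic.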