1 point • by MADEinPARIS • 9 months ago
Show HN: Provability Fabric — From Guardrails to Guarantees for AI Agents
<a href="https://mateopetel.substack.com/p/provability-fabric-the-safety-contract" rel="nofollow">https://mateopetel.substack.com/p/provability-fabric-the-saf...</a><p>Most AI “safety” today is just vibes.
Prompts get hardened, heuristics stacked, dashboards blink green. But when agents can call tools, stream outputs, or mutate state, vibes don’t scale. You need math wired into the runtime.
<p>Provability Fabric (PF) is a framework for proof-carrying behavior:
<p>Spec-to-Proof layer → ActionDSL policies compile to Lean 4 obligations + runtime monitors
<p>Provable Agent Bundles (PABs) → signed packages binding spec, proofs, SBOM, provenance
<p>Rust sidecar → complete mediation of all events (call, ret, log, declassify, emit_chunk, emit_end)
<p>Egress certificates → every output gets a signed verdict (pass/fail/inapplicable)
<p>TRUST-FIRE test suite → stress tests for chaos, rollback, privacy burn-down, adversarial adapters
<p>The core bet:
<p>“Safety isn’t a setting. It’s a correspondence proof between your semantics and your runtime.”
<p>Would love HN’s take:
<p>Is this the right way to move beyond vibes toward falsifiable guarantees?
<p>Should proof-carrying bundles (like PAB-1.0) become the standard for deploying agents?
<p>What would it take for you to trust an agent in prod: proofs, monitors, or something else?
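<p>To make the sidecar and egress-verdict items concrete, here is a minimal sketch in std-only Rust. It is an illustration, not PF's actual code: only the six event names and the three verdicts come from the post. The `Monitor` type, the `mediate` method, the tool allow-list, and the rule that a chunk may only carry labels that were declassified earlier are all hypothetical. Signing is left out entirely, so the verdicts here are plain values rather than certificates, and nothing is proved in Lean.

```rust
use std::collections::HashSet;

/// The six event kinds listed in the post. Field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Call { tool: String },
    Ret { tool: String },
    Log { message: String },
    Declassify { label: String },
    EmitChunk { text: String, labels: Vec<String> },
    EmitEnd,
}

/// The three verdicts named in the post.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Fail,
    Inapplicable,
}

/// Hypothetical runtime monitor with a toy policy:
/// 1. only allow-listed tools may be called;
/// 2. an emitted chunk may only carry labels that were declassified earlier.
pub struct Monitor {
    allowed_tools: HashSet<String>,
    declassified: HashSet<String>,
    failed: bool,
}

impl Monitor {
    pub fn new(allowed_tools: &[&str]) -> Self {
        Monitor {
            allowed_tools: allowed_tools.iter().map(|t| t.to_string()).collect(),
            declassified: HashSet::new(),
            failed: false,
        }
    }

    /// Single choke point: every event is routed through this method.
    /// The match is exhaustive, so a new event kind will not compile
    /// until it is given a verdict.
    pub fn mediate(&mut self, event: &Event) -> Verdict {
        let verdict = match event {
            Event::Call { tool } => {
                if self.allowed_tools.contains(tool) {
                    Verdict::Pass
                } else {
                    Verdict::Fail
                }
            }
            Event::Declassify { label } => {
                self.declassified.insert(label.clone());
                Verdict::Pass
            }
            Event::EmitChunk { labels, .. } => {
                if labels.iter().all(|l| self.declassified.contains(l)) {
                    Verdict::Pass
                } else {
                    Verdict::Fail
                }
            }
            // The stream as a whole fails if any earlier event failed.
            Event::EmitEnd => {
                if self.failed {
                    Verdict::Fail
                } else {
                    Verdict::Pass
                }
            }
            // The toy policy says nothing about returns or log lines.
            Event::Ret { .. } | Event::Log { .. } => Verdict::Inapplicable,
        };
        if verdict == Verdict::Fail {
            self.failed = true;
        }
        verdict
    }
}
```

<p>A caller would build `Monitor::new(&["search"])` and pass each event to `mediate` before letting it take effect. In this sketch a chunk labeled `pii` fails until a `Declassify` event for `pii` has been seen, and the final `EmitEnd` reports `Fail` if anything earlier failed. The exhaustive `match` is the design point: the compiler rejects any event kind that has no verdict, which is a small, local version of the complete-mediation claim.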