Show HN: Autofix Bot – Hybrid static analysis and AI code review agent
9 points • by sanketsaurav • 18 days ago
Hi there, HN! We’re Jai and Sanket from DeepSource (YC W20), and today we’re launching Autofix Bot, a hybrid static analysis + AI agent purpose-built for in-the-loop use with AI coding agents.

AI coding agents have made code generation nearly free, and they’ve shifted the bottleneck to code review. Static-only analysis with a fixed set of checkers isn’t enough. LLM-only review has several limitations: it’s non-deterministic across runs, has low recall on security issues, is expensive at scale, and tends to get ‘distracted’.

We spent the last 6 years building a deterministic, static-analysis-only code review product. Earlier this year, we started thinking about this problem from the ground up and realized that static analysis solves key blind spots of LLM-only reviews. Over the past six months, we built a new ‘hybrid’ agent loop that uses static analysis and frontier AI agents together to outperform both static-only and LLM-only tools in finding and fixing code quality and security issues. Today, we’re opening it up publicly.

Here’s how the hybrid architecture works (a rough sketch of the loop follows the list):

- Static pass: 5,000+ deterministic checkers (code quality, security, performance) establish a high-precision baseline. A sub-agent suppresses context-specific false positives.

- AI review: The agent reviews code with the static findings as anchors. It has access to ASTs, data-flow graphs, control-flow graphs, and import graphs as tools, not just grep and the usual shell commands.

- Remediation: Sub-agents generate fixes. A static harness validates all edits before emitting a clean git patch.

Static analysis solves key LLM problems: non-determinism across runs, low recall on security issues (LLMs get distracted by style), and cost (static narrowing reduces prompt size and tool calls).
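If it helps to picture the loop, here’s a minimal illustrative sketch of the three phases. Every type and function name in it is made up for the example and stands in for the real checkers, sub-agents, and validation harness; it’s not our actual implementation.

```typescript
// Purely illustrative sketch of a hybrid static + AI review loop.
// Every name below is hypothetical; the stubs stand in for the real
// checkers, sub-agents, and validation harness.

interface Finding {
  checker: string;   // e.g. "js/security/sql-injection"
  file: string;
  line: number;
  message: string;
}

interface Patch {
  file: string;
  diff: string;      // unified diff, ready to apply with `git apply`
}

// --- hypothetical stand-ins for the real components ----------------------
const runDeterministicCheckers = async (_repo: string): Promise<Finding[]> => [];
const suppressFalsePositives = async (_repo: string, found: Finding[]): Promise<Finding[]> => found;
const reviewAgent = async (_repo: string, anchors: Finding[]): Promise<Finding[]> => anchors;
const fixAgent = async (_repo: string, _findings: Finding[]): Promise<Patch[]> => [];
const staticHarnessAccepts = (_repo: string, _patch: Patch): boolean => true;

// Phase 1: deterministic checkers build a high-precision baseline; a
// sub-agent then prunes context-specific false positives.
async function staticPass(repo: string): Promise<Finding[]> {
  const raw = await runDeterministicCheckers(repo);
  return suppressFalsePositives(repo, raw);
}

// Phase 2: the review agent treats the static findings as anchors. (In the
// real system it can also query ASTs, data-flow, control-flow, and import
// graphs as tools rather than relying on grep alone.)
async function aiReview(repo: string, anchors: Finding[]): Promise<Finding[]> {
  return reviewAgent(repo, anchors);
}

// Phase 3: fix sub-agents draft patches; every edit is re-validated by the
// static harness before a clean git patch is emitted.
async function remediate(repo: string, findings: Finding[]): Promise<Patch[]> {
  const candidates = await fixAgent(repo, findings);
  return candidates.filter((patch) => staticHarnessAccepts(repo, patch));
}

export async function hybridReview(repo: string): Promise<Patch[]> {
  const anchors = await staticPass(repo);
  const findings = await aiReview(repo, anchors);
  return remediate(repo, findings);
}
```

The design point is that the non-deterministic agent steps always start from, and are re-checked against, the deterministic static baseline.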
On the OpenSSF CVE Benchmark [1] (200+ real JS/TS vulnerabilities), we hit 81.2% accuracy and 80.0% F1, vs. Cursor Bugbot (74.5% accuracy, 77.42% F1), Claude Code (71.5% accuracy, 62.99% F1), CodeRabbit (59.4% accuracy, 36.19% F1), and Semgrep CE (56.9% accuracy, 38.26% F1). On secrets detection, we hit 92.8% F1, vs. Gitleaks (75.6%), detect-secrets (64.1%), and TruffleHog (41.2%). We use our open-source classification model for this. [2]

Full methodology and how we evaluated each tool: https://autofix.bot/benchmarks

You can use Autofix Bot interactively on any repository using our TUI, as a plugin in Claude Code, or with our MCP server on any compatible AI client (like OpenAI Codex). [3] We’re building specifically for AI-coding-agent-first workflows, so you can ask your agent to run Autofix Bot at every checkpoint autonomously.

Give us a shot today: https://autofix.bot. We’d love to hear any feedback!

---

[1] https://github.com/ossf-cve-benchmark/ossf-cve-benchmark

[2] https://huggingface.co/deepsource/Narada-3.2-3B-v1

[3] https://autofix.bot/manual/#terminal-ui