Launch HN: Gecko Security (YC F24) – AI That Finds Vulnerabilities in Code
11 points • by jjjutla • 10 months ago
Hey HN, I'm JJ, co-founder of Gecko Security (<a href="https://www.gecko.security">https://www.gecko.security</a>). We're building a new kind of static analysis tool that uses LLMs to find complex business logic and multi-step vulnerabilities that current scanners miss. We've used it to find 30+ CVEs in projects like Ollama, Gradio, and Ragflow (<a href="https://www.gecko.security/research">https://www.gecko.security/research</a>). You can try it yourself on any OSS repo (<a href="https://app.gecko.security">https://app.gecko.security</a>).
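Anyone who's used SAST (Static Application Security Testing) tools knows the problem: high false-positive rates, combined with entire classes of vulnerabilities, like AuthN/Z bypasses or privilege escalations, going undetected. This limitation is a result of their core architecture. By design, SAST tools parse code into a simplified model such as an AST or call graph, which quickly loses context in dynamically typed languages or across microservice boundaries, and limits coverage to resolving basic call chains. To detect vulnerabilities they rely on pattern matching with regex or YAML rules. That can be effective for basic technical classes like XSS and SQLi, but it is inadequate for logic flaws that don't conform to well-known shapes and need long sequences of dependent operations to reach an exploitable state.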
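To make the pattern-matching limitation concrete, here is a minimal sketch of a regex rule. The rule, the snippets, and every name in it are hypothetical and not taken from any real scanner; it only illustrates why a syntactic rule can catch string-built SQL yet has nothing to match on when an authorization check is simply missing.

```python
import re

# Hypothetical rule: flag SQL passed to execute() that is built with an
# f-string, or with string concatenation / %-formatting.
SQLI_RULE = re.compile(r"""execute\(\s*(f["']|["'][^"']*["']\s*[+%])""")

SNIPPETS = {
    "sqli": 'cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")',
    "safe_query": 'cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))',
    # Logic flaw: the caller's permissions are never checked, but the code
    # contains nothing a syntactic pattern can latch onto.
    "missing_authz": "def delete_group(user, group_id):\n    db.delete(group_id)",
}


def scan(snippets):
    """Return the names of the snippets that the rule flags."""
    return sorted(name for name, code in snippets.items() if SQLI_RULE.search(code))


flagged = scan(SNIPPETS)
```

The rule flags only the `sqli` snippet. The `missing_authz` snippet passes cleanly, even though it is the more serious bug in an application where only some users may delete groups.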
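My co-founder and I saw these limitations throughout our careers in national intelligence and military cyber forces, where we built automated tooling to defend critical infrastructure. We realized that LLMs, with the right architecture, could finally solve them.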
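Vulnerabilities are contextual. What's exploitable depends entirely on each application's security model. We realized that accurate detection requires understanding what's supposed to be protected and why breaking it matters. This meant embedding threat modeling directly into our analysis, not treating it as an afterthought.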
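To achieve this, we first had to solve the code parsing problem. Our solution was to build a custom, compiler-accurate indexer, inspired by GitHub's stack graphs approach, that navigates code as precisely as an IDE does. We build on the LSIF approach (<a href="https://lsif.dev/" rel="nofollow">https://lsif.dev/</a>) but replace the verbose JSON with a compact Protobuf schema that serializes symbol definitions and references in a binary format. We use language-specific tools to parse and type-check code, emitting a sequence of Protobuf messages that record each symbol's position, definition, and reference information. Protobuf's efficiency and strong typing let us produce smaller indexes while preserving the compiler-accurate semantic information required for detecting complex call chains.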
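As a rough illustration of what such an index records, here is a minimal sketch. It is not Gecko's schema: the `Occurrence` and `SymbolIndex` types, the field names, and the symbol names are all hypothetical, and plain Python dataclasses stand in for Protobuf messages so the example needs only the standard library.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    DEFINITION = 1
    REFERENCE = 2


@dataclass(frozen=True)
class Occurrence:
    """One record per place a symbol appears (hypothetical layout)."""
    symbol: str  # globally unique symbol name
    path: str    # file containing the occurrence
    line: int    # 1-based line number
    role: Role   # is this the definition or a reference to it?


class SymbolIndex:
    """Answers the IDE-style questions 'go to definition' and 'find references'."""

    def __init__(self, occurrences):
        self._defs = {}
        self._refs = defaultdict(list)
        for occ in occurrences:
            if occ.role is Role.DEFINITION:
                self._defs[occ.symbol] = occ
            else:
                self._refs[occ.symbol].append(occ)

    def definition_of(self, symbol):
        return self._defs.get(symbol)

    def references_to(self, symbol):
        return list(self._refs.get(symbol, []))


index = SymbolIndex([
    Occurrence("auth.validate_curator", "auth.py", 10, Role.DEFINITION),
    Occurrence("auth.validate_curator", "api/groups.py", 42, Role.REFERENCE),
    Occurrence("auth.validate_curator", "api/users.py", 17, Role.REFERENCE),
])
```

With records like these, "which endpoints call the permission check?" becomes a lookup via `index.references_to("auth.validate_curator")` rather than a text search, which is the kind of query an analysis needs in order to follow call chains across files.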
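This is why most "SAST + LLM" tools that use AST parsing fail - they feed LLMs incomplete or incorrect code information from traditional parsers, and an LLM cannot reason accurately about security issues when the context is missing.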
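With our indexer providing accurate code structure, we use an LLM to perform threat modeling: it analyzes developer intent, data and trust boundaries, and exposed endpoints to generate potential attack scenarios. This is where LLMs' tendency to hallucinate becomes a feature rather than a bug, because each speculative scenario is treated as a hypothesis to test, not as a finding.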
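For each potential attack path generated, we perform a systematic search, querying the indexer to gather all necessary context and reconstruct the full call chain from source to sink. To validate the vulnerability we use a Monte Carlo Tree Self-refine (MCTSr) algorithm and a "win function" to estimate the likelihood that a hypothesized attack could work. Once a finding scores above a set practicality threshold, it is confirmed as a true positive.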
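The call-chain reconstruction step can be sketched as a graph search. The sketch below assumes a hypothetical call graph (caller to callees) of the kind an indexer could report, and uses plain breadth-first search. The `confirm` helper is only a placeholder for the thresholding step: it does not implement MCTSr or the win function, and the 0.8 default is an arbitrary illustrative value.

```python
from collections import deque

# Hypothetical call graph: caller -> callees.
CALL_GRAPH = {
    "http.patch_user_group": ["api.update_user_group"],
    "api.update_user_group": ["db.fetch_group", "db.write_group"],
    "db.fetch_group": [],
    "db.write_group": [],
}


def call_chain(graph, source, sink):
    """Breadth-first search for the shortest call chain from source to sink.

    Returns the chain as a list of symbol names, or None if the sink is
    not reachable from the source.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == sink:
            return path
        for callee in graph.get(path[-1], []):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None


def confirm(score, threshold=0.8):
    """Placeholder for the thresholding step: keep findings at or above threshold."""
    return score >= threshold


chain = call_chain(CALL_GRAPH, "http.patch_user_group", "db.write_group")
```

Here `chain` runs from the exposed endpoint through `api.update_user_group` to the database write, which is the source-to-sink path a reviewer would want to see alongside a finding.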
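Using this approach, we discovered vulnerabilities like CVE-2025-51479 in ONYX (an OSS enterprise search platform), where Curators could modify any group instead of just their assigned ones. The user-group API accepted a user parameter that should have been used to check permissions, but never used it. Gecko inferred that the developers intended to restrict Curator access because both the UI and similar API functions properly validated this permission. This established "curators have limited scope" as a security invariant that this specific API violated. Traditional SAST can't detect this. Any rule that flags unused user parameters would drown you in false positives, since many functions legitimately keep unused parameters. More importantly, detecting this requires knowing which functions handle authorization, understanding ONYX's Curator permission model, and recognizing the validation pattern across multiple files - contextual reasoning that SAST simply cannot do.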
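The shape of this bug class is easy to show in miniature. The following sketch is not ONYX's code: the `User` type, the group data, and all function names are hypothetical, chosen only to show one handler that enforces the "curators have limited scope" invariant next to a sibling that accepts the same `user` argument and ignores it.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    role: str  # "admin" or "curator"
    curated_groups: set = field(default_factory=set)


GROUPS = {1: "engineering", 2: "finance"}


def validate_curator(user, group_id):
    """Invariant: curators may only modify the groups assigned to them."""
    if user.role == "admin":
        return
    if user.role == "curator" and group_id in user.curated_groups:
        return
    raise PermissionError(f"{user.name} may not modify group {group_id}")


def rename_group_checked(user, group_id, new_name):
    validate_curator(user, group_id)  # enforces the invariant
    GROUPS[group_id] = new_name


def rename_group_unchecked(user, group_id, new_name):
    # Bug: `user` is accepted but never consulted, so any caller can
    # modify any group.
    GROUPS[group_id] = new_name


curator = User("casey", "curator", {1})
```

Calling `rename_group_unchecked(curator, 2, "renamed")` succeeds even though group 2 is not assigned to this curator, while `rename_group_checked(curator, 2, "renamed")` raises `PermissionError`. Nothing in the unchecked function looks wrong in isolation; it is only a bug relative to the invariant its sibling enforces.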
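We have several enterprise customers using Gecko because it solves problems they couldn't address with traditional SAST tools. They're seeing 50% fewer false positives on the same codebases and finding vulnerabilities that previously only showed up in manual pentests.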
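On false positives: no static analysis tool will ever achieve perfect accuracy, AI or otherwise. We reduce them at two key points. First, our indexer eliminates the programmatic parsing errors that produce incorrect call chains, which traditional AST-based tools are susceptible to. Second, we avoid unwanted LLM hallucinations and reasoning errors by asking specific, contextual questions rather than open-ended ones. The LLM knows which security invariants need to hold and can make deterministic assessments based on the context. When we do flag something, manual review is quick because we provide a complete source-to-sink dataflow analysis with proof-of-concept code, and we rank findings by confidence score.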
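We'd love to get any feedback from the community, ideas for future direction, or experiences in this space. I'll be in the comments to respond!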