Ask HN: Why is dead code detection in Python harder than most tools admit?
2 points • by duriantaco • 2 days ago
I've been thinking about why dead code detection (and static analysis in general) feels so much less reliable in Python than in other languages. I get that Python is dynamic by nature.

In theory it should be simple (again, in theory): parse the AST, build a call graph, find symbols with zero references. In practice it breaks down quickly because of things like:

1. dynamic dispatch (getattr, registries, plugin systems)
2. framework entrypoints (Flask/FastAPI routes, Django views, pytest fixtures)
3. decorators and implicit naming conventions
4. code invoked only via tests or runtime configuration

Most tools seem to pick one of two bad tradeoffs:

1. be conservative and miss lots of genuinely dead code

or

2. be aggressive and flag false positives until people stop trusting the tool

What's worked best for me so far is treating each result as a confidence score rather than a binary verdict, and layering in limited runtime info (e.g. what actually executed during tests) instead of relying on 100% static analysis.

Curious how others handle this in real codebases.

Do y'all just accept false positives? Or do you ignore dead code detection entirely? Has anyone seen approaches that actually scale? I'm aware that SonarQube is very noisy.

I built a library with a VS Code extension, mainly to explore these tradeoffs (link below if relevant), but I'm more interested in how others think about the problem. Also, I hope I'm in the right channel.

Repo for context: https://github.com/duriantaco/skylos
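To make the failure mode concrete, here's a minimal sketch of the naive "zero references" pass plus a toy confidence score. This is not how skylos or any other tool works. The function names (`naive_dead_functions`, `confidence_scores`), the sample module, and the 100/60/0 weights are all made up for illustration, and the `executed` set stands in for whatever a coverage run would report.

```python
import ast

# Hypothetical sample module: handler_ping is only reachable via globals().
SOURCE = '''
def used():
    return 1

def handler_ping():
    return "pong"

def truly_dead():
    return 0

def dispatch(name):
    return globals()["handler_" + name]()

used()
dispatch("ping")
'''

# Builtins that usually mean names can be resolved at runtime (assumed list).
DYNAMIC_CALLS = {"getattr", "globals", "locals", "eval", "exec"}


def naive_dead_functions(source: str) -> set:
    """Functions that are defined but never referenced by name or attribute."""
    tree = ast.parse(source)
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    referenced = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            referenced.add(node.id)
        elif isinstance(node, ast.Attribute):
            referenced.add(node.attr)
    return defined - referenced


def uses_dynamic_lookup(tree: ast.AST) -> bool:
    """True if the module calls any builtin listed in DYNAMIC_CALLS."""
    return any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in DYNAMIC_CALLS
        for node in ast.walk(tree)
    )


def confidence_scores(source: str, executed=frozenset()) -> dict:
    """Score each candidate 0-100; the weights are arbitrary placeholders."""
    dynamic = uses_dynamic_lookup(ast.parse(source))
    scores = {}
    for name in naive_dead_functions(source):
        if name in executed:
            scores[name] = 0      # observed running, so not dead
        elif dynamic:
            scores[name] = 60     # module does dynamic lookups: less sure
        else:
            scores[name] = 100    # no static or runtime evidence of use
    return scores


if __name__ == "__main__":
    print(sorted(naive_dead_functions(SOURCE)))
    print(confidence_scores(SOURCE, executed={"handler_ping"}))
```

On this sample the naive pass flags both `handler_ping` and `truly_dead`, even though `handler_ping` runs via `dispatch("ping")`. Feeding in the runtime set drops `handler_ping` to 0 and leaves `truly_dead` at 60 rather than 100, since the module does dynamic lookups.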