Ask HN: Dynamic ROI vs. tiling for high-speed target tracking (<20 ms latency)?
1 point by LucaHerakles 1 day ago
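We are building a UAV system to physically intercept fast-moving targets (100 km/h+) using onboard compute only (Qualcomm QRB5165).
We hit a wall on the latency vs. resolution trade-off, and I'd love to hear some battle-tested opinions from the CV/embedded community.
The constraint: we need HD resolution to detect small targets at range, but running inference on full HD frames kills our control loop frequency (the target is <20 ms glass-to-motor response).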
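To put the 20 ms budget in spatial terms, here is a quick back-of-the-envelope sketch. It considers only the target's stated 100 km/h and ignores the UAV's own motion, so the real closing speed may differ. The 60 ms case is purely illustrative of a loop that takes three times as long; it is not a measured figure.

```python
def metres_per_interval(speed_kmh: float, interval_ms: float) -> float:
    """Distance covered at constant speed during one interval."""
    return speed_kmh / 3.6 * (interval_ms / 1000.0)


per_loop = metres_per_interval(100.0, 20.0)    # one 20 ms loop
per_slow = metres_per_interval(100.0, 60.0)    # hypothetical 60 ms loop
print(f"{per_loop:.2f} m per 20 ms, {per_slow:.2f} m per 60 ms")
# prints "0.56 m per 20 ms, 1.67 m per 60 ms"
```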
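We are debating two architectural paths:
Option A: Static tiling (SAHI-style). Slice the HD frame into overlapping tiles.
Pro: High detection probability for small objects.
Con: Even with NMS-free architectures, inference time on the DSP effectively triples. The latency spikes cause our Proportional Navigation guidance to oscillate.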
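For anyone who wants to reason about the tile count in Option A, here is a minimal sketch of overlapping tile generation. The frame size (1920x1080), tile size (640) and overlap (128) are assumptions chosen for illustration, not our actual configuration, and this is plain Python rather than the SAHI library's API.

```python
from typing import List, Tuple


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of overlapping tiles that cover [0, length)."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    if tile >= length:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def tile_boxes(width: int, height: int, tile: int = 640,
               overlap: int = 128) -> List[Tuple[int, int, int, int]]:
    """Return (x0, y0, x1, y1) boxes, clamped to the frame."""
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in tile_starts(height, tile, overlap)
            for x in tile_starts(width, tile, overlap)]


boxes = tile_boxes(1920, 1080)
print(len(boxes))  # prints 8
```

With these assumed numbers, one frame turns into 8 detector inputs, so the per-frame cost depends heavily on how well the accelerator batches them.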
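Option B: Dynamic ROI ("The Sniper Approach"). Run a low-res global search (320x320) at high FPS. Once a target is found, lock a dynamic high-res Region of Interest (ROI) from the raw camera stream and run inference only on that crop.
Pro: Extremely fast. Keeps the loop tight.
Con: Single point of failure. If the tracker (Kalman filter) loses the crop due to abrupt ego-motion, we are blind until global search re-acquires. In a terminal-phase intercept, that's a miss.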
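To make the failure mode in Option B concrete, here is a rough sketch of the lock / coast / fall-back logic. It is not our implementation: a plain constant-velocity predictor stands in for the Kalman filter, there is no ego-motion compensation, and the class name `RoiTracker` plus every parameter value (`base_size`, `growth`, `max_misses`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RoiTracker:
    frame_w: int
    frame_h: int
    base_size: int = 320   # crop side while locked (assumed)
    growth: float = 1.5    # crop growth factor per missed frame (assumed)
    max_misses: int = 3    # misses tolerated before global search (assumed)
    locked: bool = False
    misses: int = 0
    mx: float = 0.0        # last measured centre
    my: float = 0.0
    vx: float = 0.0        # pixels per frame
    vy: float = 0.0
    px: float = 0.0        # predicted centre for the next frame
    py: float = 0.0

    def update(self, detection: Optional[Tuple[float, float]]) -> None:
        """Feed the detection centre for this frame, or None on a miss."""
        if detection is None:
            if self.locked:
                self.misses += 1
                if self.misses > self.max_misses:
                    self.locked = False   # give up, back to global search
                else:
                    self.px += self.vx    # coast on the last velocity
                    self.py += self.vy
            return
        x, y = detection
        if self.locked:
            frames = self.misses + 1      # frames since last measurement
            self.vx = (x - self.mx) / frames
            self.vy = (y - self.my) / frames
        else:
            self.vx = self.vy = 0.0
            self.locked = True
        self.mx, self.my, self.misses = x, y, 0
        self.px, self.py = x + self.vx, y + self.vy

    def roi(self) -> Optional[Tuple[int, int, int, int]]:
        """Crop box for the next frame, or None to run global search."""
        if not self.locked:
            return None
        limit = min(self.frame_w, self.frame_h)
        size = int(min(self.base_size * self.growth ** self.misses, limit))
        x0 = int(min(max(self.px - size / 2, 0), self.frame_w - size))
        y0 = int(min(max(self.py - size / 2, 0), self.frame_h - size))
        return (x0, y0, x0 + size, y0 + size)
```

Growing the crop on each miss trades inference time for a better chance of re-acquisition. The blind window described above is the gap between `roi()` returning `None` and the next global-search hit.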
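Has anyone here successfully implemented robust dynamic ROI on edge silicon (Jetson/Hexagon DSP) for erratic targets? Are we over-engineering this, or is full-frame HD inference simply dead on arrival for real-time guidance?
Any pointers to papers or repos are appreciated.
PS: If you live for these kinds of problems (and enjoy solving them in Munich), we are looking for a Founding Engineer to own this entire pipeline. Email in profile.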