


We are building a UAV system to physically intercept fast-moving targets (100 km/h+) using onboard compute only (Qualcomm QRB5165).

We hit a wall regarding the Latency vs. Resolution trade-off and I’d love to hear some battle-tested opinions from the CV/Embedded community.

The constraint: We need HD resolution to detect small targets at range, but running inference on full HD frames kills our control loop frequency (target is <20 ms glass-to-motor response).

We are debating two architectural paths:

Option A: Static Tiling (SAHI-style)
Slice the HD frame into overlapping tiles.
Pro: High detection probability for small objects.
Con: Even with NMS-free architectures, the inference time on the DSP effectively triples. Latency spikes cause our Proportional Navigation guidance to oscillate.

Option B: Dynamic ROI ("The Sniper Approach")
Run a low-res global search (320x320) at high FPS. Once a target is found, lock a dynamic High-Res Region of Interest (ROI) from the raw camera stream and only run inference on that crop.
Pro: Extremely fast. Keeps the loop tight.
Con: Single Point of Failure. If the tracker (Kalman Filter) loses the crop due to abrupt ego-motion, we are blind until global search re-acquires. In a terminal phase intercept, that’s a miss.

Has anyone here successfully implemented robust Dynamic ROI on edge silicon (Jetson/Hexagon DSP) for erratic targets? Are we over-engineering this, or is full-frame HD inference simply dead on arrival for real-time guidance?

Any pointers to papers or repos are appreciated.

PS: If you live for these kinds of problems (and enjoy solving them in Munich), we are looking for a Founding Engineer to own this entire pipeline. Email in profile.
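
For concreteness, the crop logic in Option B could look roughly like the sketch below. It is only an illustration under stated assumptions, not the actual pipeline: the 1920x1080 frame size, the `MAX_ROI` and `MAX_MISSES` thresholds, the noise values, and the names `AxisKF` and `plan_roi` are all hypothetical. The idea is to run one constant-velocity Kalman filter per image axis, widen the crop in proportion to the predicted position uncertainty, and hand control back to the low-res global search as soon as the crop would grow too large or too many detections have been missed.

```python
import math
from dataclasses import dataclass

FRAME_W, FRAME_H = 1920, 1080  # assumed full-HD sensor (not stated in the post)
BASE_ROI = 320                 # crop size, matching the 320x320 detector input
MAX_ROI = 960                  # hypothetical limit: beyond this, go back to global search
MAX_MISSES = 3                 # hypothetical limit on consecutive missed detections


@dataclass
class AxisKF:
    """Constant-velocity Kalman filter for one image axis, in pixels."""
    pos: float
    vel: float = 0.0
    p_pp: float = 25.0    # position variance
    p_pv: float = 0.0     # position/velocity covariance
    p_vv: float = 400.0   # velocity variance

    def predict(self, dt: float, accel_var: float) -> None:
        # State: x = F x with F = [[1, dt], [0, 1]]
        self.pos += self.vel * dt
        # Covariance: P = F P F^T + Q, with Q from white-noise acceleration
        p_pp = self.p_pp + 2 * dt * self.p_pv + dt * dt * self.p_vv
        p_pv = self.p_pv + dt * self.p_vv
        self.p_pp = p_pp + 0.25 * dt**4 * accel_var
        self.p_pv = p_pv + 0.5 * dt**3 * accel_var
        self.p_vv = self.p_vv + dt * dt * accel_var

    def update(self, z: float, meas_var: float) -> None:
        # Position-only measurement, H = [1, 0]
        s = self.p_pp + meas_var
        k_p, k_v = self.p_pp / s, self.p_pv / s
        resid = z - self.pos
        self.pos += k_p * resid
        self.vel += k_v * resid
        # P = (I - K H) P, computed from the pre-update covariance
        p_pp = (1 - k_p) * self.p_pp
        p_pv = (1 - k_p) * self.p_pv
        p_vv = self.p_vv - k_v * self.p_pv
        self.p_pp, self.p_pv, self.p_vv = p_pp, p_pv, p_vv


def plan_roi(kx: AxisKF, ky: AxisKF, misses: int, n_sigma: float = 3.0):
    """Return (x0, y0, size) for the next crop, or None to request global search."""
    if misses > MAX_MISSES:
        return None
    # Widen the crop by n_sigma standard deviations on each side.
    sigma = math.sqrt(max(kx.p_pp, ky.p_pp))
    size = int(BASE_ROI + 2 * n_sigma * sigma)
    if size > MAX_ROI:
        return None
    # Clamp so the crop stays inside the frame.
    x0 = min(max(int(kx.pos - size / 2), 0), FRAME_W - size)
    y0 = min(max(int(ky.pos - size / 2), 0), FRAME_H - size)
    return x0, y0, size
```

In a loop like this, the caller would call `predict` on both axes every frame, call `update` only when the detector fires inside the crop, and treat a `None` from `plan_roi` as the signal to fall back to global search. One possible way to handle abrupt ego-motion, again an assumption rather than a tested recipe, is to pass a larger `accel_var` whenever the IMU reports a high body rate, so the crop widens before the target leaves it instead of after.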

Ask HN: Dynamic ROI vs. tiling for high-speed target tracking (<20 ms latency)?