Hi HN, we’re Julius, Jago, and Nils, and we’re building transload (transload.io).

transload helps LTL trucking companies measure freight dimensions using the security cameras already installed in their terminals. Instead of sending shipments through a dedicated dimensioning station, we measure them automatically as they move through the normal dock workflow.

We’ve put together a small HN-specific demo site here: https://hn.transload.io/

In LTL trucking, dimensions matter because they affect pricing, freight classification, and trailer utilization. If a shipment is larger than the shipper reported, the carrier may undercharge for it while still giving up the same amount of trailer space. The obvious fix is to measure every shipment, but that is surprisingly hard in a busy freight terminal. Dedicated dimensioning systems work for freight that passes through them, but they can add forklift travel, create dock congestion, and change the normal flow of work. In practice, many terminals only measure a sample of their shipments.

Jago grew up close to this industry through his family’s LTL trucking and cross-docking business. We did not start out building freight dimensioning. Our first idea was an AI system for optimizing forklift routes inside cross-dock terminals. After spending time with customers and talking to more than 50 trucking companies, we realized that forklift routing was not the pain people kept bringing up. Freight dimensions were.

At the same time, we saw that spatial AI was advancing quickly. Monocular metric depth estimation has become dramatically better, making it possible to recover accurate 3D structure from ordinary camera footage without expensive LiDAR sensors. MapAnything (https://github.com/facebookresearch/map-anything) and MoGe (https://github.com/microsoft/moge) are two examples.

Freight terminals also have helpful structure: fixed cameras, repeated workflows, barcode scan timestamps, and known layouts. Nearly every warehouse already has CCTV. That led us to a simple question: what if we could measure freight automatically using the existing security cameras, entirely in the background? That would allow carriers to measure every shipment without changing the dock workflow.

Our system has two main steps: connect a barcode scan to the right object in the video, then estimate that object’s dimensions in real-world units.

Dock workers already scan freight as part of the normal workflow. Each scan gives us a timestamp and a handling-unit ID. Around that timestamp, we analyze the video to infer which worker scanned and which shipment they scanned. We expected VLMs to handle this; they turned out to be far too unreliable. Instead, we train our own model that reasons in 3D over cues like gaze, body orientation, and movement.

That association step is critical. A frame can contain dozens of pallets, several workers, forklifts, and partially hidden freight. If we attach the scan to the wrong object, the measurement is useless.

Once we know the target shipment, we segment it and estimate a metric 3D bounding box from the monocular camera view. After the box is fitted, the dimensions are straightforward: length, width, height, and volume come directly from it.

The hard part is precisely fitting that bounding box from one ordinary security camera. A single 2D image does not directly tell you object shape or scale, and many different 3D boxes can explain similar-looking image evidence. We use the object mask, visible edges, floor contact, camera geometry, and constraints from the terminal to find the 3D box that best matches the scene.

We are currently working with several LTL carriers. For one customer, roughly 10% of checked shipments had dimension errors. The first use case is revenue recovery: identify under-dimensioned shipments, attach visual evidence, and help carriers correct the billing or classification. Longer term, the same data can help carriers understand trailer utilization better.

LTL freight is an odd place to be doing 3D computer vision, and we learn something new every week. If you’ve worked on monocular reconstruction, 3D object detection, warehouse perception, or messy real-world CV, we’d love your take. Questions about freight, LTL terminals, or the technical approach are very welcome too.
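
For readers wondering why floor contact and camera geometry help pin down metric scale from a single view, here is a minimal Python sketch of the underlying geometry. It is not the fitting method described above, just textbook pinhole back-projection under strong assumptions: an idealised camera with no lens distortion and no roll, a flat floor, a known camera height and pitch, and corner pixels that have already been picked out. The names `Camera`, `floor_point`, `height_above`, `project`, and `box_dimensions`, and all numeric values, are hypothetical and exist only for this illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Camera:
    """Idealised pinhole camera (assumption: no distortion, no roll)."""
    fx: float         # focal length in pixels, horizontal
    fy: float         # focal length in pixels, vertical
    cx: float         # principal point in pixels
    cy: float
    height_m: float   # camera height above the floor in metres
    pitch_rad: float  # downward tilt of the optical axis

    # World frame: X right, Y forward, Z up, floor at Z = 0,
    # camera centre at (0, 0, height_m).

    def _ray(self, u, v):
        """World-frame direction of the ray through pixel (u, v)."""
        dx = (u - self.cx) / self.fx
        dy = (v - self.cy) / self.fy
        c, s = math.cos(self.pitch_rad), math.sin(self.pitch_rad)
        return (dx, c - s * dy, -(s + c * dy))

    def floor_point(self, u, v):
        """Intersect the pixel ray with the floor plane Z = 0."""
        rx, ry, rz = self._ray(u, v)
        if rz >= 0:
            raise ValueError("pixel is at or above the horizon")
        t = self.height_m / -rz
        return (t * rx, t * ry)

    def height_above(self, floor_xy, u, v):
        """Height of the point seen at (u, v), assumed to lie
        vertically above the floor point floor_xy."""
        rx, ry, rz = self._ray(u, v)
        t = math.hypot(*floor_xy) / math.hypot(rx, ry)
        return self.height_m + t * rz

    def project(self, x, y, z):
        """Forward projection of a world point to pixel coordinates."""
        c, s = math.cos(self.pitch_rad), math.sin(self.pitch_rad)
        dz = z - self.height_m
        yc = -s * y - c * dz  # camera-frame "down" coordinate
        zc = c * y - s * dz   # depth along the optical axis
        return (self.fx * x / zc + self.cx, self.fy * yc / zc + self.cy)


def box_dimensions(cam, a_px, b_px, c_px, top_px):
    """Length, width, height, volume of an upright box.

    a_px, b_px, c_px are floor-contact corner pixels, with b the
    corner shared by both footprint edges; top_px is the pixel of
    the top corner directly above b.
    """
    a = cam.floor_point(*a_px)
    b = cam.floor_point(*b_px)
    c = cam.floor_point(*c_px)
    length = math.dist(a, b)
    width = math.dist(b, c)
    height = cam.height_above(b, *top_px)
    return length, width, height, length * width * height


# Usage: synthesise pixels for a 1.2 m x 1.0 m x 1.5 m pallet, then recover it.
cam = Camera(fx=1000.0, fy=1000.0, cx=960.0, cy=540.0,
             height_m=5.0, pitch_rad=math.radians(35))
pixels = [cam.project(2.2, 4.0, 0.0), cam.project(1.0, 4.0, 0.0),
          cam.project(1.0, 5.0, 0.0), cam.project(1.0, 4.0, 1.5)]
print(box_dimensions(cam, *pixels))
```

The sketch only shows that once a pixel is known to lie on the floor, a calibrated camera turns it into a metric position. With real footage, corners are occluded, masks are noisy, and freight is rarely a clean box, which is why fitting the box is the hard part.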

Launch HN: Transload (YC P26) – Measuring freight using CCTV