Show HN: Scaling robot data collection with AI-enhanced teleoperation
1 point | by lorepieri | 5 months ago
TLDR: I am using AI (and more) to make robotic teleoperation faster and sustainable over long sessions, enabling large-scale real-robot data collection for training robotic foundation models.
We are probably 5-6 orders of magnitude short of the real robot data we will need to train a foundation model for robotics, so how do we get it? I believe simulation and video can be a complement, but there is no substitute for a large volume of real robot data.
I've been exploring approaches to scaling robotic teleoperation, which has traditionally been relegated to slow, high-value use cases (nuclear decommissioning, healthcare). Here's a short video from a raw testing session (it requires a lot of explanation!):
[https://youtu.be/QYJNJj8m8Hg](https://youtu.be/QYJNJj8m8Hg)
What is happening here?
First of all, this is true robotic teleoperation (people often confuse controlling a robot in line of sight with teleoperation): I am controlling a robotic arm via a VR teleoperation setup without wearing the headset, to improve ergonomics, watching camera feeds instead. The connection runs over Wi-Fi, with a simulated 300 ms latency + 10 ms jitter (an international round-trip latency, say UK to Australia).
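The post doesn't say how the delay was injected; as a minimal sketch of one way to simulate these network conditions in software, a timestamped buffer can hold commands back by the base delay plus random jitter (the `DelayedChannel` class below is an illustrative assumption, not the author's implementation):

```python
import random
import time
from collections import deque

class DelayedChannel:
    """Holds teleop commands back to simulate round-trip network delay with jitter."""

    def __init__(self, delay_s=0.300, jitter_s=0.010):
        self.delay_s = delay_s      # base round-trip delay, e.g. UK to Australia
        self.jitter_s = jitter_s    # max additional random jitter
        self.queue = deque()        # (release_time, command) pairs

    def send(self, command):
        # Each command becomes visible only after delay + random jitter.
        release = time.monotonic() + self.delay_s + random.uniform(0, self.jitter_s)
        self.queue.append((release, command))

    def receive(self):
        # Deliver, in send order, every command whose release time has passed.
        now = time.monotonic()
        out = []
        while self.queue and self.queue[0][0] <= now:
            out.append(self.queue.popleft()[1])
        return out
```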
On the right, a pure teleoperation run is shown. Disregard the weird "dragging" movements: they come from a drag-and-drop implementation I built that lets the operator reposition the human arm into a more favorable pose without moving the robotic arm. Some of the core issues with affordable remote teleoperation are reduced 3D spatial awareness, the human-robot embodiment gap, and poor force/tactile feedback. Combined with network latency and limited robot hardware dexterity, these result in slow and mentally draining operation. Teleoperators often employ a "wait and see" strategy, as in the video, to reduce the effects of latency and reduced 3D awareness. It's impractical to teleoperate a robot this way for hour-long sessions.
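The drag-and-drop repositioning resembles the classic "clutch" pattern in teleoperation. A minimal sketch of that idea, assuming position offsets compose additively (the author's actual drag-and-drop implementation isn't shown):

```python
import numpy as np

class ClutchedTeleop:
    """Maps human hand position to a robot target, with a clutch to re-center.

    While the clutch is held, the robot target is frozen and the operator can
    move their arm freely; on release, the accumulated offset is absorbed, so
    the robot continues smoothly from where it stopped.
    """

    def __init__(self):
        self.offset = np.zeros(3)    # robot_target = hand_pos + offset
        self.clutched = False
        self.frozen_target = None

    def press_clutch(self, hand_pos):
        self.clutched = True
        self.frozen_target = hand_pos + self.offset

    def release_clutch(self, hand_pos):
        # Recompute the offset so the frozen target maps to the new hand pose.
        self.offset = self.frozen_target - hand_pos
        self.clutched = False

    def robot_target(self, hand_pos):
        return self.frozen_target if self.clutched else hand_pos + self.offset
```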
On the left, an AI helps the operator twice to sustain long sessions at a higher pace. There is an "action AI" that executes individual actions such as picking (right now the action AI is a mixture of VLAs [Vision-Language-Action models], computer vision, motion planning, and dynamic motion primitives; in the future it will be VLAs only), and a "human-in-the-loop AI" that dynamically arbitrates when to hand control to the teleoperator or to the action AI. The final movement is a fusion of the AI's and the operator's motions, with dynamic weighting based on environmental and contextual factors. This way the operator is always in control and can handle all the edge cases the AI cannot, while the AI does the lion's share of the work in subtasks where enough data is already available.
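The described fusion reads like a standard shared-control blend of operator and AI commands. A minimal sketch under that assumption; the confidence-based weighting (`ai_confidence`, `max_ai_weight`) is illustrative, not the author's actual arbitration logic:

```python
import numpy as np

def blend_commands(u_human, u_ai, ai_confidence, max_ai_weight=0.8):
    """Blend operator and AI velocity commands into one motion.

    ai_confidence in [0, 1] would come from the human-in-the-loop AI
    (e.g., how well the current subtask matches available training data).
    Capping the AI weight keeps the operator authoritative at all times.
    """
    w = min(ai_confidence, max_ai_weight)
    return w * np.asarray(u_ai) + (1.0 - w) * np.asarray(u_human)

# Example: near a well-practiced grasp, the AI contributes most of the motion.
u = blend_commands(u_human=[0.02, 0.0, -0.01],
                   u_ai=[0.05, 0.0, -0.04],
                   ai_confidence=0.9)
```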
Currently it speeds up experienced teleoperators by 100-150%, and inexperienced teleoperators by much more. The reduction in mental workload is noticeable from the first few sessions. An important challenge is pushing the speed further, beyond what an unaided human can sustain over long sessions. Technically, besides AI, that means improving robot hardware, 3D telepresence, network optimisation, teleoperation design, and ergonomics.
I see this effort as part of a larger vision to improve teleoperation infrastructure, scale up robot data collection, and deploy general-purpose robots everywhere.
About me: I am currently head of AI at Createc, a UK applied robotics R&D lab, where I build hybrid AI systems. I am also a 2x startup founder (the last one was an AI-robotics company that exited).
I posted this to gather feedback early. If you find this exciting or useful, I would love to connect! I am also open to early-stage partnerships.