Waypoint 1.1: A local-first world model for interactive simulation
6 points • by lcastricato • 5 days ago
Over the last few weeks, world models have started to feel real for the first time. You can see coherent environments, long rollouts, and increasingly convincing visuals. At the same time, most of these systems are hard to run, hard to integrate, and trade interactivity for scale.
We started Overworld because we cared less about producing impressive videos and more about building worlds you can actually inhabit. That means low latency, continuous control, and systems that respond every time you act, not once per prompt.
Last week, we released Waypoint 1, a research preview of a real-time diffusion world model that runs locally. Next week, we’re releasing Waypoint 1.1 Small, which is designed to run on modern consumer GPUs and be easy to build on and modify.
Waypoint is built from scratch rather than fine-tuned from a large video model. We optimized heavily for control frequency, sparse attention, and fast inference so the system can maintain a persistent world state and respond to input at game-level frame rates. The goal was to make something developers can integrate today, not just watch as a demo.
We think this space will move fastest once world models follow a path similar to LLMs: local execution, open tooling, and fast community-driven iteration. Genie and similar systems show what’s possible at a massive scale. Our focus has been on making that future local and accessible.
We wrote more about the “immersion gap,” why interactivity matters more than visuals alone, and how we optimized the model in a recent blog post.
Code, demos, and release details are here: https://over.world/blog/the-immersion-gap
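For readers who want a concrete picture of "respond every time you act, not once per prompt," here is a minimal Python sketch of a per-action control loop with persistent state. Everything in it is hypothetical: `ToyWorldModel`, `run_interactive`, and `frame_budget_ms` are illustrative names, not part of the Waypoint code or API, and the 60 fps figure in the usage note is an assumed example since the post does not state a target frame rate.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorldModel:
    """Hypothetical stand-in for a world model; NOT the Waypoint API."""
    position: int = 0  # persistent world state carried across frames
    history: list = field(default_factory=list)

    def step(self, action: str) -> str:
        # Update the persistent state from the action sampled this frame.
        # Unknown actions (e.g. "idle") leave the state unchanged.
        self.position += {"left": -1, "right": 1}.get(action, 0)
        frame = f"frame@x={self.position}"
        self.history.append(frame)
        return frame


def run_interactive(model: ToyWorldModel, actions: list) -> list:
    """Per-action control: one model step per input, so every frame
    reflects the most recent action rather than a single up-front prompt."""
    return [model.step(action) for action in actions]


def frame_budget_ms(fps: float) -> float:
    """Milliseconds available to produce one frame at a given frame rate."""
    return 1000.0 / fps
```

Calling `run_interactive(ToyWorldModel(), ["right", "right", "left", "idle"])` steps the toy state once per input, and `frame_budget_ms(60)` gives the per-frame time budget (about 16.7 ms) that a model would have to fit inference into at an assumed 60 fps.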