HackerNews中文版

原帖作者。大多数用于解决 TSP（旅行商问题）的深度学习方法都依赖于使用大规模数据集进行预训练。我想看看一个求解器是否可以在没有任何来自其他问题的先验知识的情况下，针对特定实例“即时”学习。我使用 PPO 构建了一个求解器，它针对每个实例从头开始学习。它在单个 A100 上，大约 5.6 小时内，在 TSPLIB d1291 上实现了 1.66% 的差距。核心思想：我的假设是，虽然最优解主要由“最小边”（最近邻）组成，但实际的难点在于少数几个超出局部范围的“例外边”。我没有进行预训练，而是设计了一种基于这些例外边的拓扑/几何结构的归纳偏置。智能体会根据微观/宏观结构接收关于哪些边可能具有前景的指导，而 PPO 则通过反复试验来填补空白。看到强化学习在没有数据集的情况下达到这种水平，这很有趣。我已经开源了代码和一个 Colab 笔记本，供任何想验证结果或尝试“例外边”假设的人使用。代码和 Colab：[https://github.com/jivaprime/TSP_exception-edge](https://github.com/jivaprime/TSP_exception-edge) 很乐意回答关于几何先验或 PPO 实现的任何问题！

查看原文

OP here.Most Deep Learning approaches for TSP rely on pre-training with large-scale datasets. I wanted to see if a solver could learn "on the fly" for a specific instance without any priors from other problems.I built a solver using PPO that learns from scratch per instance. It achieved a 1.66% gap on TSPLIB d1291 in about 5.6 hours on a single A100.The Core Idea: My hypothesis was that while optimal solutions are mostly composed of 'minimum edges' (nearest neighbors), the actual difficulty comes from a small number of 'exception edges' outside of that local scope.Instead of pre-training, I designed an inductive bias based on the topological/geometric structure of these exception edges. The agent receives guides on which edges are likely promising based on micro/macro structures, and PPO fills in the gaps through trial and error.It is interesting to see RL reach this level without a dataset. I have open-sourced the code and a Colab notebook for anyone who wants to verify the results or tinker with the 'exception edge' hypothesis.Code & Colab: <a href="https://github.com/jivaprime/TSP_exception-edge" rel="nofollow">https://github.com/jivaprime/TSP_exception-edge</a>Happy to answer any questions about the geometric priors or the PPO implementation!

Show HN: 无需预训练的单实例 TSP 求解器（在 d1291 上达到 1.66% 的差距）