A 27-million-parameter model outperforms large language models on reasoning tasks

4 points by SteadySurfdom, 7 months ago
I came across this explainer on HRM (Hierarchical Reasoning Model), and it cites some wild numbers. It claims HRM beats LLMs like Claude 3.7 Sonnet and o3-mini on reasoning tasks such as Sudoku, 30x30 mazes, and ARC-AGI. Here is the explainer: https://towardsdatascience.com/your-next-large-language-model-might-not-be-large-afterall-2/

I currently work at a product-based startup, where I am automating PCB design. That also requires some hardcore reasoning: knowing that USB connectors should sit at the edge of the board and that inductive and capacitive loads should be kept apart, all while optimizing routing length.

Is HRM a viable approach for my use case? Do you think it would work out? I ask because I see some similarities between my use case and the problems that HRM reportedly solves better than LLMs.