Native Parallel Reasoner: Self-Evolving to Learn Parallel Reasoning

1 point by jacklanda 19 days ago
Hey there!

We propose a parallel reasoner that aims to mitigate hallucination and enhance fairness in reasoning.

Feel free to upvote it on Hugging Face!

[Twitter Post] https://x.com/ZilongZheng/status/1998252267783516444
[Project Page & Demo] https://bigai-nlco.github.io/Native-Parallel-Reasoner
[GitHub Repo] https://github.com/bigai-nlco/Native-Parallel-Reasoner
[HF Paper] https://huggingface.co/papers/2512.07461
[arXiv Preprint] https://arxiv.org/abs/2512.07461

Highlight: This model is a proof of concept (PoC) for a native parallel reasoning system. Unlike popular multi-agent approaches that explore multiple reasoning paths with multiple agents, it uses a single agent that reasons along multiple paths within the same time slice. Training starts from a single serial model with zero external supervision: self-distillation generates synthetic trajectories, followed by multi-stage optimization through imitation learning and RL, using SGLang infrastructure adapted to accelerate parallel thinking. The model thinks and solves problems using native parallelism. With only a small number of self-distilled parallel reasoning trajectories, it matches or slightly exceeds existing parallel and autoregressive reasoning baselines on several math and complex reasoning benchmarks. On the inference side, it achieves up to a 4.6x wall-clock speedup. Physically, it achieves ~100% parallel triggering. Logically, it exhibits emergent problem decomposition and divide-and-conquer capabilities, internalizing parallel thinking rather than falling back to serial strategies.
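Why does reasoning along several paths "within the same time slice" cut wall-clock time? A back-of-the-envelope sketch helps, under a simplifying assumption that is not the authors' cost model: wall-clock time is proportional to the number of sequential decoding steps, and batching and memory overhead are ignored. `ParallelTrace`, its fields, and the token counts below are all hypothetical and purely illustrative. None of them come from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParallelTrace:
    """Hypothetical token counts for one parallel reasoning episode.

    prefix_tokens: shared reasoning generated before the branches fork
    branch_tokens: tokens generated by each parallel branch
    merge_tokens:  tokens generated after the branches join (e.g. the answer)
    """
    prefix_tokens: int
    branch_tokens: List[int] = field(default_factory=list)
    merge_tokens: int = 0


def serial_steps(trace: ParallelTrace) -> int:
    # A purely serial decoder must emit every branch one after another.
    return trace.prefix_tokens + sum(trace.branch_tokens) + trace.merge_tokens


def parallel_steps(trace: ParallelTrace) -> int:
    # Branches decoded in the same time slice cost only as much as the longest one
    # (assumption: perfect overlap, no batching overhead).
    return trace.prefix_tokens + max(trace.branch_tokens, default=0) + trace.merge_tokens


def idealized_speedup(trace: ParallelTrace) -> float:
    # Ratio of sequential decoding steps: serial vs. parallel.
    return serial_steps(trace) / parallel_steps(trace)


if __name__ == "__main__":
    # Made-up counts: a 200-token prefix, four branches, a 150-token merge.
    trace = ParallelTrace(prefix_tokens=200, branch_tokens=[800, 750, 900, 820], merge_tokens=150)
    print(serial_steps(trace), parallel_steps(trace), round(idealized_speedup(trace), 2))
    # prints: 3620 1250 2.9
```

With these invented counts, the idealized ratio is about 2.9x. Real speedups depend on how many branches the model opens, how balanced they are, and the serving stack. This toy model does not reproduce the reported 4.6x figure. It only shows why the longest branch, not the sum of branches, bounds latency.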