Qwen3-Max-Thinking: 36T tokens
2 points • by SilasYee • 19 days ago
Alibaba has officially launched Qwen3-Max-Thinking, a trillion-parameter MoE flagship LLM pretrained on 36T tokens, double the corpus of Qwen 2.5. It already matches or outperforms top-tier models such as GPT-5.2-Thinking, Claude-Opus-4.5, and Gemini 3 Pro across 19 authoritative benchmarks. Its two core technical breakthroughs are what truly set it apart.

First, Adaptive Tool Calling: without any manual prompting, it autonomously invokes search engines, memory tools, and code interpreters based on task demands. This cuts down on hallucinations and improves real-time problem-solving; for instance, coding tasks trigger automatic error-correction loops, while research tasks combine search with context synthesis. Second, Test-Time Scaling (TTS): it outperforms standard parallel sampling by refining its reasoning through iterative insights, with measurable gains on key benchmarks: GPQA rose from 90.3 to 92.8, LiveCodeBench v6 from 88.0 to 91.4, and IMO-AnswerBench from 89.5 to 91.5.

Notably, its preview version even achieved 100% accuracy on tough math contests such as AIME 25 and HMMT 25. The model runs smoothly in the web and desktop demos, and its API is production-ready, with an adjustable thinking budget (up to 80K tokens by default) to balance depth and speed. This isn't just an incremental update; it's a leap that closes the gap in reasoning and tool integration for real-world academic and engineering tasks.

Check it out: https://chat.qwen.ai/
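
For anyone wondering what the "automatic error correction loop" for coding tasks might look like in practice, here is a minimal sketch of the general pattern: generate code, run it in an interpreter tool, and feed any error back for another attempt. This is not Qwen's implementation. `fake_model`, `run_code`, and `solve_with_repair` are hypothetical names made up for illustration, and the model is replaced by a hard-coded stub so the example runs offline.

```python
from typing import Callable, Optional, Tuple


def run_code(source: str) -> Tuple[bool, str]:
    """Toy 'code interpreter' tool: run the source, return (ok, output or error text)."""
    namespace: dict = {}
    try:
        # Acceptable for a demo only; never exec untrusted code without a sandbox.
        exec(source, namespace)
        return True, repr(namespace.get("result"))
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"


def fake_model(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for the LLM (assumption, not a real API).

    The first draft contains a bug; once it sees error feedback it returns a fix.
    """
    if feedback is None:
        return "result = sum(range(1, 11)) / count"  # 'count' is undefined
    return "count = 10\nresult = sum(range(1, 11)) / count"


def solve_with_repair(
    task: str,
    model: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Generate code, run it, and retry with the error message until it succeeds."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        source = model(task, feedback)
        ok, output = run_code(source)
        if ok:
            return output, attempt
        feedback = output  # pass the error text to the next generation round
    raise RuntimeError(f"no working program after {max_attempts} attempts: {feedback}")


if __name__ == "__main__":
    answer, attempts = solve_with_repair("average of the integers 1..10", fake_model)
    print(answer, attempts)  # 5.5 2
```

In this toy run the first draft fails with a `NameError`, the error text goes back to the stub, and the second draft succeeds. A real system would presumably also let the model decide whether to call the interpreter at all, which is the "adaptive" part the post describes.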