Alibaba releases Qwen3-Max, its strongest AI model yet, leading comprehensively in performance

On September 24, following the release of the Qwen3-2507 series, Alibaba Cloud announced the launch of Qwen3-Max, the largest and most capable large language model from the Tongyi team to date.


The official release of Qwen3-Max-Instruct further upgrades its coding and agent capabilities, reaching industry-leading levels on comprehensive benchmarks covering knowledge, reasoning, programming, instruction following, human preference alignment, agentic tasks, and multilingual understanding.

The Tongyi team indicated that Qwen3-Max-Thinking, still in training, has already demonstrated extraordinary potential and is expected to be officially released to the public in the near future. Reportedly, the "thinking" version achieved 100% accuracy on difficult reasoning benchmarks such as AIME 25 and HMMT when combined with tool use and additional test-time compute.

1AI attaches the official links:

  • QwenChat: https://chat.qwen.ai
  • Alibaba Cloud Model Studio (Bailian): https://help.aliyun.com/zh/model-studio/models#qwen-max-cn-bj
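
For readers who want to try the model through Alibaba Cloud, below is a minimal sketch of a call via Model Studio's OpenAI-compatible endpoint. The base URL follows DashScope's documented compatible mode, while the model identifier "qwen3-max" is an assumption that should be confirmed against the model list linked above.

```python
# Minimal sketch: calling Qwen3-Max through Alibaba Cloud Model Studio's
# OpenAI-compatible (DashScope compatible-mode) endpoint.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],  # issued in the Model Studio console
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

resp = client.chat.completions.create(
    model="qwen3-max",  # assumed identifier; confirm in the linked model list
    messages=[{"role": "user",
               "content": "Summarize the Qwen3-Max release in one sentence."}],
)
print(resp.choices[0].message.content)
```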

According to Alibaba, Qwen3-Max has over 1T total parameters and was pre-trained on 36T tokens. Its architecture follows the design paradigm of the Qwen3 series, using a global-batch load-balancing loss.
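
The article does not define this loss, but the standard form of an MoE load-balancing auxiliary loss penalizes uneven routing by multiplying each expert's assignment fraction by its mean gate probability; the "global-batch" variant computes those statistics over the entire global batch rather than per micro-batch. A minimal PyTorch sketch under those assumptions (not Qwen's actual implementation; in distributed training, f and p would be all-reduced across data-parallel ranks before the product is taken):

```python
import torch
import torch.nn.functional as F

def load_balance_loss(router_logits: torch.Tensor,
                      top_k: int, num_experts: int) -> torch.Tensor:
    """Switch-style auxiliary loss: num_experts * sum_i f_i * p_i, where
    f_i is the fraction of token->expert assignments routed to expert i and
    p_i is the mean router probability for expert i. In the global-batch
    variant, f and p are aggregated over the whole global batch."""
    probs = F.softmax(router_logits, dim=-1)                    # [tokens, experts]
    assigned = probs.topk(top_k, dim=-1).indices                # [tokens, top_k]
    counts = F.one_hot(assigned, num_experts).sum(1).float()    # [tokens, experts]
    f = counts.mean(0) / top_k   # fraction of assignments per expert, sums to 1
    p = probs.mean(0)            # mean gate probability per expert, sums to 1
    # Distributed: dist.all_reduce(f, op=AVG); dist.all_reduce(p, op=AVG)
    return num_experts * torch.sum(f * p)
```

The loss is minimized when routing is uniform across experts, which discourages expert collapse during MoE pre-training.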

  • Training stability: Thanks to the Qwen3 MoE architecture design, the Qwen3-Max pre-training loss curve remained stable and smooth. Training proceeded without incident, with no loss spikes and no need for interventions such as rolling back training or changing the data distribution.
  • Training efficiency: Qwen3-Max-Base training efficiency improved significantly, with MFU up 30% relative to Qwen2.5-Max-Base, driven by the efficient multi-level pipeline-parallel strategy optimized in PAI-FlashMoE. In long-sequence training scenarios, the additional use of the ChunkFlow strategy delivered 3x the throughput of a sequence-parallel scheme, supporting Qwen3-Max training on 1M-token contexts (a simplified sketch of this chunk-based pattern follows this list). Meanwhile, through measures such as SanityCheck, EasyCheckpoint, and scheduling-chain optimization, the time Qwen3-Max lost to hardware failures on ultra-large clusters fell to one fifth of that of Qwen2.5-Max.
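
ChunkFlow's internals are not described in the article, but the general chunk-based long-sequence pattern it refers to splits one very long sequence into fixed-size chunks processed in order, carrying the attention KV cache forward so each chunk attends to all earlier tokens; activation memory then scales with the chunk length rather than the full 1M-token sequence. A hypothetical sketch of that pattern, assuming a Hugging Face-style model interface with past_key_values (not Alibaba's implementation):

```python
import torch
import torch.nn.functional as F

def chunked_forward(model, input_ids: torch.Tensor, chunk_len: int) -> torch.Tensor:
    """Next-token loss over one long sequence [1, seq_len], processed
    chunk_len tokens at a time while carrying the KV cache forward."""
    past_kv, losses = None, []
    for start in range(0, input_ids.size(1) - 1, chunk_len):
        chunk = input_ids[:, start:start + chunk_len + 1]  # +1 token for labels
        out = model(chunk[:, :-1], past_key_values=past_kv, use_cache=True)
        past_kv = out.past_key_values  # lets the next chunk attend to this one
        losses.append(F.cross_entropy(
            out.logits.reshape(-1, out.logits.size(-1)),
            chunk[:, 1:].reshape(-1),
        ))
    return torch.stack(losses).mean()
```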

It is reported that the preview version of Qwen3-Max-Instruct had already reached the top of the LMArena leaderboard (above GPT-5-Chat). The official release further strengthens its capabilities, particularly code generation and agent performance.

1AI notes that Qwen3-Max-Instruct scored an impressive 69.6 points on SWE-Bench Verified, a benchmark focused on solving real-world programming problems, placing it among the top models worldwide.

In addition, on Tau2-Bench, a challenging benchmark for evaluating agentic tool use, Qwen3-Max-Instruct achieved a breakthrough score of 74.8 points, surpassing Claude Opus 4 and DeepSeek-V3.1.

Qwen3-Max-Thinking, the reasoning-enhanced version of Qwen3-Max, demonstrates unprecedented reasoning ability when equipped with an integrated code interpreter and parallel test-time compute, particularly on the extremely challenging mathematical reasoning benchmarks AIME 25 and HMMT.
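
"Parallel test-time compute" most plausibly refers to sampling several independent reasoning traces and aggregating them, for example by self-consistency majority voting. A minimal sketch of that generic idea, where sample() is a hypothetical placeholder for one chat-completion call such as the one shown earlier (this is not Qwen's published method):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def sample(prompt: str) -> str:
    """Placeholder: one independent model call (temperature > 0) that
    returns an extracted final answer, e.g. via the API shown above."""
    raise NotImplementedError

def self_consistency(prompt: str, n: int = 8) -> str:
    # Draw n independent reasoning traces in parallel, then return the
    # most frequent final answer (majority vote).
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(sample, [prompt] * n))
    return Counter(answers).most_common(1)[0][0]
```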
