{"id":42939,"date":"2025-09-12T11:50:29","date_gmt":"2025-09-12T03:50:29","guid":{"rendered":"https:\/\/www.1ai.net\/?p=42939"},"modified":"2025-09-12T11:50:29","modified_gmt":"2025-09-12T03:50:29","slug":"%e9%98%bf%e9%87%8c%e4%ba%91%e5%8f%91%e5%b8%83%e9%80%9a%e4%b9%89-qwen3-next-%e5%9f%ba%e7%a1%80%e6%a8%a1%e5%9e%8b%e6%9e%b6%e6%9e%84%ef%bc%8c%e5%bc%80%e6%ba%9080b-a3b%e7%b3%bb%e5%88%97%e6%a8%a1%e5%9e%8b","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/42939.html","title":{"rendered":"Alibaba Cloud releases Tongyi Qwen3-Next base model architecture, open-sources 80B-A3B series models"},"content":{"rendered":"<p>In the early hours of this morning, Alibaba's Tongyi team announced the next-generation <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%9f%ba%e7%a1%80%e6%a8%a1%e5%9e%8b%e6%9e%b6%e6%9e%84\" title=\"[View articles tagged with [base model architecture]]\" target=\"_blank\" >base model architecture<\/a> Qwen3-Next, and <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >open-sourced<\/a> the Qwen3-Next-80B-A3B series models built on this architecture.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-42940\" title=\"62ec15bej00t2gil0002pd000u00gwm\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/09\/62ec15bej00t2gil0002pd000u000gwm.jpg\" alt=\"62ec15bej00t2gil0002pd000u00gwm\" width=\"1080\" height=\"608\" \/><\/p>\n<p>According to the official announcement, the Tongyi team believes that Context Length Scaling and Total Parameter Scaling are the two major trends in the future development of large models. 
To further improve training and inference efficiency under long contexts and large total parameter counts, the team designed the new Qwen3-Next architecture.<\/p>\n<p>Compared with the MoE structure of Qwen3, Qwen3-Next introduces the following core improvements: a hybrid attention mechanism, a highly sparse MoE structure, a series of training-stability-friendly optimizations, and a multi-token prediction mechanism that improves inference efficiency.<\/p>\n<p>Based on the Qwen3-Next architecture, the team trained the Qwen3-Next-80B-A3B-Base model. The model has 80 billion total parameters but activates only 3 billion, yet achieves performance comparable to, or even slightly better than, the Qwen3-32B dense model.<\/p>\n<p>The training cost (GPU hours) of Qwen3-Next-80B-A3B-Base is less than one tenth that of Qwen3-32B, and its inference throughput for contexts beyond 32K is more than ten times that of Qwen3-32B, delivering an excellent balance of training and inference cost-effectiveness.<\/p>\n<p>At the same time, based on Qwen3-Next-80B-A3B-Base, the Tongyi team developed and open-sourced Qwen3-Next-80B-A3B-Instruct and\u00a0<strong>Qwen3-Next-80B-A3B-Thinking:<\/strong><\/p>\n<p>Qwen3-Next-80B-A3B-Instruct performs on par with the flagship model Qwen3-235B-A22B-Instruct-2507, while showing a significant advantage in ultra-long-context tasks of up to 256K tokens.<\/p>\n<p>Qwen3-Next-80B-A3B-Thinking excels at complex reasoning tasks: it not only outperforms Qwen3-30B-A3B-Thinking-2507 and Qwen3-32B-Thinking, but also surpasses the closed-source model Gemini-2.5-Flash-Thinking on multiple benchmarks, with some key metrics already approaching Qwen3-235B-A22B-Thinking-2507.<\/p>\n<p>The new models are now live.<\/p>\n<p>Free trial: https:\/\/chat.qwen.ai\/<\/p>\n<p>Qwen3-Next-c314f23bd0264a<\/p>\n<p>HuggingFace: https:\/\/huggingface.co\/collections\/Qwen\/qwen3-next-68c25fd6838e585db8eea9d<\/p>\n<p><a 
href=\"https:\/\/www.1ai.net\/en\/tag\/%e9%98%bf%e9%87%8c%e4%ba%91\" title=\"[View articles tagged with [Alibaba Cloud]]\" target=\"_blank\" >Alibaba Cloud<\/a> Model Studio: https:\/\/help.aliyun.com\/zh\/model-studio\/models#2c9c4628c9yd<\/p>","protected":false},"excerpt":{"rendered":"<p>In the early hours of this morning, Alibaba's Tongyi team announced the next-generation base model architecture Qwen3-Next and open-sourced the Qwen3-Next-80B-A3B series models built on it. According to the official announcement, the team believes that Context Length Scaling and Total Parameter Scaling are the two major trends in the future development of large models. To further improve training and inference efficiency under long contexts and large total parameter counts, the team designed the new Qwen3-Next architecture. Compared with the MoE structure of Qwen3, Qwen3-Next introduces the following core improvements: \u2026<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[7588,219,334],"collection":[],"class_list":["post-42939","post","type-post","status-publish","format-standard","hentry","category-news","tag-7588","tag-219","tag-334"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/42939","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=42939"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/42939\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=42939"}],"wp:term":[{"taxonomy":"categ
ory","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=42939"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=42939"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=42939"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}