{"id":39932,"date":"2025-07-22T11:26:24","date_gmt":"2025-07-22T03:26:24","guid":{"rendered":"https:\/\/www.1ai.net\/?p=39932"},"modified":"2025-07-22T11:26:24","modified_gmt":"2025-07-22T03:26:24","slug":"%e9%98%bf%e9%87%8c%e4%ba%91%e9%80%9a%e4%b9%89%e5%8d%83%e9%97%ae-qwen-3-%e6%97%97%e8%88%b0%e7%89%88%e6%a8%a1%e5%9e%8b%e5%ae%a3%e5%b8%83%e6%9b%b4%e6%96%b0%ef%bc%9a%e6%80%a7%e8%83%bd%e5%85%a8%e9%9d%a2","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/39932.html","title":{"rendered":"Alibaba Cloud Tongyi Qwen3 Flagship Model Announces Update: Comprehensive Performance Improvement, Surpassing Kimi, DeepSeek, and Other Industry-Leading Models"},"content":{"rendered":"<p>July 22 news: <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e9%98%bf%e9%87%8c%e4%ba%91\" title=\"_Other Organiser\" target=\"_blank\" >Alibaba Cloud<\/a> today updated its flagship Qwen3 model, releasing a new version of the Qwen3-235B-A22B-FP8 non-thinking model, named Qwen3-235B-A22B-Instruct-2507-FP8.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-39933\" title=\"d32c2285j00szs6rl002vd000v900hkp\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/07\/d32c2285j00szs6rl002vd000v900hkp.jpg\" alt=\"d32c2285j00szs6rl002vd000v900hkp\" width=\"1125\" height=\"632\" \/><\/p>\n<p>Alibaba Cloud said that after communicating with the community and careful deliberation, it decided to stop using the hybrid thinking mode and instead train the Instruct and Thinking models separately, in order to achieve the best quality.<\/p>\n<p>According to the announcement, the new Qwen3 model delivers significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, math, science, programming, and tool usage. It performs outstandingly on benchmarks such as GPQA (knowledge), AIME25 (math), LiveCodeBench (programming), Arena-Hard (human preference alignment), and BFCL (agent capability), outperforming top open-source models 
such as Kimi-K2 and DeepSeek-V3, as well as leading closed-source models such as Claude-Opus4-Non-thinking.<\/p>\n<p>Model overview<\/p>\n<p>The FP8 version of Qwen3-235B-A22B-Instruct-2507 has the following specifications:<\/p>\n<ul>\n<li>Type: causal language model \/ autoregressive language model<\/li>\n<li>Training phases: pre-training and post-training<\/li>\n<li>Number of parameters: 235B total, 22B activated<\/li>\n<li>Number of parameters (non-embedding): 234B<\/li>\n<li>Number of layers: 94<\/li>\n<li>Number of attention heads (GQA): 64 for Q, 4 for KV<\/li>\n<li>Number of experts: 128<\/li>\n<li>Number of activated experts: 8<\/li>\n<li>Context length: 262,144 tokens, supported natively<\/li>\n<\/ul>\n<p>According to Alibaba Cloud, the updated Qwen3 model also delivers the following key improvements:<\/p>\n<ul>\n<li>Significantly improved coverage of long-tail knowledge across multiple languages.<\/li>\n<li>Markedly better alignment with user preferences on subjective and open-ended tasks, providing more helpful responses and higher-quality text generation.<\/li>\n<li>Long-text capability upgraded to 256K tokens, with further enhanced context comprehension.<\/li>\n<\/ul>\n<p>The new Qwen3 model is now open source and available on ModelScope and Hugging Face. 1AI attaches the official addresses:<\/p>\n<ul>\n<li>Official website: https:\/\/chat.qwen.ai\/<\/li>\n<li>Hugging Face: https:\/\/huggingface.co\/Qwen\/Qwen3-235B-A22B-Instruct-2507-FP8<\/li>\n<li>ModelScope: https:\/\/modelscope.cn\/models\/Qwen\/Qwen3-235B-A22B-Instruct-2507-FP8<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>On July 22, Alibaba Cloud updated its flagship Qwen3 model with a new version of the Qwen3-235B-A22B-FP8 non-thinking model, named Qwen3-235B-A22B-Instruct-2507-FP8. 
Alibaba Cloud indicated that, after communicating with the community and careful deliberation, it decided to discontinue the hybrid thinking mode and instead train the Instruct and Thinking models separately, to obtain the best quality. The new Qwen3 model reportedly shows significantly improved general capabilities, including instruction following, logical reasoning, text comprehension, math, science, programming, and tool usage, in GPQA (knowledge)<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[331,334],"collection":[],"class_list":["post-39932","post","type-post","status-publish","format-standard","hentry","category-news","tag-331","tag-334"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/39932","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=39932"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/39932\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=39932"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=39932"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=39932"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=39932"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}