{"id":27923,"date":"2025-01-29T15:06:26","date_gmt":"2025-01-29T07:06:26","guid":{"rendered":"https:\/\/www.1ai.net\/?p=27923"},"modified":"2025-01-29T15:06:26","modified_gmt":"2025-01-29T07:06:26","slug":"%e9%80%9a%e4%b9%89%e5%8d%83%e9%97%ae-qwen-2-5-max-%e8%b6%85%e5%a4%a7%e8%a7%84%e6%a8%a1-moe-%e6%a8%a1%e5%9e%8b%e5%8f%b7%e7%a7%b0%e4%bc%98%e4%ba%8e-deepseek-v3-%e7%ad%89%e7%ab%9e%e5%93%81%ef%bc%8c","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/27923.html","title":{"rendered":"The Qwen 2.5-Max hyperscale MoE model is claimed to be better than DeepSeek V3 and other competitors, and has not been open-sourced for the time being."},"content":{"rendered":"<p>January 29, 2025 - On the occasion of the New Year, <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e9%98%bf%e9%87%8c%e4%ba%91\" title=\"[View articles tagged with [Alibaba Cloud]]\" target=\"_blank\" >Alibaba Cloud<\/a> unveiled its new <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e9%80%9a%e4%b9%89%e5%8d%83%e9%97%ae\" title=\"[View articles tagged with [Tongyi Qianwen]]\" target=\"_blank\" >Tongyi Qianwen<\/a> Qwen 2.5-Max ultra-large-scale <a href=\"https:\/\/www.1ai.net\/en\/tag\/moe%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [MoE model]]\" target=\"_blank\" >MoE model<\/a>. The model is accessible via API, or you can log in to Qwen Chat to experience it, for example by talking directly to the model or by using the artifacts, search, and other features.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-27924\" title=\"55787359j00squ8yw00cpd000v900dap\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/01\/55787359j00squ8yw00cpd000v900dap.jpg\" alt=\"55787359j00squ8yw00cpd000v900dap\" width=\"1125\" height=\"478\" \/><\/p>\n<p>According to the introduction, Tongyi Qianwen Qwen 2.5-Max was trained on more than 20 trillion tokens of pre-training data together with a carefully designed post-training scheme.<\/p>\n<p><strong>Performance<\/strong><\/p>\n<p>AliCloud directly compared the 
performance of the instruct model (note: the instruct model is the version we normally use for direct conversations). The comparison includes DeepSeek V3, GPT-4o, and Claude-3.5-Sonnet, and the results are as follows:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-27926\" title=\"9b12780fj00squ90t003od000v900hrp\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/01\/9b12780fj00squ90t003od000v900hrp.jpg\" alt=\"9b12780fj00squ90t003od000v900hrp\" width=\"1125\" height=\"639\" \/><\/p>\n<p>Qwen2.5-Max outperforms DeepSeek V3 in benchmarks such as Arena-Hard, LiveBench, LiveCodeBench, and GPQA-Diamond, while also posting competitive scores in other evaluations such as MMLU-Pro.<\/p>\n<p>For the base model comparison, since the base versions of closed-source models such as GPT-4o and Claude-3.5-Sonnet cannot be accessed, AliCloud compared Qwen2.5-Max against DeepSeek V3 (the current leading open-source MoE model), Llama-3.1-405B (the largest open-source dense model), and Qwen2.5-72B (also among the top open-source dense models). The comparison results are shown in the figure below:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-27925\" title=\"984bbaf7j00squ901003td000v900hkp\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/01\/984bbaf7j00squ901003td000v900hkp.jpg\" alt=\"984bbaf7j00squ901003td000v900hkp\" width=\"1125\" height=\"632\" \/><\/p>\n<p>Our base model shows significant advantages in most benchmarks. 
We believe that the next version of Qwen2.5-Max will reach even higher levels as post-training techniques continue to improve.<\/p>","protected":false},"excerpt":{"rendered":"<p>January 29, 2025 - On the occasion of the New Year, AliCloud announced its new Qwen 2.5-Max hyperscale MoE model, which can be accessed by way of APIs, or you can log in to Qwen Chat to experience it, for example, by talking to the model directly, or using the artifacts, search, and other functions. According to the introduction, Tongyi Qianwen Qwen 2.5-Max is trained using pre-training data of more than 20 trillion tokens and a well-designed post-training scheme. Performance AliCloud directly compared the performance of the instruct model (note: the instruct model is the version we usually use for direct conversations). Comparison objects include DeepSeek V3, GPT-4o<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[5653,331,334],"collection":[],"class_list":["post-27923","post","type-post","status-publish","format-standard","hentry","category-news","tag-moe","tag-331","tag-334"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/27923","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=27923"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/27923\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=27923"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/
v2\/categories?post=27923"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=27923"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=27923"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}