{"id":19940,"date":"2024-09-14T09:42:20","date_gmt":"2024-09-14T01:42:20","guid":{"rendered":"https:\/\/www.1ai.net\/?p=19940"},"modified":"2024-09-14T09:42:20","modified_gmt":"2024-09-14T01:42:20","slug":"%e5%85%83%e8%b1%a1%e5%8f%91%e5%b8%83%e4%b8%ad%e5%9b%bd%e6%9c%80%e5%a4%a7-moe-%e5%bc%80%e6%ba%90%e5%a4%a7%e6%a8%a1%e5%9e%8b%ef%bc%9a%e6%80%bb%e5%8f%82%e6%95%b0-255b%ef%bc%8c%e6%bf%80%e6%b4%bb%e5%8f%82","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/19940.html","title":{"rendered":"Yuanxiang Releases China's Largest MoE Open Source Large Model: 255B Total Parameters, 36B Activation Parameters"},"content":{"rendered":"<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%85%83%e8%b1%a1\" title=\"[Sees articles with [earth] labels]\" target=\"_blank\" >Yuanxiang<\/a> XVERSE Released<strong>China's largest MoE open source model<\/strong><strong>\u00a0<\/strong><strong>XVERSE-MoE-A36B<\/strong>.<\/p>\n<p>The model has 255B total parameters and 36B activation parameters, and the official claim is that the effect can \"roughly reach\" more than 100B large model \"cross-level\" performance leap, while the training time is reduced by 30%, and the inference performance is improved by 100%, which makes the cost per token drop dramatically. At the same time, the training time is reduced by 30%, the inference performance is improved by 100%, and the cost per token is greatly reduced.<\/p>\n<p>MoE (Mixture of Experts) hybrid expert modeling architecture, combining multiple segmented domain expert models into a single super-model in<strong>Scale up models while keeping model performance maximized, and even reduce the computational cost of training and inference<\/strong>MoE has been used in a number of big models. Big models like Google Gemini-1.5, OpenAI's GPT-4, and Musk's xAI's Grok all use MoE.<\/p>\n<p>In several reviews, Meta-Elephant MoE outperforms several similar models, including Skywork-MoE, a 100 billion MoE model in China, Mixtral-8x22B, a traditional MoE dominator, and Grok-1-A86B, a 314 billion parameter MoE open-source model, and so on.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-19941\" title=\"63f3ddc3j00sjs4oo007ed000u000f1m\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/09\/63f3ddc3j00sjs4oo007ed000u000f1m.jpg\" alt=\"63f3ddc3j00sjs4oo007ed000u000f1m\" width=\"1080\" height=\"541\" \/><\/p>\n<p>Attached related links:<\/p>\n<ul>\n<li>Hugging Face: https:\/\/huggingface.co\/xverse\/XVERSE-MoE-A36B<\/li>\n<li>Magic Hitch: https:\/\/modelscope.cn\/models\/xverse\/XVERSE-MoE-A36B<\/li>\n<li>Github: https:\/\/github.com\/xverse-ai\/XVERSE-MoE-A36B<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>XVERSE released XVERSE-MoE-A36B, the largest MoE open source model in China. The model has 255B total parameters and 36B activation parameters, and it is officially claimed that the effect can \"roughly reach\" more than the \"cross-level\" performance leap of the 100B large model, while the training time is reduced by 30%, and the inference performance is improved by 100%. \"At the same time, the training time is reduced by 30%, the inference performance is improved by 100%, and the cost per token is greatly reduced. MoE (Mixture of Experts) mixed expert model architecture, combining multiple segmented domain expert models into a super model, while expanding the model scale, keeping the model performance maximized, and even reducing the computational cost of training and reasoning. 
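For readers who want to try the released checkpoint, the sketch below uses the generic Hugging Face transformers loading path. It is an untested assumption that the standard AutoModelForCausalLM route (including trust_remote_code=True and the generation settings shown) applies to this model; consult the repositories linked above for the authoritative instructions, and note that a 255B-parameter checkpoint requires multiple high-memory GPUs.

```python
# Sketch only: generic transformers loading path, assumed to apply to XVERSE-MoE-A36B.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xverse/XVERSE-MoE-A36B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",        # let transformers pick the checkpoint's dtype
    device_map="auto",         # shard across available GPUs (requires accelerate)
    trust_remote_code=True,    # assumption: custom modeling code ships with the repo
)

inputs = tokenizer("The Mixture of Experts architecture is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```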