{"id":12273,"date":"2024-06-05T09:48:08","date_gmt":"2024-06-05T01:48:08","guid":{"rendered":"https:\/\/www.1ai.net\/?p=12273"},"modified":"2024-06-05T09:48:08","modified_gmt":"2024-06-05T01:48:08","slug":"%e6%98%86%e4%bb%91%e4%b8%87%e7%bb%b4%e5%ae%a3%e5%b8%83%e5%bc%80%e6%ba%902%e5%8d%83%e4%ba%bf%e7%a8%80%e7%96%8f%e5%a4%a7%e6%a8%a1%e5%9e%8bskywork-moe-%e6%80%a7%e8%83%bd%e5%bc%ba%e5%8a%b2%e6%88%90","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/12273.html","title":{"rendered":"Kunlun Wanwei announces the open source of Skywork-MoE, a 200 billion sparse model with strong performance and lower cost"},"content":{"rendered":"<p>exist<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%a7%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [large models]]\" target=\"_blank\" >Large Model<\/a>Against the backdrop of rapid technological development,<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e6%98%86%e4%bb%91%e4%b8%87%e7%bb%b4\" title=\"[Sees articles with [Konlen] tags]\" target=\"_blank\" >Kunlun Wanwei<\/a>company<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >Open Source<\/a>A landmark sparse large language model<a href=\"https:\/\/www.1ai.net\/en\/tag\/skywork-moe\" title=\"_Other Organiser\" target=\"_blank\" >Skywork-MoE<\/a>This model not only excels in performance, but also significantly reduces the inference cost, providing an effective solution to the challenges posed by large-scale intensive LLMs.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-12274\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/06\/6385310629247373086258082.png\" alt=\"\" width=\"621\" height=\"280\" \/><\/p>\n<p>Skywork-MoE model features:<\/p>\n<p>Open source and free for commercial use: Skywork-MoE&#039;s model weights and technical reports are completely open source and free for commercial use without application.<\/p>\n<p>Reduced 
inference cost: The model significantly reduces inference cost while maintaining strong performance.<\/p>\n<p>Sparse model: Skywork-MoE is a Mixture of Experts (MoE) model that provides a more economically viable alternative by distributing computation across specialized sub-models, or \u201cexperts\u201d.<\/p>\n<p>Supports inference on a single 4090 server: It is the first open-source MoE large model that supports inference on a single 4090 server.<\/p>\n<p>Technical details:<\/p>\n<p>Model weights and open source repository: Model weights can be downloaded from Hugging Face, and the open source repository is hosted on GitHub.<\/p>\n<p>Inference code: Code supporting 8-bit quantized loading for inference on an 8x4090 server is provided.<\/p>\n<p>Performance: On an 8x4090 server, Skywork-MoE can achieve a throughput of 2,200 tokens\/s using the non-uniform Tensor Parallel inference method pioneered by the Kunlun Wanwei team.<\/p>\n<p>Model performance and technological innovation:<\/p>\n<p>Parameter size: The total parameter size of Skywork-MoE is 146B, the activated parameter size is 22B, and there are 16 experts in total, each with a size of 13B.<\/p>\n<p>Performance comparison: With the same number of activated parameters, Skywork-MoE is at the forefront of the industry, approaching the capability of a 70B Dense model while cutting inference cost nearly 3-fold.<\/p>\n<p>Training optimization algorithms: Skywork-MoE designs two training optimization algorithms, a Gating Logits normalization operation and an adaptive Aux Loss, to address the difficulty of training MoE models and their poor generalization performance.<\/p>\n<p>Large-scale distributed training:<\/p>\n<p>Expert Data Parallel: A new parallel design scheme is proposed to efficiently partition the model when the number of experts is small.<\/p>\n<p>Non-uniform splitting and pipeline parallelism: A non-uniform pipeline parallel splitting and recomputation layer 
allocation method is proposed to balance the compute\/GPU memory load.<\/p>\n<p>Experiments and rules of thumb:<\/p>\n<p>Scaling Law experiment: Explores the constraints that affect the quality of Upcycling versus From Scratch MoE training.<\/p>\n<p>Training rule of thumb: If the FLOPs budget for training the MoE model is more than 2 times that of training the Dense model, it is better to train the MoE From Scratch; otherwise, training the MoE via Upcycling reduces training cost.<\/p>\n<p>The open sourcing of Skywork-MoE brings a powerful new tool to the large model community, helping to advance the field of artificial intelligence, especially in scenarios that require processing large amounts of data under limited computational resources.<\/p>\n<p>Project page: https:\/\/top.aibase.com\/tool\/skywork-moe<\/p>\n<p>Model download address: https:\/\/huggingface.co\/Skywork\/Skywork-MoE-Base<\/p>","protected":false},"excerpt":{"rendered":"<p>In the context of the rapid development of large model technology, Kunlun Wanwei has open-sourced a landmark sparse large language model, Skywork-MoE, which not only excels in performance but also drastically reduces inference costs, providing an effective solution to the challenges posed by large-scale dense LLMs. Skywork-MoE model features: Open source and free for commercial use: Skywork-MoE's model weights and technical reports are completely open source and free for commercial use without application. Reduced inference cost: The model significantly reduces inference cost while maintaining strong performance. 
Sparse model: Skywork-MoE is a Mixture of Experts (MoE) model, which provides a more economical alternative by distributing computation across specialized sub-models, or \"experts\".<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[2926,216,219,1050],"collection":[],"class_list":["post-12273","post","type-post","status-publish","format-standard","hentry","category-news","tag-skywork-moe","tag-216","tag-219","tag-1050"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/12273","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=12273"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/12273\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=12273"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=12273"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=12273"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=12273"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}