{"id":28675,"date":"2025-02-12T20:44:56","date_gmt":"2025-02-12T12:44:56","guid":{"rendered":"https:\/\/www.1ai.net\/?p=28675"},"modified":"2025-02-12T20:44:56","modified_gmt":"2025-02-12T12:44:56","slug":"%e8%b1%86%e5%8c%85%e6%8f%90%e5%87%ba%e5%85%a8%e6%96%b0%e7%a8%80%e7%96%8f%e6%a8%a1%e5%9e%8b%e6%9e%b6%e6%9e%84-ultramem%ef%bc%8c%e6%8e%a8%e7%90%86%e6%88%90%e6%9c%ac%e8%be%83-moe-%e6%9c%80%e9%ab%98","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/28675.html","title":{"rendered":"Doubao proposes a new sparse model architecture, UltraMem, which reduces inference cost by up to 83% compared to MoE."},"content":{"rendered":"<p>February 12 news. The <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%b1%86%e5%8c%85\" title=\"[View articles tagged with [Doubao]]\" target=\"_blank\" >Doubao<\/a> large model team at <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%ad%97%e8%8a%82%e8%b7%b3%e5%8a%a8\" title=\"[View articles tagged with [ByteDance]]\" target=\"_blank\" >ByteDance<\/a> today announced a <strong>new sparse model architecture, <a href=\"https:\/\/www.1ai.net\/en\/tag\/ultramem\" title=\"[View articles tagged with [UltraMem]]\" target=\"_blank\" >UltraMem<\/a><\/strong>. The architecture effectively solves the problem of <strong>high memory access<\/strong> in MoE inference, improving inference speed by <strong>2 to 6 times<\/strong> over the MoE architecture and cutting inference cost by <strong>up to 83%<\/strong>. 
The study also reveals the Scaling Law of the new architecture, demonstrating that it not only has excellent scaling characteristics but also outperforms MoE in performance.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-28676\" title=\"e046b367j00srkm1p005od000u000e8p\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/02\/e046b367j00srkm1p005od000u000e8p.jpg\" alt=\"e046b367j00srkm1p005od000u000e8p\" width=\"1080\" height=\"512\" \/><\/p>\n<p>Experimental results show that an UltraMem model trained with 20 million values achieves industry-leading inference speed and model performance under the same computing resources, opening a new path toward building models with billions of values or experts.<\/p>\n<p>UltraMem is described as a sparse model architecture that likewise decouples computation and parameters, preserving model quality while <strong>solving the memory access problem of inference<\/strong>. Experimental results show that under the same parameter and activation conditions, UltraMem <strong>surpasses MoE in model quality<\/strong> and improves inference speed by <strong>2 to 6 times<\/strong>. In addition, at common batch sizes, UltraMem's memory access cost is almost comparable to that of a Dense model with the same amount of computation.<\/p>\n<p>Under the Transformer architecture, the performance of a model is logarithmically related to its number of parameters and computational complexity. 
As LLMs continue to grow in size, inference cost rises sharply and inference speed slows down.<\/p>\n<p>Although the MoE architecture has successfully decoupled computation and parameters, inference at small batch sizes tends to activate all experts, leading to a sharp rise in memory accesses and, consequently, a significant increase in inference latency.<\/p>\n<p>Note: \"MoE\" refers to the Mixture of Experts architecture, an <strong>architectural design for improving model performance and efficiency<\/strong>. In the MoE architecture, the model is <strong>composed of multiple sub-models (experts)<\/strong>, each responsible for processing a portion of the input data. During training and inference, <strong>only some experts are selectively activated<\/strong> depending on the characteristics of the input, decoupling computation from parameters and improving the model's flexibility and efficiency.<\/p>","protected":false},"excerpt":{"rendered":"<p>On February 12, 2025, the ByteDance Doubao large model team announced a new sparse model architecture, UltraMem, which effectively solves the problem of high memory access in MoE inference, improves inference speed by 2 to 6 times compared with the MoE architecture, and reduces inference cost by up to 83%. The study also reveals the Scaling Law of the new architecture, proving that it not only has excellent scaling characteristics but also surpasses MoE in performance. 
Experimental results show that an UltraMem model trained with 20 million values achieves industry-leading inference speed and model performance with the same computing resources, making it an ideal path toward building models with billions of values or experts.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[5701,548,2248],"collection":[],"class_list":["post-28675","post","type-post","status-publish","format-standard","hentry","category-news","tag-ultramem","tag-548","tag-2248"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/28675","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=28675"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/28675\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=28675"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=28675"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=28675"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=28675"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}