{"id":18916,"date":"2024-08-30T09:28:43","date_gmt":"2024-08-30T01:28:43","guid":{"rendered":"https:\/\/www.1ai.net\/?p=18916"},"modified":"2024-08-30T09:28:43","modified_gmt":"2024-08-30T01:28:43","slug":"%e6%9c%80%e5%bc%ba%e7%ab%af%e4%be%a7%e5%bc%80%e6%ba%90-ai%e6%a8%a1%e5%9e%8b-zamba2-mini-%e7%99%bb%e5%9c%ba%ef%bc%9a12-%e4%ba%bf%e5%8f%82%e6%95%b0%ef%bc%8c4bit-%e9%87%8f%e5%8c%96%e4%b8%8b%e5%86%85","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/18916.html","title":{"rendered":"The most powerful open source AI model Zamba2-mini is released: 1.2 billion parameters, less than 700MB memory usage at 4-bit quantization"},"content":{"rendered":"<p>Zyphra published a blog post on August 27, announcing the release of the Zamba2-mini 1.2B model.<strong>It has a total of 1.2 billion parameters and is claimed to be an end-side SOTA small language model. Its memory usage is less than 700MB under 4-bit quantization.<\/strong><\/p>\n<p>SOTA stands for state-of-the-art. It does not refer to a specific model, but to the best\/most advanced model currently available in this research task.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-18917\" title=\"4924b6b8j00sj0c270023d0018g00sqm\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/08\/4924b6b8j00sj0c270023d0018g00sqm.jpg\" alt=\"4924b6b8j00sj0c270023d0018g00sqm\" width=\"1600\" height=\"1034\" \/><\/p>\n<p>Although small in size, the Zamba2-mini 1.2B is comparable to larger models including Google&#039;s Gemma-2B, Huggingface&#039;s SmolLM-1.7B, Apple&#039;s OpenELM-1.1B, and Microsoft&#039;s Phi-1.5.<\/p>\n<p>In the reasoning task, the outstanding performance of Zamba2-mini is particularly remarkable. 
Compared with models such as Phi3-3.8B, Zamba2-mini&#039;s time to first token (the latency from input to the first generated token) is half as long, and its memory usage is 27% lower.<\/p>\n<p>Zamba2-mini 1.2B achieves this mainly through a highly optimized architecture that combines the strengths of different neural network designs: it maintains the high-quality output of large, dense transformers while running with the computational and memory efficiency of much smaller models.<\/p>\n<p>One of the key advances of Zamba2-mini over its predecessor, Zamba1, is the integration of two shared attention layers.<\/p>\n<p>This two-layer approach enhances the model\u2019s ability to preserve information at different depths, improving overall performance. Adding rotary position embeddings to the shared attention layers also slightly improves performance, showing Zyphra\u2019s commitment to incremental yet impactful improvements in model design.<\/p>\n<p>Zamba2-mini is pre-trained on a massive dataset of three trillion tokens drawn from Zyda and other public sources.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-18918\" title=\"23591b31j00sj0c27002zd0018g00ogm\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/08\/23591b31j00sj0c27002zd0018g00ogm.jpg\" alt=\"23591b31j00sj0c27002zd0018g00ogm\" width=\"1600\" height=\"880\" \/><\/p>\n<p>This dataset was rigorously filtered to ensure the highest-quality training data, and was further refined in an annealing phase that included training on 100 billion very-high-quality tokens.<\/p>\n<p>Zyphra has committed to making Zamba2-mini available as an <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >open-source<\/a> model under the Apache 2.0 license.<\/p>\n<p>References:<\/p>\n<ul>\n<li class=\"list-undefined 
list-reference-paddingleft\">\n<p data-vmark=\"fc50\"><a href=\"https:\/\/www.marktechpost.com\/2024\/08\/28\/zyphra-unveils-zamba2-mini-a-state-of-the-art-small-language-model-redefining-on-device-ai-with-unmatched-efficiency-and-performance\/\" target=\"_blank\" rel=\"noopener\">Zyphra Unveils Zamba2-mini: A State-of-the-Art Small Language Model Redefining On-Device AI with Unmatched Efficiency and Performance<\/a><\/p>\n<\/li>\n<li class=\"list-undefined list-reference-paddingleft\">\n<p data-vmark=\"0b20\"><a href=\"https:\/\/huggingface.co\/Zyphra\/Zamba2-1.2B\" target=\"_blank\" rel=\"noopener\">Model Card for Zamba2-1.2B<\/a><\/p>\n<\/li>\n<li class=\"list-undefined list-reference-paddingleft\">\n<p data-vmark=\"3225\"><a href=\"https:\/\/www.zyphra.com\/post\/zamba2-mini\" target=\"_blank\" rel=\"noopener\">Zamba2-mini (1.2B)<\/a><\/p>\n<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>Zyphra published a blog post on August 27, announcing the release of the Zamba2-mini 1.2B model with 1.2 billion parameters, claiming that it is an on-device SOTA small language model with a memory footprint of less than 700MB at 4-bit quantization. SOTA (state-of-the-art) does not refer to a specific model, but to the best\/most advanced model available for a given research task. 
Though small in size, Zamba2-mini 1.2B is comparable to leading models such as Google's Gemma-2B, Huggingface's SmolLM-1.7B, Apple's OpenELM-1.1B, and Microsoft's Phi-1.5.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[167,219],"collection":[],"class_list":["post-18916","post","type-post","status-publish","format-standard","hentry","category-news","tag-ai","tag-219"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/18916","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=18916"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/18916\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=18916"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=18916"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=18916"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=18916"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}