{"id":16959,"date":"2024-08-01T09:56:33","date_gmt":"2024-08-01T01:56:33","guid":{"rendered":"https:\/\/www.1ai.net\/?p=16959"},"modified":"2024-08-02T09:59:57","modified_gmt":"2024-08-02T01:59:57","slug":"zyphra%e6%8e%a8%e5%b0%8f%e8%af%ad%e8%a8%80%e6%a8%a1%e5%9e%8bzamba2-2-7b%ef%bc%9a-%e9%80%9f%e5%ba%a6%e6%8f%90%e9%ab%98%e4%b8%80%e5%80%8d%ef%bc%8c%e5%86%85%e5%ad%98%e6%88%90%e6%9c%ac%e9%99%8d%e4%bd%8e27","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/16959.html","title":{"rendered":"Zyphra launches small language model Zamba2-2.7B: speed doubled, memory cost reduced by 27%"},"content":{"rendered":"<p data-pm-slice=\"0 0 []\">Recently, <a href=\"https:\/\/www.1ai.net\/en\/tag\/zyphra\" title=\"View articles tagged Zyphra\" target=\"_blank\" >Zyphra<\/a> released its new Zamba2-2.7B language model. The new <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%b0%8f%e5%9e%8b%e8%af%ad%e8%a8%80%e6%a8%a1%e5%9e%8b\" title=\"View articles tagged small language model\" target=\"_blank\" >small language model<\/a> achieves significant improvements in performance and efficiency: trained on approximately 3 trillion tokens, it is comparable in performance to Zamba1-7B and other leading 7B models.<\/p>\n<p data-track=\"127\">Most strikingly, Zamba2-2.7B significantly reduces resource requirements during inference, making it an efficient option for mobile applications.<\/p>\n<p data-track=\"128\">Zamba2-2.7B achieves a two-fold improvement in the key metric of \u201ctime to first response,\u201d meaning it can generate initial responses faster than its competitors. 
This is critical for applications that require real-time interaction, such as virtual assistants and chatbots.<\/p>\n<p data-track=\"129\">Beyond speed, Zamba2-2.7B also excels in memory usage. <strong>It reduces memory overhead by 27%, making it ideal for deployment on devices with limited memory resources.<\/strong> Such efficient memory management ensures the model runs effectively even in compute-constrained environments, broadening its reach across devices and platforms.<\/p>\n<p data-track=\"130\">Another notable advantage of Zamba2-2.7B is its lower generation latency.<strong> Compared with Phi3-3.8B, its latency is 1.29 times lower.<\/strong> This makes interactions smoother. Low latency matters most in applications that demand seamless, continuous communication, such as customer-service bots and interactive educational tools, making Zamba2-2.7B a strong choice for developers focused on user experience.<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-16960\" title=\"get-13\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/08\/get-13.jpg\" alt=\"get-13\" width=\"758\" height=\"492\" \/><\/div>\n<p data-track=\"131\">Zamba2-2.7B consistently outperforms other models of similar size in benchmark comparisons, a testament to Zyphra\u2019s innovation and efforts in advancing AI technology. 
The model uses an improved interleaved shared-attention mechanism with LoRA projectors on its shared MLP module, ensuring high-quality output on complex tasks.<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-16961\" title=\"get-14\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/08\/get-14.jpg\" alt=\"get-14\" width=\"674\" height=\"402\" \/><\/div>\n<p data-track=\"132\">Model page: https:\/\/huggingface.co\/Zyphra\/Zamba2-2.7B<\/p>","protected":false},"excerpt":{"rendered":"<p>Zyphra\u2019s recent release of the new Zamba2-2.7B language model is a milestone for small language models. The new model achieves significant improvements in performance and efficiency with a training dataset of approximately 3 trillion tokens, making it comparable in performance to Zamba1-7B and other leading 7B models. Most surprisingly, Zamba2-2.7B has significantly lower resource requirements for inference, making it an efficient solution for mobile device applications. Zamba2-2.7B achieves a two-fold improvement in the key metric of \"Time to First Response Generation\", which means that it can generate an initial response much faster than its competitors. 
This is a great solution for virtual assistants, chat<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[3824,634],"collection":[],"class_list":["post-16959","post","type-post","status-publish","format-standard","hentry","category-news","tag-zyphra","tag-634"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/16959","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=16959"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/16959\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=16959"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=16959"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=16959"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=16959"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}