{"id":21966,"date":"2024-10-25T10:02:15","date_gmt":"2024-10-25T02:02:15","guid":{"rendered":"https:\/\/www.1ai.net\/?p=21966"},"modified":"2024-10-25T10:02:15","modified_gmt":"2024-10-25T02:02:15","slug":"meta-ai%e6%96%b0%e9%87%8f%e5%8c%96%e7%89%88%e6%9c%acllama-3-2%ef%bc%9a%e9%80%9f%e5%ba%a6%e6%8f%90%e9%ab%982%e5%80%8d%e3%80%81%e4%bd%93%e9%87%8f%e5%87%8f%e5%b0%9156%ef%bc%8c%e6%89%8b%e6%9c%ba%e5%b0%b1","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/21966.html","title":{"rendered":"Meta AI's new quantized Llama 3.2: 2x faster, 56% smaller, runs on your phone"},"content":{"rendered":"<p>Recently, <a href=\"https:\/\/www.1ai.net\/en\/tag\/meta-ai\" title=\"Meta AI\" target=\"_blank\" >Meta AI<\/a> introduced quantized versions of the <a href=\"https:\/\/www.1ai.net\/en\/tag\/llama\" title=\"Llama\" target=\"_blank\" >Llama<\/a> 3.2 model, available in 1B and 3B sizes, which can be fine-tuned, distilled, and deployed on a wide range of devices.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-21967\" title=\"be638266j00slw2y5003sd000o400fqm\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/10\/be638266j00slw2y5003sd000o400fqm.jpg\" alt=\"be638266j00slw2y5003sd000o400fqm\" width=\"868\" height=\"566\" \/><\/p>\n<p>In the past, while models like Llama 3 achieved remarkable success in natural language understanding and generation, their sheer size and high computational requirements made them difficult for many organizations to use. 
Long training times, high energy consumption, and reliance on expensive hardware have widened the gap between tech giants and smaller organizations.<\/p>\n<p>One notable feature of Llama 3.2 is its support for multilingual text and image processing. The quantized 1B and 3B models are on average 56% smaller, use 41% less memory, and achieve 2-3x speedups, making them well suited to mobile devices and edge computing environments.<\/p>\n<p>Specifically, these models use 8-bit and 4-bit quantization strategies to reduce the precision of the original 32-bit floating-point weights and activations, dramatically lowering memory and compute requirements. This means the quantized Llama 3.2 models can run on ordinary consumer GPUs, or even CPUs, with little to no loss in performance.<\/p>\n<p>Users can now run a variety of smart applications on their phones, such as summarizing a discussion in real time or invoking a calendar tool, all thanks to these lightweight models.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-21968\" title=\"887d10dbj00slw2y5002hd000h000fcm\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/10\/887d10dbj00slw2y5002hd000h000fcm.jpg\" alt=\"887d10dbj00slw2y5002hd000h000fcm\" width=\"612\" height=\"552\" \/><\/p>\n<p>Meta AI is also working with industry-leading partners such as Qualcomm and MediaTek to deploy these models on Arm CPU-based systems-on-chip, ensuring they can be used efficiently across a wide range of devices. 
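The 8-bit quantization idea described above can be sketched as simple symmetric per-tensor quantization: each float32 weight is mapped to an int8 value via one shared scale factor, cutting storage 4x. This is an illustrative approximation only, not Meta's actual quantization scheme (which also covers activations and 4-bit variants):

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric 8-bit quantization: map float32 weights to int8
    using a single per-tensor scale factor."""
    scale = np.abs(weights).max() / 127.0  # largest magnitude maps to 127
    q = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from the int8 values."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_approx = dequantize(q, scale)

# int8 storage is 4x smaller than float32
assert q.nbytes * 4 == w.nbytes
```

The rounding error per weight is bounded by half the scale factor, which is why a well-chosen scale keeps accuracy loss small in practice.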
Early tests show that the quantized Llama 3.2 achieves 95% of the full Llama 3 model's performance on major natural language processing benchmarks, while reducing memory usage by nearly 60%. This is significant for enterprises and researchers looking to deploy AI without investing in costly infrastructure.<\/p>","protected":false},"excerpt":{"rendered":"<p>Recently, Meta AI introduced the new quantized Llama 3.2 model, available in 1B and 3B sizes, which can be fine-tuned, distilled, and deployed on a wide range of devices. In the past, while models like Llama 3 achieved significant success in natural language understanding and generation, their large size and high computational requirements made them difficult for many organizations to use. Long training times, high energy consumption, and reliance on expensive hardware have widened the gap between tech giants and smaller organizations. One notable feature of Llama 3.2 is its support for multilingual text and image processing. The quantized 1B and 3B models are on average 56% smaller, use 41% less memory, and achieve a 2-3x speedup, 
the<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[184,547],"collection":[],"class_list":["post-21966","post","type-post","status-publish","format-standard","hentry","category-news","tag-llama","tag-meta-ai"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/21966","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=21966"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/21966\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=21966"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=21966"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=21966"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=21966"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}