{"id":33542,"date":"2025-04-19T12:01:17","date_gmt":"2025-04-19T04:01:17","guid":{"rendered":"https:\/\/www.1ai.net\/?p=33542"},"modified":"2025-04-19T12:01:17","modified_gmt":"2025-04-19T04:01:17","slug":"27b-%e6%98%be%e5%ad%98%e9%9c%80%e6%b1%82-54-%e2%86%92-14-1gb%ef%bc%9a%e8%b0%b7%e6%ad%8c%e5%8f%91%e5%b8%83-gemma-3-qat-ai%e6%a8%a1%e5%9e%8b%ef%bc%8crtx-3090-%e6%98%be%e5%8d%a1%e5%8f%af%e8%bf%90","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/33542.html","title":{"rendered":"27B Memory Requirements 54 \u2192 14.1GB: Google Releases Gemma 3 QAT AI Model, Runs on RTX 3090 Graphics Card"},"content":{"rendered":"<p>April 19 news: <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%b0%b7%e6%ad%8c\" title=\"[View articles tagged with [Google]]\" target=\"_blank\" >Google<\/a> published a blog post yesterday, April 18, releasing Quantization-Aware Training (QAT) optimized versions of the <a href=\"https:\/\/www.1ai.net\/en\/tag\/gemma-3\" title=\"_Other Organiser\" target=\"_blank\" >Gemma 3<\/a> models, which <strong>reduce memory requirements while maintaining high quality.<\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-33543\" title=\"015765f5j00suy5sx0025d000sg00i3p\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/04\/015765f5j00suy5sx0025d000sg00i3p.jpg\" alt=\"015765f5j00suy5sx0025d000sg00i3p\" width=\"1024\" height=\"651\" \/><\/p>\n<p>Last month, Google launched the <a href=\"https:\/\/www.1ai.net\/en\/tag\/gemma\" title=\"_Other Organiser\" target=\"_blank\" >Gemma<\/a> 3 open-source model, which runs efficiently on a single NVIDIA H100 GPU at BFloat16 (BF16) precision.<\/p>\n<p>Citing the blog post, 1AI reports that Google worked to make Gemma 3's powerful performance run on common consumer hardware in response to user demand. 
Quantization is the key technique: it dramatically reduces storage by lowering the numerical precision of model parameters (e.g., from 16-bit BF16 to 4-bit int4), much as image compression shrinks files by reducing the number of colors.<\/p>\n<p>With int4 quantization, Gemma 3 27B's VRAM requirement <strong>drops sharply from 54GB to 14.1GB<\/strong>; Gemma 3 12B drops from 24GB to 6.6GB, and Gemma 3 1B requires only 0.5GB of VRAM.<\/p>\n<p>This means users can run powerful AI models on desktops (NVIDIA RTX 3090) or laptops (NVIDIA RTX 4060 Laptop GPU), and even phones can support the smaller models.<\/p>\n<p>To avoid the performance degradation that quantization can cause, Google employs quantization-aware training (QAT), which simulates low-precision operations during training so that the model retains high accuracy after compression. The Gemma 3 QAT models reduce perplexity degradation by 54% in about 5,000 training steps.<\/p>\n<p>Major platforms such as Ollama, LM Studio, and llama.cpp have already integrated the models, and users can download the official int4 and Q4_0 versions from Hugging Face and Kaggle to run them easily on Apple Silicon or CPUs. In addition, the Gemmaverse community offers more quantization options to meet different needs.<\/p>","protected":false},"excerpt":{"rendered":"<p>April 19, 2025 - In a blog post yesterday (April 18), Google released Quantization-Aware Training (QAT) optimized versions of the Gemma 3 models that reduce memory requirements while maintaining high quality. Google launched the Gemma 3 open-source model last month, capable of running efficiently on a single NVIDIA H100 GPU at BFloat16 (BF16) precision. Citing the blog post, 1AI said that Google worked to make Gemma 3's powerful performance run on common hardware in response to user demand. 
Quantization is the key technique, lowering the numerical precision of model parameters (e.g., from 16-bit BF16 to 4-bit int4), much as image compression reduces the number of colors, dramatically reducing the amount of data<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[167,1289,5957,281],"collection":[],"class_list":["post-33542","post","type-post","status-publish","format-standard","hentry","category-news","tag-ai","tag-gemma","tag-gemma-3","tag-281"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/33542","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=33542"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/33542\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=33542"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=33542"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=33542"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=33542"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}