{"id":23946,"date":"2024-11-27T20:38:24","date_gmt":"2024-11-27T12:38:24","guid":{"rendered":"https:\/\/www.1ai.net\/?p=23946"},"modified":"2024-11-27T20:38:24","modified_gmt":"2024-11-27T12:38:24","slug":"hugging-face-%e5%8f%91%e5%b8%83-smolvlm-%e5%bc%80%e6%ba%90-ai%e6%a8%a1%e5%9e%8b%ef%bc%9a20-%e4%ba%bf%e5%8f%82%e6%95%b0%ef%bc%8c%e7%94%a8%e4%ba%8e%e7%ab%af%e4%be%a7%e6%8e%a8%e7%90%86%ef%bc%8c%e4%bd%93","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/23946.html","title":{"rendered":"Hugging Face Releases SmolVLM Open Source AI Model: 2 Billion Parameters for On-Device Inference, Small and Fast"},"content":{"rendered":"<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/hugging-face\" title=\"[See articles with [Hugging Face] label]\" target=\"_blank\" >Hugging Face<\/a> published a blog post yesterday (November 26) announcing the launch of the SmolVLM AI <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%a7%86%e8%a7%89%e8%af%ad%e8%a8%80%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [visual language modeling]]\" target=\"_blank\" >visual language model<\/a> (VLM). <strong>With only 2 billion parameters and designed for on-device inference, it stands out among similar models by virtue of its extremely low memory footprint.<\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-23947\" title=\"0a633656j00snm0ee005jd000sg00i8p\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/11\/0a633656j00snm0ee005jd000sg00i8p.jpg\" alt=\"0a633656j00snm0ee005jd000sg00i8p\" width=\"1024\" height=\"656\" \/><\/p>\n<p>Officially, the SmolVLM AI model has the advantage of being small, fast, memory-efficient, and completely <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >open source<\/a>: all model checkpoints, VLM datasets, training recipes, and tools are released under the Apache 2.0 license.<\/p>\n<p>There are three versions of the SmolVLM 
AI model: SmolVLM-Base (for downstream fine-tuning), SmolVLM-Synthetic (fine-tuned on synthetic data), and SmolVLM-Instruct (an instruction fine-tuned version that can be used directly in interactive applications).<\/p>\n<p><strong>Architecture<\/strong><\/p>\n<p>The most important feature of SmolVLM is its clever architectural design: it borrows from Idefics3 and uses SmolLM2 1.7B as the language backbone, compressing visual information by a factor of up to 9 with a pixel shuffle strategy.<\/p>\n<p>The training datasets include Cauldron and Docmatix, and SmolLM2's context window was extended so the model can handle longer text sequences and multiple images. By optimizing image encoding and the inference process, the model significantly reduces its memory footprint, solving the earlier problem of large models running slowly, or even crashing, on common devices.<\/p>\n<p><strong>Memory<\/strong><\/p>\n<p>SmolVLM encodes each 384 x 384 pixel image patch into 81 tokens, so it uses only 1,200 tokens for a given test image, while Qwen2-VL uses 16,000 tokens for the same image.<\/p>\n<p><strong>Throughput<\/strong><\/p>\n<p>SmolVLM performs well in multiple benchmarks such as MMMU, MathVista, MMStar, DocVQA, and TextVQA; compared with Qwen2-VL, its prefill throughput is 3.3 to 4.5 times higher and its generation throughput is 7.5 to 16 times higher.<\/p>","protected":false},"excerpt":{"rendered":"<p>The Hugging Face platform published a blog post yesterday (November 26) announcing the launch of the SmolVLM AI visual language model (VLM), with just 2 billion parameters for on-device inference, which stands out from its peers by virtue of its extremely low memory footprint. Officially, the SmolVLM AI model benefits from being small, fast, memory-efficient, and completely open source, with all model checkpoints, VLM datasets, training recipes, and tools released under the Apache 2.0 license. 
SmolVLM AI models are available in three versions: SmolVLM-Base (for downstream fine-tuning), SmolVLM-Synthetic (fine-tuned on synthetic data), and SmolVLM-Instruct (an instruction fine-tuned version for interactive applications).<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[384,219,4981],"collection":[],"class_list":["post-23946","post","type-post","status-publish","format-standard","hentry","category-news","tag-hugging-face","tag-219","tag-4981"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/23946","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=23946"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/23946\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=23946"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=23946"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=23946"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=23946"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}