{"id":25129,"date":"2024-12-14T23:30:58","date_gmt":"2024-12-14T15:30:58","guid":{"rendered":"https:\/\/www.1ai.net\/?p=25129"},"modified":"2024-12-14T23:30:58","modified_gmt":"2024-12-14T15:30:58","slug":"deepseek-vl2-ai-%e8%a7%86%e8%a7%89%e6%a8%a1%e5%9e%8b%e5%bc%80%e6%ba%90%ef%bc%9a%e6%94%af%e6%8c%81%e5%8a%a8%e6%80%81%e5%88%86%e8%be%a8%e7%8e%87%e3%80%81%e5%a4%84%e7%90%86%e7%a7%91%e7%a0%94%e5%9b%be","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/25129.html","title":{"rendered":"DeepSeek-VL2 AI visual model open source: supports dynamic resolution, processes scientific research charts, parses various memes, etc."},"content":{"rendered":"<p>DeepSeek published a blog post on its official website yesterday (December 13) announcing the <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >open-source<\/a> release of the DeepSeek-VL2 model, which achieved excellent results across all evaluation metrics. With it, DeepSeek says, its <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%a7%86%e8%a7%89%e6%a8%a1%e5%9e%8b\" title=\"[Sees articles with [visual model] labels]\" target=\"_blank\" >visual model<\/a> has formally entered the era of Mixture of Experts (MoE).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-25130\" title=\"c6cf9969j00sohppu004od000u000e6p\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/c6cf9969j00sohppu004od000u000e6p.jpg\" alt=\"c6cf9969j00sohppu004od000u000e6p\" width=\"1080\" height=\"510\" \/><\/p>\n<p>Citing the official press release, 1AI summarizes the DeepSeek-VL2 highlights as follows:<\/p>\n<ul>\n<li>Data: twice as much high-quality training data as the first-generation DeepSeek-VL, introducing new capabilities such as meme understanding, visual grounding, and visual story generation.<\/li>\n<li>Architecture: the vision component uses an image-tiling strategy to support dynamic-resolution images, and 
the language component adopts an MoE architecture for high performance at low cost.<\/li>\n<li>Training: inherits the three-stage training pipeline of DeepSeek-VL, handles the varying number of image tiles through load balancing, applies different pipeline-parallelism strategies to image and text data, and introduces expert parallelism for the MoE language model, enabling efficient training.<\/li>\n<\/ul>\n<p>DeepSeek-VL2 supports dynamic resolution with a single SigLIP-SO400M image encoder: each image is cut into multiple local tiles plus a global thumbnail. This strategy allows DeepSeek-VL2 to handle resolutions up to 1152 x 1152 and extreme aspect ratios of 1:9 or 9:1, making it suitable for more application scenarios.<\/p>\n<p>Having also learned from more scientific-document data, DeepSeek-VL2 can readily understand a variety of scientific charts and, through Plot2Code, generate Python code from chart images.<\/p>\n<p>Both the model and the paper have been published:<\/p>\n<p data-vmark=\"7a2a\"><strong>Model Download:<\/strong><a title=\"https:\/\/huggingface.co\/deepseek-ai\" href=\"https:\/\/huggingface.co\/deepseek-ai\" target=\"_blank\" rel=\"noopener\"><span class=\"link-text-start-with-http\">https:\/\/huggingface.co\/deepseek-ai<\/span><\/a><\/p>\n<p data-vmark=\"b448\"><strong>GitHub homepage:<\/strong><a title=\"https:\/\/github.com\/deepseek-ai\/DeepSeek-VL2\" href=\"https:\/\/github.com\/deepseek-ai\/DeepSeek-VL2\" target=\"_blank\" rel=\"noopener\"><span class=\"link-text-start-with-http\">https:\/\/github.com\/deepseek-ai\/DeepSeek-VL2<\/span><\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>DeepSeek published a blog post on its official website yesterday, December 13, announcing the open-source DeepSeek-VL2 model, which achieved excellent results across all evaluation metrics; DeepSeek says its visual model 
has formally entered the Mixture of Experts (MoE) era. 1AI quotes the official press release with the following highlights: Data: twice as much high-quality training data as the first-generation DeepSeek-VL, introducing new capabilities such as meme understanding, visual grounding, and visual story generation. Architecture: the vision component supports dynamic-resolution images via an image-tiling strategy, while the language component adopts an MoE architecture for high performance at low cost<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[219,1865],"collection":[],"class_list":["post-25129","post","type-post","status-publish","format-standard","hentry","category-news","tag-219","tag-1865"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/25129","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=25129"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/25129\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=25129"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=25129"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=25129"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=25129"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}