{"id":33453,"date":"2025-04-18T10:55:11","date_gmt":"2025-04-18T02:55:11","guid":{"rendered":"https:\/\/www.1ai.net\/?p=33453"},"modified":"2025-04-18T10:55:11","modified_gmt":"2025-04-18T02:55:11","slug":"%e4%b8%8a%e6%b5%b7%e4%ba%ba%e5%b7%a5%e6%99%ba%e8%83%bd%e5%ae%9e%e9%aa%8c%e5%ae%a4%e5%bc%80%e6%ba%90%e5%a4%9a%e6%a8%a1%e6%80%81%e5%a4%a7%e6%a8%a1%e5%9e%8b%e4%b9%a6%e7%94%9f%e3%83%bb%e4%b8%87","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/33453.html","title":{"rendered":"Shanghai Artificial Intelligence Laboratory open-sources multimodal large model \"Shusheng Wanxiang 3.0\": able to process text and multimodal inputs simultaneously"},"content":{"rendered":"<p>According to the official public account of the <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e4%b8%8a%e6%b5%b7%e4%ba%ba%e5%b7%a5%e6%99%ba%e8%83%bd%e5%ae%9e%e9%aa%8c%e5%ae%a4\" title=\"Look at the article that contains the label\" target=\"_blank\" >Shanghai Artificial Intelligence Laboratory<\/a>, on April 16 the Shanghai Artificial Intelligence Laboratory (Shanghai AI Lab) upgraded and <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >open-sourced<\/a> its general-purpose <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%9a%e6%a8%a1%e6%80%81%e5%a4%a7%e6%a8%a1%e5%9e%8b\" title=\"[Sees articles with [Multimodal Large Model] labels]\" target=\"_blank\" >multimodal large model<\/a>, Shusheng Wanxiang 3.0 (InternVL3).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-33454\" title=\"b1a58317j00suw82r00c0d000u000hjp\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/04\/b1a58317j00suw82r00c0d000u000hjp.jpg\" alt=\"b1a58317j00suw82r00c0d000u000hjp\" width=\"1080\" height=\"631\" \/><\/p>\n<p>According to the official introduction, innovative multimodal pre-training and post-training methods have comprehensively improved InternVL3's basic multimodal capabilities, and in the expert-level benchmark tests and 
comprehensive multimodal performance tests, the full range of model sizes, from 1 billion to 78 billion parameters, ranked first in performance among open-source models. At the same time, its capabilities as a graphical user interface (GUI) agent and in architectural scene drawing understanding, spatial perception reasoning, and general subject reasoning have been significantly improved.<\/p>\n<p>According to the report, the team proposed an <strong>innovative native multimodal pre-training approach<\/strong>. Unlike the traditional approach of first optimizing a large language model and then adding visual capabilities, this approach combines text data with multimodal data during the pre-training phase, so the model <strong>learns language and vision at the same time<\/strong> and can process text and multimodal inputs simultaneously.<\/p>\n<p>In addition to handling general multimodal tasks, InternVL3 extends its multimodal capabilities in several directions, such as <strong>graphical user interface (GUI) agents, architectural scene drawing understanding, spatial perception reasoning, and general subject reasoning<\/strong>.<\/p>\n<p>According to the introduction, InternVL3 can act as a GUI agent that follows instructions to <strong>operate specialized software on a computer or mobile phone<\/strong>.<\/p>\n<p>1AI has compiled the relevant links below:<\/p>\n<ul>\n<li>Technical report: https:\/\/huggingface.co\/papers\/2504.10479<\/li>\n<li>Open-source code \/ model usage: https:\/\/github.com\/OpenGVLab\/InternVL<\/li>\n<li>Model address: https:\/\/huggingface.co\/OpenGVLab\/InternVL3-78B<\/li>\n<li>Public beta: https:\/\/chat.intern-ai.org.cn\/<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>According to the official public account of the Shanghai Artificial Intelligence Laboratory, 
on April 16, the Shanghai Artificial Intelligence Laboratory (Shanghai AI Lab) upgraded and open-sourced its general-purpose multimodal large model, Shusheng Wanxiang 3.0 (InternVL3). According to the official introduction, innovative multimodal pre-training and post-training methods have comprehensively improved InternVL3's basic multimodal capabilities; in expert-level benchmark tests and comprehensive multimodal performance tests, the full range of model sizes, from 1 billion to 78 billion parameters, ranked first in performance among open-source models, and its capabilities as a graphical user interface (GUI) agent and in architectural scene drawing understanding, spatial perception reasoning, and general subject reasoning have been significantly improved. It is reported that the team proposes an innovative native multimodal pre-training method with the<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[1463,602,219],"collection":[],"class_list":["post-33453","post","type-post","status-publish","format-standard","hentry","category-news","tag-1463","tag-602","tag-219"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/33453","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=33453"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/33453\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=33453"}],"wp:ter
m":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=33453"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=33453"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=33453"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}