Shanghai Artificial Intelligence Laboratory open-sources multimodal large model "Shusheng Wanxiang 3.0": able to process text and multimodal inputs simultaneously

According to the official WeChat account of the Shanghai Artificial Intelligence Laboratory, on April 16 the Shanghai Artificial Intelligence Laboratory (Shanghai AI Lab) upgraded and open-sourced its general-purpose multimodal large model Shusheng Wanxiang 3.0 (InternVL3).

According to the lab, new multimodal pre-training and post-training methods have comprehensively improved InternVL3's core multimodal capabilities. On expert-level benchmarks and comprehensive multimodal evaluations, the full model series, ranging from 1 billion to 78 billion parameters, ranks first among open-source models. Its abilities as a graphical user interface (GUI) agent, in understanding architectural scene drawings, in spatial perception and reasoning, and in general-discipline reasoning have also improved significantly.

According to the report, the team proposed an innovative native multimodal pre-training approach. The traditional approach first optimizes a large language model and only then adds visual capabilities. This approach instead combines text data and multimodal data within a single pre-training phase, so the model learns language and vision at the same time and can process text and multimodal inputs simultaneously.
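
The data side of this idea can be illustrated with a minimal Python sketch. It is a hypothetical illustration, not InternVL3's actual training code: the `Sample` type, the `mixed_pretraining_stream` function and the `text_fraction` mixing parameter are all assumptions made for this example. It only shows that text-only and image-text samples are drawn into one stream during the same pre-training phase, rather than training on text first and adding vision afterwards.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Sample:
    """Hypothetical training sample: text plus an optional image reference."""
    text: str
    image: Optional[str] = None  # None marks a text-only sample


def mixed_pretraining_stream(
    text_corpus: List[Sample],
    multimodal_corpus: List[Sample],
    text_fraction: float,
    num_samples: int,
    seed: int = 0,
) -> Iterator[Sample]:
    """Yield a single pre-training stream mixing text-only and image-text data.

    Hypothetical sketch: at every step a sample is drawn from the text-only
    pool with probability `text_fraction`, otherwise from the multimodal pool,
    so language and vision data are seen in the same pre-training phase.
    """
    if not 0.0 <= text_fraction <= 1.0:
        raise ValueError("text_fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded for a reproducible ordering
    for _ in range(num_samples):
        use_text = rng.random() < text_fraction
        pool = text_corpus if use_text else multimodal_corpus
        yield rng.choice(pool)


if __name__ == "__main__":
    text_pool = [Sample("A plain-text paragraph about physics.")]
    image_pool = [Sample("Describe the floor plan.", image="floor_plan.png")]
    for sample in mixed_pretraining_stream(text_pool, image_pool, 0.5, 6):
        kind = "text-only" if sample.image is None else "image-text"
        print(kind, "|", sample.text)
```

In a staged pipeline, no image-text sample would appear until the language-only stage had finished; here both kinds are interleaved from the first step. The actual mixing ratio and sampling scheme used for InternVL3 are not specified in this article, so the value 0.5 above is purely illustrative.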

Beyond general multimodal tasks, InternVL3 extends its multimodal capabilities into areas such as graphical user interface (GUI) agents, architectural scene drawing understanding, spatial perception and reasoning, and general-discipline reasoning.

According to the introduction, InternVL3 can act as a GUI agent that follows instructions to operate specialized software on a computer or mobile phone.

1AI has compiled the relevant links below:

Technical report: https://huggingface.co/papers/2504.10479
Open-source code / model usage: https://github.com/OpenGVLab/InternVL
Model address: https://huggingface.co/OpenGVLab/InternVL3-78B
Public beta: https://chat.intern-ai.org.cn/
