Skywork-R1V 3.0 Released and Open-Sourced by Kunlun Wanwei, Multimodal Reasoning Capability Approaches Human Expert Level

July 9 news: Kunlun Wanwei has just announced the release and open-sourcing of its latest model, Skywork-R1V 3.0.

According to Kunlun Wanwei, Skywork-R1V 3.0 uses a reinforcement learning strategy in the post-training phase to draw out the model's cross-modal reasoning ability, achieving gains in both complex logical modeling and cross-disciplinary generalization.

Skywork-R1V 3.0 uses data distilled from the previous-generation reasoning model, Skywork-R1V 2.0, for a "cold start": a high-quality multimodal reasoning training set is constructed through rejection sampling and used to teach the open-source large vision model InternVL-38B (38B parameters) the basic format and method of multimodal reasoning.
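
The announcement does not describe how the rejection sampling was carried out, so the following is only a minimal sketch of the general idea: sample several candidate reasoning traces per problem and keep one whose final answer matches the reference. The function names, the "Answer:" convention, and the acceptance rule are illustrative assumptions, not Skywork's actual pipeline.

```python
from typing import Callable, Dict, List


def extract_final_answer(trace: str) -> str:
    """Hypothetical convention: the final answer follows the last 'Answer:' marker."""
    return trace.rsplit("Answer:", 1)[-1].strip()


def rejection_sample(
    problems: List[Dict[str, str]],
    generate: Callable[[str], str],
    extract_answer: Callable[[str], str],
    num_candidates: int = 4,
) -> List[Dict[str, str]]:
    """Keep only generated traces whose final answer matches the reference.

    `generate` stands in for the teacher model (assumed here, not a real API).
    """
    kept = []
    for problem in problems:
        for _ in range(num_candidates):
            trace = generate(problem["question"])
            if extract_answer(trace) == problem["answer"]:
                kept.append({"question": problem["question"], "response": trace})
                break  # keep at most one accepted trace per problem
    return kept
```

Problems for which no candidate passes the check are simply dropped, which is what keeps the resulting training set small but clean.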

Subsequently, the reinforcement learning algorithm GRPO (Group Relative Policy Optimization) was introduced to further draw out the model's reasoning potential, successfully transferring reasoning capability between the image and text modalities and significantly improving its understanding and analysis in cross-modal, multi-disciplinary scenarios.
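
As a rough illustration of what "group relative" means in GRPO: several responses are sampled for the same prompt, and each response's reward is normalized against the mean and standard deviation of its own group rather than against a learned value function. The sketch below shows only that normalization step; the use of the population standard deviation and the `eps` value are assumptions, and the reward design used for Skywork-R1V 3.0 is not given in the announcement.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against its own sampling group.

    rewards: one scalar reward per response sampled for the same prompt.
    eps: small constant (assumed value) that avoids division by zero
         when every response in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four responses to one prompt, two judged correct (reward 1.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Here the correct responses receive an advantage of roughly +1 and the incorrect ones roughly -1; if all responses in a group earn the same reward, every advantage is zero and that prompt contributes no learning signal.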

According to the announcement, Skywork-R1V 3.0 was trained efficiently with only about 12,000 supervised fine-tuning samples and 13,000 reinforcement learning samples, reflecting the advantage of "small data unlocking large capability".

In terms of performance, the model scores 76.0 on MMMU, a widely used comprehensive multimodal benchmark. This is the highest score among open-source models, surpassing closed-source models such as Claude-3.7-Sonnet (75.0) and GPT-4.5 (74.4) and approaching the level of junior human experts (76.2).

Kunlun Wanwei says that R1V 3.0's performance on high-school math is close to that of several top closed-source models and is the best among open-source multimodal reasoning models, which it presents as evidence of the model's real-world problem-solving ability and stable cross-scenario generalization.

On EMMA-Mini (CoT), a more demanding visual reasoning test, it leads open-source models with a score of 40.3, outperforming larger models such as Qwen2.5-VL-72B-Instruct and InternVL3-78B and narrowing the gap with the closed-source model Claude-3.7-Sonnet.

On MMK12, which covers primary and secondary school knowledge points, R1V 3.0 scores 78.5, again leading the open-source camp and surpassing open-source models such as Qwen2.5-VL-72B-Instruct and InternVL3-78B as well as closed-source models such as GPT-4.5 and GPT-4o.

Compared with the previous-generation model, Skywork-R1V 3.0 achieves significant performance improvements in several key areas, including physics and logic, and has become one of the most powerful open-source multimodal reasoning models:

  • Physical reasoning: On the physics benchmarks PhyX-MC-Text-Minimal and SeePhys, Skywork-R1V 3.0 scored 52.8 and 31.5 points respectively, the best performance among open-source models, demonstrating its ability in multimodal physics reasoning. The model not only understands basic physics concepts such as mechanics and electromagnetism accurately, but also handles complex physics problems that combine graphics and text (e.g., analyzing professional diagrams such as force diagrams and circuit schematics). Its physical reasoning significantly exceeds that of current mainstream open-source models as well as some closed-source models such as GPT-4.5 and Gemini 2 Flash.
  • Logical reasoning: Skywork-R1V 3.0 also performs well on several logical reasoning benchmarks, scoring 59.7 points on LogicVista and 28.5 points on VisuLogic. On MME-Reasoning, its score of 42.8 surpasses the closed-source model Claude-4-Sonnet, demonstrating Skywork-R1V 3.0's leading capabilities in multimodal logical consistency, conditional reasoning, and cross-modal causal modeling.
  • Mathematical reasoning: R1V 3.0 demonstrated strong problem-solving skills on math problems. On the leading math benchmarks MathVista, MathVerse, and MathVision, R1V 3.0 scored 77.1, 59.6, and 52.6, respectively, ahead of open-source models such as Qwen2.5-VL-72B-Instruct, InternVL3-78B, and QVQ-72B-Preview.

Skywork-R1V 3.0 download:

  • HuggingFace address: https://huggingface.co/Skywork/Skywork-R1V3-38B
  • GitHub address: https://github.com/SkyworkAI/Skywork-R1V
  • Technical report: https://github.com/SkyworkAI/Skywork-R1V/blob/main/Skywork_R1V3.pdf