{"id":36548,"date":"2025-05-31T11:36:16","date_gmt":"2025-05-31T03:36:16","guid":{"rendered":"https:\/\/www.1ai.net\/?p=36548"},"modified":"2025-05-31T11:36:16","modified_gmt":"2025-05-31T03:36:16","slug":"%e5%b0%8f%e7%b1%b3%e5%a4%9a%e6%a8%a1%e6%80%81%e5%a4%a7%e6%a8%a1%e5%9e%8b-mimo-vl-%e5%bc%80%e6%ba%90%ef%bc%8c%e5%ae%98%e6%96%b9%e7%a7%b0%e5%a4%9a%e6%96%b9%e9%9d%a2%e9%a2%86%e5%85%88-qwen2-5-vl-7b","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/36548.html","title":{"rendered":"Xiaomi's multimodal large model MiMo-VL goes open source, officially said to lead Qwen2.5-VL-7B in many respects"},"content":{"rendered":"<p>On May 30, Xiaomi MiMo's official account announced that <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%b0%8f%e7%b1%b3\" title=\"[View articles tagged with [Xiaomi]]\" target=\"_blank\" >Xiaomi<\/a>'s <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%9a%e6%a8%a1%e6%80%81%e5%a4%a7%e6%a8%a1%e5%9e%8b\" title=\"[Sees articles with [Multimodal Large Model] labels]\" target=\"_blank\" >multimodal large model<\/a> Xiaomi <a href=\"https:\/\/www.1ai.net\/en\/tag\/mimo-vl\" title=\"_Other Organiser\" target=\"_blank\" >MiMo-VL<\/a> is now officially <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >open source<\/a>. 
According to Xiaomi, it is significantly ahead of Qwen2.5-VL-7B, the benchmark multimodal model of the same size, on multiple tasks such as general Q&amp;A and understanding and reasoning over images, videos, and language, and it is comparable to dedicated models on GUI Grounding tasks,\u00a0<strong>arriving for the Agent era<\/strong>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-36549\" title=\"d415b6d6j00sx3wmq00cvd000u000p4p\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/05\/d415b6d6j00sx3wmq00cvd000u000p4p.jpg\" alt=\"d415b6d6j00sx3wmq00cvd000u000p4p\" width=\"1080\" height=\"904\" \/><\/p>\n<p>MiMo-VL-7B retains the text-only reasoning capability of MiMo-7B while, on multimodal reasoning tasks and with only 7B parameters, significantly outperforming Alibaba's Qwen-2.5-VL-72B and QVQ-72B-Preview, which have 10 times as many parameters, on OlympiadBench and several math competitions (MathVision, MathVerse). 
<strong>It also surpasses the closed-source model GPT-4o<\/strong>.<\/p>\n<p>In an internal large-model arena that evaluates real user experience, <strong>MiMo-VL-7B surpasses GPT-4o and ranks first among open-source models<\/strong>.<\/p>\n<p>In addition to handling tasks such as complex image reasoning and Q&amp;A, MiMo-VL-7B also shows good potential in GUI operations of 10+ steps, and can even help you add the Xiaomi SU7 to your wishlist.<\/p>\n<p>It relies on high-quality pre-training data and an innovative <strong>mixed on-policy reinforcement learning algorithm<\/strong> (Mixed On-policy Reinforcement Learning, MORL):<\/p>\n<ul>\n<li><strong>Multi-stage pre-training:<\/strong><\/li>\n<li>High-quality multimodal pre-training data was collected, cleaned, and synthesized, covering image-text pairs, video-text pairs, GUI operation sequences, and other data types, totaling 2.4T tokens. Long-range multimodal reasoning was strengthened by adjusting the proportions of the different data types in stages.<\/li>\n<li><strong>Mixed on-policy reinforcement learning:<\/strong><\/li>\n<li>Feedback signals from text reasoning, multimodal perception and reasoning, RLHF, and other sources are mixed, and an online reinforcement learning algorithm is used to stabilize and accelerate training, improving the model's reasoning, perception performance, and user experience across the board.<\/li>\n<\/ul>\n<p>Two MiMo-VL-7B models, from before and after RL, have been open-sourced. IT Home attaches the open-source link: https:\/\/huggingface.co\/XiaomiMiMo and the related technical report: https:\/\/github.com\/XiaomiMiMo\/MiMo-VL\/blob\/main\/MiMo-VL-Technical-Report.pdf<\/p>\n<p>The evaluation framework for MiMo-VL-7B, which supports 50+ evaluation tasks, has also been open-sourced on GitHub: https:\/\/github.com\/XiaomiMiMo\/lmms-eval<\/p>","protected":false},"excerpt":{"rendered":"<p>On May 30, Xiaomi MiMo's official account announced that Xiaomi MiMo-VL, Xiaomi's multimodal large model, is now officially open source. 
According to Xiaomi, it is significantly ahead of Qwen2.5-VL-7B, the benchmark multimodal model of the same size, on a number of tasks such as general Q&amp;A and understanding and reasoning over images, videos, and language, and it is comparable to dedicated models on GUI Grounding tasks for the Agent era. MiMo-VL-7B retains the text-only reasoning capability of MiMo-7B and, with only 7B parameters, handles multimodal reasoning tasks on OlympiadBench and several math competitions such as MathVision.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[6802,602,1114,219],"collection":[],"class_list":["post-36548","post","type-post","status-publish","format-standard","hentry","category-news","tag-mimo-vl","tag-602","tag-1114","tag-219"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/36548","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=36548"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/36548\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=36548"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=36548"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=36548"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=36548"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":t
rue}]}}