{"id":25646,"date":"2024-12-25T17:58:30","date_gmt":"2024-12-25T09:58:30","guid":{"rendered":"https:\/\/www.1ai.net\/?p=25646"},"modified":"2024-12-25T17:58:30","modified_gmt":"2024-12-25T09:58:30","slug":"%e9%98%bf%e9%87%8c%e9%80%9a%e4%b9%89%e5%8d%83%e9%97%ae%e5%bc%80%e6%ba%90%e8%a7%86%e8%a7%89%e6%8e%a8%e7%90%86%e6%a8%a1%e5%9e%8b-qvq-72b-preview%ef%bc%9a%e5%83%8f%e7%89%a9%e7%90%86%e5%ad%a6%e5%ae%b6","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/25646.html","title":{"rendered":"Ali Tongyi Thousand Questions Open Source Visual Reasoning Model QVQ-72B-Preview: Think Like a Physicist"},"content":{"rendered":"<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e9%98%bf%e9%87%8c\" title=\"[View articles tagged with [Ali]]\" target=\"_blank\" >Ali<\/a><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e9%80%9a%e4%b9%89%e5%8d%83%e9%97%ae\" title=\"[View articles tagged with [Tongyi Thousand Questions]]\" target=\"_blank\" >Thousand Questions on Tongyi<\/a> The Qwen team published a blog post today (December 25) announcing the launch of QVQ-72B-Preview, based on the Qwen2-VL-72B build <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >Open Source<\/a><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%a7%86%e8%a7%89%e6%8e%a8%e7%90%86%e6%a8%a1%e5%9e%8b\" title=\"[Sees articles with visual reasoning labels]\" target=\"_blank\" >visual inference model<\/a>,<strong>Be able to find solutions to complex physics problems through logical reasoning in a calm and collected manner, just like the masters of physics.<\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-25647\" title=\"fc13039aj00sp1noc0030d000v900c5p\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/fc13039aj00sp1noc0030d000v900c5p.jpg\" alt=\"fc13039aj00sp1noc0030d000v900c5p\" width=\"1125\" height=\"437\" \/><\/p>\n<p>Ali Tongyi Thousand Questions team evaluates QVQ-72B-Preview 
on 4 datasets, 1AI attached the relevant introduction below:<\/p>\n<ul>\n<li>MMMU: A university-level, multidisciplinary, multimodal assessment set designed to examine integrated understanding and reasoning skills related to model vision.<\/li>\n<li>MathVista: a collection of math-related visual reasoning tests that assesses the ability to reason logically with puzzle test graphs, algebraically with function graphs, and scientifically with academic paper graphs.<\/li>\n<li>MathVision: a collection of high-quality multimodal mathematical reasoning tests from real math competitions, with more question diversity and subject breadth than MathVista.<\/li>\n<li>OlympiadBench: an Olympiad-level bilingual multimodal science benchmark test set containing 8,476 problems from the Olympiad math and physics competitions, including the Chinese Gaokao. Each problem is accompanied by expert-level annotations detailing step-by-step reasoning.<\/li>\n<\/ul>\n<p>Test results show that QVQ-72B-Preview achieved a score of 70.3 on the MMMU benchmark, significantly outperforming Qwen2-VL-72B-Instruct. additionally, the model performed well in the three remaining benchmarks focused on math and science problems, effectively closing the gap with the leading state-of-the-art o1 model.<\/p>\n<p>Ali Tongyi Thousand Questions Qwen team also stated that QVQ-72B-Preview is an experimental research model focused on enhancing visual reasoning. 
Although it performed beyond expectations, several limitations should be noted:<\/p>\n<ul>\n<li>Language mixing and code-switching: the model may unexpectedly mix or switch between languages, affecting the clarity of its responses.<\/li>\n<li>Recursive reasoning: the model may fall into circular reasoning patterns, generating lengthy responses without reaching a conclusion.<\/li>\n<li>Safety and ethical considerations: the model requires strengthened safety measures to ensure reliable and safe performance, and users should exercise caution when deploying it.<\/li>\n<li>Performance and benchmark limitations: although the model has improved in visual reasoning, it cannot fully replace the capabilities of Qwen2-VL-72B. In addition, during multi-step visual reasoning it may gradually lose focus on the image content, leading to hallucinations.<\/li>\n<\/ul>\n<p data-vmark=\"092b\"><span class=\"referenceTitle\">References<\/span><\/p>\n<ul class=\"custom_reference list-paddingleft-1\">\n<li class=\"list-undefined list-reference-paddingleft\">\n<p data-vmark=\"dfaa\"><a href=\"https:\/\/modelscope.cn\/models\/Qwen\/QVQ-72B-Preview\" target=\"_blank\" rel=\"noopener\">Model link<\/a><\/p>\n<\/li>\n<li class=\"list-undefined list-reference-paddingleft\">\n<p data-vmark=\"a75f\"><a href=\"https:\/\/modelscope.cn\/studios\/Qwen\/QVQ-72B-preview\" target=\"_blank\" rel=\"noopener\">Demo link<\/a><\/p>\n<\/li>\n<li class=\"list-undefined list-reference-paddingleft\">\n<p data-vmark=\"f822\"><a href=\"https:\/\/qwenlm.github.io\/zh\/blog\/qvq-72b-preview\" target=\"_blank\" rel=\"noopener\">Chinese blog post<\/a><\/p>\n<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>The Alibaba Tongyi Qianwen (Qwen) team released a blog post today (December 25) announcing the launch of QVQ-72B-Preview, an open-source visual reasoning model built on Qwen2-VL-72B that can calmly find solutions through logical reasoning when facing complex 
physics problems, just like a master physicist. The Alibaba Tongyi Qianwen team evaluated QVQ-72B-Preview on four datasets; 1AI attaches brief introductions as follows: MMMU: a university-level, multidisciplinary, multimodal evaluation set designed to assess a model's vision-related comprehensive understanding and reasoning abilities. MathVista: a mathematics-focused visual reasoning test set assessing logical reasoning on puzzle diagrams and algebraic reasoning on function plots<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[219,5299,331,1759],"collection":[],"class_list":["post-25646","post","type-post","status-publish","format-standard","hentry","category-news","tag-219","tag-5299","tag-331","tag-1759"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/25646","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=25646"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/25646\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=25646"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=25646"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=25646"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=25646"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}