{"id":3592,"date":"2024-02-03T09:09:06","date_gmt":"2024-02-03T01:09:06","guid":{"rendered":"https:\/\/www.1ai.net\/?p=3592"},"modified":"2024-02-03T09:09:06","modified_gmt":"2024-02-03T01:09:06","slug":"%e5%8d%8e%e7%a7%91%e5%a4%a7%e5%8f%91%e5%b8%83%e5%a4%9a%e6%a8%a1%e6%80%81%e5%a4%a7%e6%a8%a1%e5%9e%8b%e6%96%b0%e5%9f%ba%e5%87%86-%e8%a6%86%e7%9b%96%e4%ba%94%e5%a4%a7%e4%bb%bb%e5%8a%a1","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/3592.html","title":{"rendered":"HUST releases new benchmark for multimodal large models covering five major tasks"},"content":{"rendered":"<p>Recently, <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%8d%8e%e4%b8%ad%e7%a7%91%e6%8a%80%e5%a4%a7%e5%ad%a6\" title=\"[Sees articles with tags]\" target=\"_blank\" >Huazhong University of Science and Technology<\/a> (HUST) and other institutions released a new comprehensive evaluation benchmark for <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%9a%e6%a8%a1%e6%80%81%e5%a4%a7%e6%a8%a1%e5%9e%8b\" title=\"[Sees articles with [Multimodal Large Model] labels]\" target=\"_blank\" >multimodal large models<\/a> (LMMs), aiming to address the problem of evaluating their performance. The study covers 14 mainstream multimodal large models, including Google Gemini and OpenAI GPT-4V, across five major tasks and 27 datasets. Because the answers these models give are open-ended, evaluating their performance on each aspect has become a pressing problem.<\/p>\n<p>The study places special emphasis on the optical character recognition (OCR) capabilities of multimodal large models. The research team studied their OCR performance in depth and built a dedicated evaluation benchmark, named OCRBench, for this purpose. Extensive experiments on 27 public datasets and 2 generated datasets (semantic-free and contrasting-semantic) revealed the limitations of multimodal large models in OCR.
The paper describes the evaluated models, the metrics, and the evaluation datasets in detail.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-3593\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/02\/6384249285036579456706956.jpg\" alt=\"\" width=\"812\" height=\"669\" \/><\/p>\n<p>Project address: https:\/\/github.com\/Yuliang-Liu\/MultimodalOCR<\/p>\n<p>The evaluation results show that multimodal large models perform well on some tasks, such as text recognition and document question answering. However, they struggle with semantic dependence, handwritten text, and multilingual text. In particular, they perform poorly on character combinations that lack semantics. Recognizing handwritten and multilingual text is also difficult, which may be related to a lack of training data. In addition, higher-resolution input images improve performance on some tasks, such as scene text question answering, document question answering, and key information extraction.<\/p>\n<p>To address these limitations, the research team built OCRBench to evaluate the OCR capabilities of multimodal large models more accurately. The benchmark is expected to guide the future development of multimodal large models and to encourage further research that improves their performance and broadens their applications.<\/p>\n<p>The introduction of OCRBench gives researchers and developers a more accurate and comprehensive tool for evaluating and improving the OCR capabilities of multimodal large models, promoting the development of the field.
This research not only offers new ideas for evaluating the performance of multimodal large models, but also lays a more solid foundation for research and applications in related fields.<\/p>","protected":false},"excerpt":{"rendered":"<p>Recently, Huazhong University of Science and Technology (HUST) and other institutions released a new comprehensive evaluation benchmark for multimodal large models (LMMs), aiming to address the problem of evaluating their performance. The study covers 14 mainstream multimodal large models, including Google Gemini and OpenAI GPT-4V, across five major tasks and 27 datasets. Because the responses of multimodal large models are open-ended, evaluating their performance on each aspect has become a pressing problem. The study places special emphasis on the optical character recognition (OCR) capabilities of multimodal large models. The research team studied the OCR performance of multimodal large models in depth and built a dedicated evaluation benchmark for this purpose, named OCRBench.
by evaluating 27 public datasets and 2<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[603,602],"collection":[],"class_list":["post-3592","post","type-post","status-publish","format-standard","hentry","category-news","tag-603","tag-602"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/3592","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=3592"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/3592\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=3592"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=3592"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=3592"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=3592"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}