{"id":46447,"date":"2025-11-25T19:03:14","date_gmt":"2025-11-25T11:03:14","guid":{"rendered":"https:\/\/www.1ai.net\/?p=46447"},"modified":"2025-11-25T19:03:14","modified_gmt":"2025-11-25T11:03:14","slug":"%e8%85%be%e8%ae%af%e6%b7%b7%e5%85%83-ocr-%e6%a8%a1%e5%9e%8b%e5%ae%a3%e5%b8%83%e5%bc%80%e6%ba%90%ef%bc%9a%e5%8f%82%e6%95%b0%e4%bb%85-1b%ef%bc%8c%e5%a4%9a%e9%a1%b9%e6%a0%b8%e5%bf%83%e8%83%bd%e5%8a%9b-so","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/46447.html","title":{"rendered":"Tencent Hunyuan OCR Model Announced Open Source: Only 1B Parameters, Multiple Core Capabilities at SOTA"},"content":{"rendered":"<p class=\"translation-text-wrapper\" data-ries-data-process=\"44\" data-group-id=\"group-44\">News on November 25: <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%85%be%e8%ae%af%e6%b7%b7%e5%85%83\" title=\"View articles tagged with Tencent Hunyuan\" target=\"_blank\" >Tencent Hunyuan<\/a> today announced the launch of a new <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90%e6%a8%a1%e5%9e%8b\" title=\"View articles tagged with open source model\" target=\"_blank\" >open-source model<\/a>\u00a0<strong>HunyuanOCR<\/strong>, which has only 1B parameters, adopts Hunyuan's native multimodal architecture, and achieves SOTA (note: state of the art) on multiple industry OCR benchmarks.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-46448\" title=\"3ad98f4fj00t6a3zm000td000uf9m\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/11\/3ad98f4fj00t6a3zm000td000u000f9m.jpg\" alt=\"3ad98f4fj00t6a3zm000td000uf9m\" width=\"1080\" height=\"549\" \/><\/p>\n<p class=\"translation-text-wrapper\" data-ries-data-process=\"45\" data-group-id=\"group-45\">According to official sources, thanks to the \u201cend-to-end\u201d design concept of its native multimodal model, all of HunyuanOCR's functions achieve optimal results with a single forward pass.<\/p>\n<p class=\"translation-text-wrapper\" data-ries-data-process=\"46\" 
data-group-id=\"group-46\">This OCR expert model adopts a native multimodal architecture consisting of three core components: <strong>a native-resolution visual encoder, an adaptive visual adapter, and a lightweight Hunyuan language model<\/strong>.<\/p>\n<p class=\"translation-text-wrapper\" data-ries-data-process=\"47\" data-group-id=\"group-47\">Unlike other open-source OCR expert models and pipeline systems, HunyuanOCR is trained and runs inference in a fully end-to-end paradigm; combined with large-scale application-oriented data and reinforcement learning, it demonstrates robust end-to-end capability.<\/p>\n<p class=\"translation-text-wrapper\" data-ries-data-process=\"48\" data-group-id=\"group-48\">HunyuanOCR achieves SOTA results on several core capabilities. In complex document parsing, it scores 94.1 on the OmniDocBench benchmark, <strong>surpassing models such as Google's Gemini 3 Pro<\/strong>; in text detection and recognition, it leads comparable open-source models and commercial OCR offerings by a clear margin on a self-built benchmark covering 9 major scenarios (documents, artistic text, street scenes, handwriting, advertisements, receipts, screenshots, games, and videos); and on the OCRBench leaderboard, <strong>it scores 860 points with only 1B total parameters, reaching SOTA among models under 3B parameters, including general visual-understanding models<\/strong>.<\/p>\n<p class=\"translation-text-wrapper\" data-ries-data-process=\"49\" data-group-id=\"group-49\">For minor-language translation, HunyuanOCR supports translating 14 high-frequency minor languages into Chinese or English, and won the small-model track of the ICDAR 2025 end-to-end document translation competition.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-46449\" title=\"0c6983dfj00t6a3zm002td000u000m5m\" 
src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/11\/0c6983dfj00t6a3zm002td000u000m5m.jpg\" alt=\"0c6983dfj00t6a3zm002td000u000m5m\" width=\"1080\" height=\"797\" \/><\/p>\n<p class=\"translation-text-wrapper\" data-ries-data-process=\"50\" data-group-id=\"group-50\">In terms of applications, HunyuanOCR supports multilingual complex document parsing, combines text detection and recognition capabilities, and serves scenarios such as receipt field extraction, video subtitle recognition, and photo translation.<\/p>\n<p class=\"translation-text-wrapper\" data-ries-data-process=\"51\" data-group-id=\"group-51\">For text detection and recognition, the model performs well on scenes such as documents, artistic text, street scenes, handwriting, advertisements, receipts, screenshots, games, and videos.<\/p>\n<p class=\"translation-text-wrapper\" data-ries-data-process=\"52\" data-group-id=\"group-52\">Complex document parsing refers to digitizing multilingual scanned or photographed documents: organizing the text elements in an image in reading order, rendering formulas as LaTeX, and presenting complex tables in HTML format.<\/p>\n<p class=\"translation-text-wrapper\" data-ries-data-process=\"53\" data-group-id=\"group-53\">Beyond these common applications, HunyuanOCR also covers field extraction, video subtitle extraction, and photo translation.<\/p>\n<p class=\"translation-text-wrapper\" data-ries-data-process=\"54\" data-group-id=\"group-54\">1. Field extraction: parses the fields of interest on common cards and receipts (e.g. name \/ address \/ organization) into standard JSON format.<\/p>\n<p class=\"translation-text-wrapper\" data-ries-data-process=\"55\" data-group-id=\"group-55\">2. 
Video subtitle extraction: automatically extracts subtitles from videos, including bilingual subtitles.<\/p>\n<p class=\"translation-text-wrapper\" data-ries-data-process=\"56\" data-group-id=\"group-56\">3. Photo translation: supports translating 14 high-frequency minor languages into Chinese or English, including German, Spanish, Turkish, Italian, Russian, French, Portuguese, Arabic, Thai, Vietnamese, Indonesian, Malay, Japanese, and Korean.<\/p>\n<p class=\"translation-text-wrapper\" data-ries-data-process=\"57\" data-group-id=\"group-57\">1AI attaches the open-source addresses:<\/p>\n<ul>\n<li>https:\/\/github.com\/Tencent-Hunyuan\/HunyuanOCR<\/li>\n<li>https:\/\/huggingface.co\/tencent\/HunyuanOCR<\/li>\n<li class=\"translation-text-wrapper\" data-ries-data-process=\"58\" data-group-id=\"group-58\">Direct experience: https:\/\/huggingface.co\/spaces\/tencent\/HunyuanOCR<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>News on November 25: Tencent Hunyuan today announced a new open-source model, HunyuanOCR, with only 1B parameters, built on Hunyuan's native multimodal architecture and achieving SOTA (note: state of the art) on multiple industry OCR benchmarks. According to official sources, thanks to the \u201cend-to-end\u201d design philosophy of Hunyuan's native multimodal model, all of HunyuanOCR's functions achieve optimal results with a single forward pass. This OCR expert model adopts Hunyuan's native multimodal architecture and consists of three core components: a native-resolution visual encoder, an adaptive visual adapter, and a lightweight Hunyuan language model. 
Unlike other open-source OCR expert models or systems, HunyuanOCR<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[862,2657],"collection":[],"class_list":["post-46447","post","type-post","status-publish","format-standard","hentry","category-news","tag-862","tag-2657"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/46447","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=46447"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/46447\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=46447"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=46447"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=46447"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=46447"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}