November 25 news: Tencent Hunyuan today announced the release of a new open-source model, HunyuanOCR. The model has only 1B parameters, is built on a multimodal architecture, and has achieved SOTA (state of the art) results on multiple industry OCR application leaderboards.

According to official sources, thanks to the "end-to-end" design concept of its multimodal architecture, each of HunyuanOCR's functions reaches its best result with a single forward inference pass.
The HunyuanOCR expert model uses a multimodal architecture with three main components: a native-resolution visual encoder, an adaptive visual adapter, and a lightweight Hunyuan language model.
Unlike other open-source OCR expert models or systems, HunyuanOCR is both trained and run in a fully end-to-end paradigm. Large-scale, application-oriented data combined with online reinforcement learning give it robust end-to-end inference.
HunyuanOCR achieves SOTA results on several core capabilities. In complex document parsing, it scores 94.1 on the OmniDocBench evaluation, the highest score, ahead of models such as Google's Gemini3-pro. In text detection and recognition, on a self-built benchmark covering nine application scenarios (documents, artistic text, street scenes, handwriting, advertisements, bills, screenshots, games, and video), it holds a significant lead over comparable open-source models and commercial OCR models. On the OCRBench leaderboard, it scores 860 points in total: with a configuration of only 1B total parameters, it reaches SOTA among general visual understanding models with fewer than 3B total parameters.
In minor-language translation, HunyuanOCR supports translating 14 high-frequency minor languages into Chinese or English, and it won the small-model championship in ICDAR2025 end-to-end document translation.

In terms of applications, HunyuanOCR supports multilingual complex document parsing, combines text detection and recognition capabilities, and can be applied to scenarios such as bill field extraction, video subtitle recognition, and photo translation.
In text detection and recognition, the model performs well on scenes such as documents, artistic text, street scenes, handwriting, advertisements, bills, screenshots, games, and video.
Complex document parsing means digitizing a scanned or photographed multilingual document. Specifically, the text elements that appear in the image are organized in reading order, formulas are represented in LaTeX, and complex tables are represented in HTML.
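Since complex tables are returned as HTML, a downstream program usually has to turn that markup back into rows and cells. The following is a minimal sketch of that step using only the Python standard library. The `TableRows` class, the `table_to_rows` helper, and the sample string are hypothetical and written by hand for illustration; they are not real HunyuanOCR output or part of its API.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of each <tr> in an HTML table as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a <td> or <th>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_rows(html_text):
    """Return the table in html_text as a list of rows of cell strings."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Hypothetical sample, written by hand (not real model output).
sample = (
    "<table>"
    "<tr><th>Item</th><th>Price</th></tr>"
    "<tr><td>Tea</td><td>12.50</td></tr>"
    "</table>"
)
rows = table_to_rows(sample)
print(rows)
```

This sketch ignores `rowspan` and `colspan`, which complex tables often use; handling merged cells would need extra bookkeeping.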
Beyond these common applications, the model also covers field extraction, video subtitle extraction, and photo translation:
1. Fields of interest on common cards and bills (e.g. name, address, organization) are parsed into standard JSON format.
2. Automatic extraction of subtitles from videos, including bilingual subtitles.
3. Photo translation, which supports 14 high-frequency minor languages (German, Spanish, Turkish, Italian, Russian, French, Portuguese, Arabic, Thai, Vietnamese, Indonesian, Malay, Japanese, and Korean), as well as Chinese/English.
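Because extracted fields come back as JSON, a caller can load them directly and check that the expected keys are present. The sketch below assumes the response is text containing a single JSON object. The `parse_fields` helper, the key names, and the sample string are hypothetical and only illustrate the idea; the actual output schema is defined by the model's documentation.

```python
import json


def parse_fields(response_text, required=("name", "address")):
    """Parse one JSON object out of response_text.

    Returns (fields, missing), where missing lists the required keys
    that are absent or empty.
    """
    # Tolerate extra text around the object by slicing from the first
    # opening brace to the last closing brace.
    start = response_text.find("{")
    end = response_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    fields = json.loads(response_text[start:end + 1])
    missing = [key for key in required if not fields.get(key)]
    return fields, missing


# Hypothetical sample, written by hand (not real model output).
sample = 'Result: {"name": "Zhang San", "address": "", "unit": "Example Co."}'
fields, missing = parse_fields(sample)
print(fields["name"], missing)
```

Reporting missing fields instead of raising an error lets the caller decide whether to retry the image or pass it to manual review.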
1AI lists the open-source addresses below:
- https://github.com/Tencent-Hunyuan/HunyuanOCR
- https://huggingface.co/tencent/HunyuanOCR
- Try it directly: https://huggingface.co/spaces/tencent/HunyuanOCR