On October 21, according to a report from Al Quilqi, the DeepSeek team recently released a new study, DeepSeek-OCR, which proposes a "context optical compression" approach and offers a fresh way of thinking about long-text processing for large models.

The research shows that by rendering long text as images and encoding it as vision tokens, computation cost can be reduced significantly while maintaining high accuracy.
Experimental data show that at compression ratios below 10x, OCR decoding accuracy reaches as high as 97%; even at roughly 20x compression, accuracy remains around 60%. On the authoritative document-parsing benchmark OmniDocBench, the model surpasses several mainstream SOTA models while using fewer vision tokens.
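To make the compression idea concrete, here is a minimal sketch of how such a ratio could be computed: the number of text tokens a page would cost, divided by the vision tokens spent on its rendered image. The per-word token estimate and the 256-vision-token budget are assumptions for illustration, not DeepSeek-OCR's actual pipeline or figures.

```python
def estimate_text_tokens(text: str) -> int:
    # Rough rule of thumb (assumption): ~1.3 tokens per English word.
    return int(len(text.split()) * 1.3)

def compression_ratio(text: str, vision_tokens: int) -> float:
    # Text tokens saved per vision token spent on the rendered page image.
    return estimate_text_tokens(text) / vision_tokens

# A hypothetical ~2000-word page encoded as 256 vision tokens.
page = "word " * 2000
ratio = compression_ratio(page, 256)
print(f"{ratio:.1f}x compression")  # roughly 10x under these assumptions
```

Under these toy numbers, a 2000-word page compresses to about a tenth of its text-token cost, which is the regime where the paper reports ~97% decoding accuracy.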
In practical terms, a single A100-40G GPU can process more than 200,000 pages of documents per day, providing large-scale data support for training large models.
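A quick back-of-the-envelope check of that throughput figure, assuming continuous 24-hour operation (an assumption; real batch jobs include idle and I/O time):

```python
# Convert the reported 200,000 pages/day into a per-second rate.
pages_per_day = 200_000
seconds_per_day = 24 * 3600  # 86,400 seconds in a day

pages_per_second = pages_per_day / seconds_per_day
print(f"{pages_per_second:.2f} pages/s")  # ≈ 2.31 pages/s
```

That is, the claim amounts to sustaining a little over two pages per second on one GPU.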
The relevant code and model weights have been open-sourced on the GitHub and Hugging Face platforms.
💻 GitHub: https://github.com/deepseek-ai/DeepSeek-OCR
Hugging Face: https://huggingface.co/deepseek-ai/DeepSeek-OCR