April 30 news: Yesterday, DeepSeek launched a test of an "image recognition mode," which sits alongside the existing "quick mode" and "expert mode" and offers full multimodal image understanding rather than simple OCR text recognition.

In actual testing, DeepSeek's recognition is generally accurate, and answers arrive within about half a second without enabling the thinking mode. Common scenarios such as film and TV stills, abstract images, and product photos are all identified and understood well.
Even more interesting is the thinking process: beyond describing what is in the picture, the model actively probes the poster's identity and the image's metaphors and subtext, self-corrects several times during reasoning, and even spontaneously lists questions and verifies its premises before drawing conclusions, a reasoning logic close to human habits of reading images.
However, the image-recognition mode still has clear limitations. In the classic "finger counting" test, DeepSeek got the count wrong on its first attempt and insisted it had made no mistake, only giving the correct answer after the user pointed out or hinted at the error.
Moreover, the image-recognition process does not support web search; it relies on the model's own knowledge base, so it cannot identify relatively new things, such as the mascot "Finder酱" that Apple launched this year.
And just yesterday, Xiaokang Chen, a researcher on DeepSeek's multimodal team, posted "now, we see you." on X, along with an image of DeepSeek's whale mascot going from "eyes closed" to "eyes open," which was widely interpreted as a teaser that the new multimodal model is about to go live.