DeepSeek has released the open-source DeepSeek-VL2 model, which shows clear advantages across a range of evaluation benchmarks, and the team says vision-language models have now entered the Mixture-of-Experts (MoE) era. The model has several highlights. On data, the amount of high-quality training data was more than doubled, and new capabilities such as meme understanding were introduced. On architecture, the vision side uses an image-tiling strategy to support dynamic resolution, while the language side adopts an MoE architecture for low cost and high performance. Training inherits the three-stage pipeline of its predecessor and applies a variety of parallelism strategies to overcome the difficulties of efficient training. In addition, DeepSeek-VL2 supports dynamic resolution: by combining tiled crops with a global thumbnail it handles high resolutions and extreme aspect ratios, adapting to more scenarios, and by learning from more scientific-document data it can understand scientific diagrams and generate Python code from images.
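The tile-plus-thumbnail idea behind dynamic resolution can be sketched roughly as follows. This is a minimal illustration, not DeepSeek-VL2's actual implementation: the tile size, tile budget, and grid-selection rule here are all assumptions, and the helper names (`choose_grid`, `tile_boxes`) are hypothetical.

```python
# Hypothetical sketch of a tile-plus-thumbnail dynamic-resolution scheme.
# The 384-pixel tile size and 9-tile budget are illustrative assumptions,
# not DeepSeek-VL2's real configuration.

def choose_grid(width, height, max_tiles=9):
    """Pick the (cols, rows) grid whose aspect ratio best matches the
    image, subject to a budget on the total number of tiles."""
    best, best_err = (1, 1), float("inf")
    target = width / height
    for cols in range(1, max_tiles + 1):
        for rows in range(1, max_tiles + 1):
            if cols * rows > max_tiles:
                continue
            err = abs(cols / rows - target)
            if err < best_err:
                best, best_err = (cols, rows), err
    return best

def tile_boxes(width, height, tile=384, max_tiles=9):
    """Return crop boxes for the local tiles plus one global thumbnail.

    The image is assumed to be resized to (cols*tile) x (rows*tile)
    before cutting, so every tile has the same fixed size the vision
    encoder expects."""
    cols, rows = choose_grid(width, height, max_tiles)
    boxes = [(c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
             for r in range(rows) for c in range(cols)]
    # The whole image is also downscaled to a single tile, giving the
    # model a global view alongside the high-resolution local crops.
    thumbnail = (0, 0, tile, tile)
    return boxes, thumbnail
```

Because the grid adapts to the image's aspect ratio, a wide image gets a wide grid and a tall image a tall one, which is what lets this kind of scheme handle extreme aspect ratios without heavy distortion.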
Hugging Face: https://huggingface.co/collections/deepseek-ai/deepseek-vl2-675c22accc456d3beb4613ab
GitHub: https://github.com/deepseek-ai/DeepSeek-VL2
