Industry First: 8B-Parameter On-Device Model MiniCPM-V 4.5 Open-Sourced, Billed as "the Strongest On-Device Multimodal Model"

August 27 news: ModelBest (面壁智能) announced on August 26 that it has open-sourced MiniCPM-V 4.5, its flagship multimodal model with 8B parameters, making it the industry's first multimodal model with high-frame-rate video understanding capability.


MiniCPM-V 4.5 is billed as the "strongest on-device multimodal model", claiming same-size SOTA results in high-frame-rate video understanding, long-video understanding, OCR, and document parsing, with performance surpassing Qwen2.5-VL 72B.

According to ModelBest, mainstream multimodal models usually sample video at 1 fps, i.e., they capture only one frame per second for recognition and understanding, in order to balance compute, power consumption, and other constraints. While this preserves inference efficiency to some degree, most of the visual information is discarded, which limits a multimodal model's fine-grained understanding of the dynamic world.
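As a rough illustration of how much visual signal 1 fps sampling discards (the 30 fps source rate below is an illustrative assumption, not a figure from the announcement):

```python
def sampling_loss(duration_s, source_fps=30, sample_fps=1):
    """Return (frames kept, total frames, fraction kept) when a
    source video is subsampled at `sample_fps` for the model.

    The 30 fps source rate is an illustrative assumption; actual
    footage varies.
    """
    total = duration_s * source_fps
    kept = duration_s * sample_fps
    return kept, total, kept / total

# A 10-second clip at 30 fps: 1 fps sampling keeps only 10 of
# 300 frames, i.e. about 3.3% of the visual information by frame count.
kept, total, frac = sampling_loss(10)
```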

MiniCPM-V 4.5 is the industry's first multimodal model with high-frame-rate video understanding capability. By extending the model architecture from a 2D-Resampler to a 3D-Resampler, it performs high-density compression on 3D video clips and can ingest up to 6 times as many video frames under the same visual-token overhead, achieving a 96x visual compression rate, 12-24 times that of comparable models.
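The token-budget arithmetic behind the "6x frames at the same token cost" claim can be sketched as follows; the concrete numbers (64 tokens per frame or group, 6 frames per group) are illustrative assumptions, not published MiniCPM-V 4.5 specifications:

```python
# Illustrative sketch of 2D vs. 3D resampling token budgets.
# All concrete numbers here are assumptions for illustration,
# not published MiniCPM-V 4.5 specifications.

def tokens_2d(num_frames, tokens_per_frame=64):
    """2D resampler: each frame is compressed independently."""
    return num_frames * tokens_per_frame

def tokens_3d(num_frames, frames_per_group=6, tokens_per_group=64):
    """3D resampler: groups of consecutive frames share one token budget."""
    groups = -(-num_frames // frames_per_group)  # ceiling division
    return groups * tokens_per_group

# Under the same token budget, a 3D resampler packing 6 frames per
# group can ingest 6x as many frames as a 2D resampler.
budget = tokens_2d(30)       # token cost of 30 frames, 2D scheme
assert tokens_3d(30 * 6) == budget
```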

MiniCPM-V 4.5 raises the frame-sampling rate substantially, moving from watching a "slideshow" to understanding "motion pictures". On fast-changing footage, MiniCPM-V 4.5 sees more accurately and in greater detail than representative cloud models such as Gemini-2.5-Pro, GPT-5, and GPT-4o.

On MotionBench and FavorBench, benchmarks that measure high-frame-rate video understanding, MiniCPM-V 4.5 reaches same-size SOTA and surpasses Qwen2.5-VL 72B, a class-leading result.

With only 8B parameters, MiniCPM-V 4.5 again raises the capability ceiling in multimodal tasks such as image understanding, video understanding, and complex document recognition.

In image understanding, MiniCPM-V 4.5 leads many closed-source models such as GPT-4o, GPT-4.1, and Gemini-2.0-Pro on the OpenCompass evaluations, and even outperforms Qwen2.5-VL 72B.

In video understanding, MiniCPM-V 4.5 achieves best-in-class results on benchmarks including LVBench, MLVU, Video-MME, and LongVideoBench.

In complex document recognition, MiniCPM-V 4.5 achieves SOTA performance among general multimodal models of its size on the OmniDocBench benchmark's OverallEdit, TextEdit, and TableEdit metrics.

In addition, MiniCPM-V 4.5 supports both a regular mode and a deep-thinking mode, balancing performance against responsiveness: regular mode delivers strong multimodal understanding in most scenarios, while deep-thinking mode targets complex, compound reasoning tasks.
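A toy sketch of how such a dual-mode switch might be wrapped in application code; all names here (`run_query`, `use_reasoning`, the decode-token budgets) are hypothetical for illustration and are not the model's actual API:

```python
def run_query(backend, prompt, frames, deep_think=False):
    """Route a multimodal query to regular or deep-thinking mode.

    Hypothetical wrapper: regular mode favors latency with a small
    decode budget, while deep-thinking mode spends extra tokens on
    intermediate reasoning for compound tasks. `backend` is any
    object exposing a `generate(**kwargs)` method.
    """
    max_new_tokens = 2048 if deep_think else 512
    return backend.generate(
        prompt=prompt,
        images=frames,
        max_new_tokens=max_new_tokens,
        use_reasoning=deep_think,  # assumed flag name, not the real API
    )
```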

On the Video-MME video understanding benchmark and the OpenCompass single-image evaluation, MiniCPM-V 4.5 reaches same-size SOTA while also leading in GPU memory usage and average inference time.

On Video-MME, a video understanding test set covering short, medium, and long videos, MiniCPM-V 4.5 uses a 3-frame packing strategy for inference, with a time overhead (excluding frame-extraction time) only about 1/10 that of comparable models.

1AI attaches the model's open-source links:

  • Github: https://github.com/OpenBMB/MiniCPM-o
  • Hugging Face: https://huggingface.co/openbmb/MiniCPM-V-4_5
  • ModelScope: https://www.modelscope.cn/models/OpenBMB/MiniCPM-V-4_5