Alibaba Cloud Tongyi open-sources its first multimodal reasoning model QVQ, with visual reasoning comparable to OpenAI o1

On December 25, Alibaba Cloud's Tongyi Qianwen (Qwen) team released the industry's first open-source multimodal reasoning model, QVQ-72B-Preview. QVQ demonstrated better-than-expected visual understanding and reasoning capabilities, excelling at complex reasoning problems in mathematics, physics, and other sciences. Multiple benchmark results show that QVQ surpasses its predecessor, the visual understanding model Qwen2-VL, with overall performance comparable to reasoning models such as OpenAI o1 and Claude 3.5 Sonnet. Developers can now try it directly on the ModelScope community and the HuggingFace platform. (36Kr)
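For readers who want to try the open weights mentioned above, the following is a minimal sketch (not from the article) of loading QVQ-72B-Preview from HuggingFace with the `transformers` library, assuming the model follows the same loading convention as its Qwen2-VL predecessor:

```python
# Hypothetical sketch: load QVQ-72B-Preview, assuming it uses the
# Qwen2-VL model class in HuggingFace `transformers` (the article
# itself gives no code).
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/QVQ-72B-Preview",  # repository id on HuggingFace
    torch_dtype="auto",      # let transformers pick a suitable dtype
    device_map="auto",       # shard the 72B weights across available GPUs
)

# The processor bundles the tokenizer and image preprocessing needed
# to build multimodal (image + text) prompts for the model.
processor = AutoProcessor.from_pretrained("Qwen/QVQ-72B-Preview")
```

A 72B-parameter model requires substantial GPU memory, so multi-GPU sharding via `device_map="auto"` (or a quantized variant) is the practical way to run it locally.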
