Alibaba's Tongyi Qianwen Qwen2.5-Omni-3B Multimodal AI Debuts: 90% of the 7B Model's Performance, 53% Less VRAM

May 1 news: Alibaba continues to push into the AI field. Its Qwen team, which released the Qwen2.5-Omni-7B model in March, released Qwen2.5-Omni-3B yesterday (April 30). The new model is now openly available for download on Hugging Face.

Note: This 3B-parameter model is a lighter version of the team's 7B flagship multimodal model. It is designed for consumer-grade hardware and accepts text, audio, image, and video input.

The team said that, despite the smaller parameter count, the 3B version retains more than 90% of the 7B model's multimodal performance and does especially well in real-time text generation and natural speech output.

Benchmarks show that it approaches the 7B model in tasks such as video comprehension (VideoBench: 68.8) and speech generation (Seed-tts-eval test-hard: 92.1).

Qwen2.5-Omni-3B's memory savings are particularly noteworthy. The team reports that when processing long-context inputs of 25,000 tokens, the model's VRAM footprint is 28.2 GB, down 53% from 60.2 GB for the 7B model.

According to the report, this means the model can run on the 24 GB GPUs found in high-end desktops and laptops, without needing an enterprise-class GPU cluster.
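The 53% figure follows directly from the two reported footprints. A quick check of the arithmetic, using only the numbers quoted above (the function name is illustrative):

```python
def vram_reduction(baseline_gb: float, reduced_gb: float) -> float:
    """Return the fractional reduction from baseline_gb to reduced_gb."""
    return (baseline_gb - reduced_gb) / baseline_gb

# Reported footprints at a 25,000-token context
reduction = vram_reduction(60.2, 28.2)
print(f"{reduction:.1%}")  # 53.2%
```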

Its architectural innovations, such as the Thinker-Talker design and the customized positional embedding method, TMRoPE, ensure simultaneous comprehension of video and audio inputs. In addition, the model supports FlashAttention 2 and BF16 precision optimization to further increase speed and reduce memory consumption.
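As a rough sketch of how these two options are typically switched on, the snippet below loads the model through the Hugging Face transformers library. It assumes a recent transformers release that ships the Qwen2_5OmniForConditionalGeneration class, the accelerate and flash-attn packages, and the repository id Qwen/Qwen2.5-Omni-3B. The helper names load_kwargs and load_model are hypothetical and not part of any library.

```python
MODEL_ID = "Qwen/Qwen2.5-Omni-3B"  # assumed Hugging Face repository id


def load_kwargs(use_flash_attention: bool = True) -> dict:
    """Build from_pretrained keyword arguments (illustrative helper)."""
    kwargs = {
        "torch_dtype": "bfloat16",  # BF16 weights, roughly half the memory of FP32
        "device_map": "auto",       # requires the accelerate package
    }
    if use_flash_attention:
        # Requires the flash-attn package and a supported GPU
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


def load_model():
    """Download and load the model; needs a GPU and substantial VRAM."""
    # Imported lazily so the rest of this sketch runs without transformers
    from transformers import Qwen2_5OmniForConditionalGeneration

    return Qwen2_5OmniForConditionalGeneration.from_pretrained(
        MODEL_ID, **load_kwargs()
    )
```

Calling load_model() triggers the actual download. Passing use_flash_attention=False to load_kwargs falls back to the library's default attention implementation on hardware without FlashAttention 2 support.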

Use of Qwen2.5-Omni-3B is strictly limited. Under the terms of the license, the model is restricted to research use, and companies wishing to build commercial products must first obtain a separate license from the Alibaba Qwen team. In practice, the model cannot be deployed directly in production and is positioned for testing and prototyping.
