Xiaomi Releases and Fully Open-Sources Its Sound Understanding Large Model MiDashengLM-7B, Setting New Best Scores on 22 Public Benchmarks

August 4 news: Xiaomi's self-developed sound understanding large model, MiDashengLM-7B, was officially released today and fully open-sourced.

According to Xiaomi's official introduction, MiDashengLM-7B achieves breakthroughs in both speed and accuracy: its time to first token for a single sample is only 1/4 that of comparable models, and it supports more than 20 times the concurrency with the same GPU memory. It also sets new state-of-the-art (SOTA) results for multimodal large models on 22 public benchmarks.

Built on the Xiaomi Dasheng audio encoder and the Qwen2.5-Omni-7B Thinker autoregressive decoder, MiDashengLM-7B achieves unified understanding of speech, ambient sound, and music through an innovative general audio captioning training strategy.

In 2024, Xiaomi released the Xiaomi Dasheng sound foundation model, the first internationally to exceed 50 mAP on AudioSet. It took the lead in all three HEAR Benchmark domains (ambient sound, speech, and music) and has held that lead ever since.

Xiaomi Dasheng powers more than 30 deployed applications in Xiaomi's smart home and in-car scenarios. These include the industry's first wake-up defense against voices outside the vehicle, round-the-clock monitoring of abnormal sounds on phones and smart speakers, IoT control triggered by ambient sounds such as a finger snap, and scratch detection in the enhanced Sentry Mode of the Xiaomi YU7, all of which rely on Xiaomi Dasheng as their core algorithm.

MiDashengLM's training data consists entirely of publicly available data, and the model is released under the permissive Apache License 2.0, which allows both academic and commercial use.

Xiaomi says that, unlike models such as Qwen2.5-Omni, which do not disclose details of their training data, MiDashengLM fully discloses the detailed mixing ratios of its 77 data sources, and the technical report describes the full pipeline from audio encoder pre-training to instruction fine-tuning.

MiDashengLM is a key technology in Xiaomi's "human-car-home ecosystem" strategy. By unifying the understanding of speech, ambient sound, and music across domains, it can not only recognize what is happening around the user but also analyze the implicit meaning behind those events, improving how well its understanding of user scenarios generalizes.

Models based on MiDashengLM interact with users in natural language, providing more human-like communication and feedback. For example, they can give pronunciation feedback and draw up a targeted improvement plan when a user practices singing or a foreign language, or answer questions about ambient sounds in real time while the user is driving.

MiDashengLM, with the Xiaomi Dasheng audio encoder as its core component, is an important upgrade to the Xiaomi Dasheng model series. Building on the current version, Xiaomi has begun work on further improving the model's computational efficiency, with the aim of offline deployment on end devices, and on rounding out features such as sound editing driven by users' natural-language prompts.

1AI attaches the MiDashengLM open-source links:

  • GitHub homepage: https://github.com/xiaomi-research/dasheng-lm
  • Technical report: https://github.com/xiaomi-research/dasheng-lm/tree/main/technical_report
  • Model weights (Hugging Face): https://huggingface.co/mispeech/midashenglm-7b
  • Model weights (ModelScope community): https://modelscope.cn/models/midasheng/midashenglm-7b
  • Web demo: https://xiaomi-research.github.io/dasheng-lm
  • Interactive demo: https://huggingface.co/spaces/mispeech/MiDashengLM