News from September 19: Xiaomi today announced the open-sourcing of its first native end-to-end speech large model, achieving ICL-based few-shot generalization in the speech domain for the first time.

According to Xiaomi, five years ago GPT-3 first demonstrated that In-Context Learning (ICL) can be acquired through autoregressive language modeling combined with large-scale unsupervised training. In the speech domain, however, existing large models still rely heavily on large-scale labeled data and struggle to generalize to new tasks the way human intelligence does.
The Xiaomi MiMo-Audio model breaks this bottleneck. Built on an innovative pre-training architecture and roughly 100 million hours of training data, it improves cross-modal alignment in intelligence, expressiveness, and safety, as well as human-like naturalness, emotional expression, and interactivity.
The model's specific innovations are as follows:
- For the first time, scaling lossless-compression-based speech pre-training to 100 million hours was shown to produce "emergent" few-shot learning across tasks (see the sketch after this list).
- For the first time, the objectives of generative speech pre-training are clearly defined, and a complete open-source speech pre-training stack is released, covering the tokenizer, an entirely new model architecture, training methods, and an evaluation system.
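To make the few-shot claim above concrete, here is a minimal, model-agnostic sketch of how speech ICL prompts are typically assembled: a handful of (audio, text) demonstration pairs are interleaved ahead of the query audio, and the model is expected to infer the task from the examples alone. The segment structure, task, and file names below are illustrative assumptions, not MiMo-Audio's actual prompt schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Segment:
    kind: str   # "audio" (a waveform file to be tokenized) or "text"
    value: str  # file path for audio segments, plain string for text

def build_few_shot_prompt(
    demos: List[Tuple[str, str]], query_wav: str
) -> List[Segment]:
    """Interleave (audio, label) demonstration pairs, then append the
    unlabeled query audio -- the speech analogue of text-LLM few-shot ICL."""
    prompt: List[Segment] = []
    for wav_path, label in demos:
        prompt.append(Segment("audio", wav_path))
        prompt.append(Segment("text", label))
    prompt.append(Segment("audio", query_wav))  # the model must label this one
    return prompt

# Hypothetical emotion-recognition task defined purely by two examples.
demos = [("clip_happy.wav", "emotion: happy"),
         ("clip_sad.wav", "emotion: sad")]
for seg in build_few_shot_prompt(demos, "clip_query.wav"):
    print(seg.kind, seg.value)
```

The point of the pattern is that no task-specific fine-tuning is involved: swapping the demonstration pairs redefines the task at inference time.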
At present, Xiaomi has open-sourced the model's pre-trained and instruction-tuned checkpoints on the Hugging Face platform, with the accompanying code on GitHub. The 1.2B-parameter tokenizer model, built on a Transformer architecture, supports audio reconstruction and audio-to-text tasks.
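As a hedged illustration, the released checkpoints can presumably be fetched with the standard huggingface_hub client. The repository IDs below are assumptions inferred from the announcement, not verified identifiers; check Xiaomi's official Hugging Face and GitHub pages for the exact names.

```python
from huggingface_hub import snapshot_download

# Repo IDs are assumed names for illustration only.
for repo_id in [
    "XiaomiMiMo/MiMo-Audio-7B-Base",      # pre-trained checkpoint (assumed name)
    "XiaomiMiMo/MiMo-Audio-7B-Instruct",  # instruction-tuned checkpoint (assumed name)
    "XiaomiMiMo/MiMo-Audio-Tokenizer",    # 1.2B Transformer tokenizer (assumed name)
]:
    local_dir = snapshot_download(repo_id=repo_id)
    print(f"downloaded {repo_id} -> {local_dir}")
```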