{"id":43517,"date":"2025-09-19T11:16:48","date_gmt":"2025-09-19T03:16:48","guid":{"rendered":"https:\/\/www.1ai.net\/?p=43517"},"modified":"2025-09-19T11:16:48","modified_gmt":"2025-09-19T03:16:48","slug":"%e5%b0%8f%e7%b1%b3%e5%bc%80%e6%ba%90%e9%a6%96%e4%b8%aa%e5%8e%9f%e7%94%9f%e7%ab%af%e5%88%b0%e7%ab%af%e8%af%ad%e9%9f%b3%e5%a4%a7%e6%a8%a1%e5%9e%8b-xiaomi-mimo-audio%ef%bc%8c%e5%af%b9%e8%af%9d%e8%87%aa","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/43517.html","title":{"rendered":"The first primary-to-end mega-speech model of Mimo-Audio, with a natural, interactive and humanist dialogue"},"content":{"rendered":"<p>The news of September 19th<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%b0%8f%e7%b1%b3\" title=\"[View articles tagged with [Xiaomi]]\" target=\"_blank\" >Millet<\/a>Announced today<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >Open Source<\/a>First parent to end<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%af%ad%e9%9f%b3%e5%a4%a7%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [Voice Megamodel]]\" target=\"_blank\" >Voice big model<\/a> It's not like it's a bad idea<strong>FOR THE FIRST TIME, ICL-BASED SMALL SAMPLES WERE GENERALIZED IN THE VOICE FIELD<\/strong>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-43518\" title=\"48e952c7j00t2tfq7005td000ufip\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/09\/48e952c7j00t2tfq7005td000u000fip.jpg\" alt=\"48e952c7j00t2tfq7005td000ufip\" width=\"1080\" height=\"558\" \/><\/p>\n<p>According to Mi, for the first time five years ago GPT-3 showed the ability to acquire In-Context Learning (ICL, context learning) through self-regressive language models + large-scale data-unspectled training, while in the area of voice<strong>Existing large models still rely heavily on large-scale labelling data<\/strong>,<strong>It's hard to adapt to new assignments to 
human intelligence<\/strong>.<\/p>\n<p>The Xiaomi-MiMo-Audio model, which breaks this bottleneck, is based on innovative pre-training structures and hundreds of hours of training data, and increases the ability to cross-modular alignment in terms of IQ, intelligence, performance and security<strong>Humanization of nature, emotional expression and interaction<\/strong>.<\/p>\n<p>The specific innovation points of the model are as follows:<\/p>\n<ul>\n<li>For the first time, Scaling to 100 million hours of pre-training in sound undamaged compression was shown to be \u201cemerging\u201d across the mission, in the form of Few-Shot Learning\u3002<\/li>\n<\/ul>\n<ul>\n<li>The first clear target and definition of voice generation pre-training and an open source set of full voice pre-training programmes, including tokenizer, a completely new model structure, training methods and assessment systems, are available\u3002<\/li>\n<\/ul>\n<p>At present, Mi has provided pre-training, command fine-tuning models for this model at the opening of the Huggingface platform, while at the Github platform, the Tokenizer model with parameters of 1.2B, based on Transformer architecture, supports audio reconstruction and audio-transtexting tasks\u3002<\/p>","protected":false},"excerpt":{"rendered":"<p>On September 19, Mi announced today that Xiaomi-MiMo-Audio, the first original-to-end voice model of open source, had for the first time achieved a generalization of ICL-based samples in the field of voice. According to Mi, five years ago GPT-3 demonstrated for the first time the ability to acquire In-Context Learning (ICL, context learning) through a self-returning language model + large-scale unmarked data training, while in the area of voice, the existing large-scale model still relies heavily on large-scale tagging data and is difficult to adapt to new assignments to human intelligence. 
Xiaomi-MiMo-Audio breaks this bottleneck, built on an innovative pre-training architecture and training data on the scale of 100 million hours, improving cross-modal alignment in terms of IQ, EQ, expressiveness, and safety.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[1114,219,4061],"collection":[],"class_list":["post-43517","post","type-post","status-publish","format-standard","hentry","category-news","tag-1114","tag-219","tag-4061"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/43517","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=43517"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/43517\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=43517"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=43517"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=43517"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=43517"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}