May 8 news: yesterday, Xiaomi AI Lab released and open-sourced OmniVoice, a multilingual voice-cloning TTS model trained on 580,000 hours of data drawn from 50 open-source datasets and covering 646 languages.

Chinese and English synthesis quality surpasses that of mainstream models, and inference runs at 40 times real-time speed.
In 24 languages, voice similarity and intelligibility exceed those of several commercial systems.
In 102 languages, intelligibility is close to that of real human speech, and even low-resource languages with less than 10 hours of training data can be synthesized properly.
In addition to voice cloning, OmniVoice supports specifying a voice through a text description (e.g. "female, young, Sichuan dialect"), automatically filters noise from the reference audio, and supports inserting non-verbal markers such as laughter and sighs, as well as manually correcting the pronunciation of polyphonic characters.
💻 GitHub: github.com/k2-fsa/OmniVoice
Hugging Face: huggingface.co/k2-fsa/OmniVoice