Ali Thong Yi listens to launch a new version of the voice model: 3 seconds to "replicate" 9 languages, 18 dialects

December 16th.Tongyi Large ModelBy official public call, the two sections are “holy hearing”Voice ModelformalOpen Source, two models were upgraded. According to the introductionIt takes three seconds to get your voice seamlessly to change languages, dialects and emotions - Mandarin, Mandarin, Japanese, English, Happy, Anger 9 languages, 18 dialects, all done。

Ali Tun Yi has launched a new version of the voice model: 3 seconds to "replicate" 9 languages, 18 dialects

upgrade

Fun-CosyVoice3 Model Upgrade: First package delayed reduction of 50%, doubling the accuracy of Chinese and English and supporting 9 languages 18 dialects, translingual cloning and emotional control
Fun-ASR Model Capability Enhancements: Noise scene accuracy 93%, verbs and rap recognition, 31 free speech, accent coverage of dialects, and reduction of the first words of flow recognition models to 160 ms。

Open Source

Fun-CosyVoice 3(0.5B) open source: provide zero-shot acoustic cloning capability to support local deployment and secondary development
Fun-ASR-Nano (0.8B) open source: light quantitative version of Fun-ASR, lower cost of reasoning, model open source to support local deployment and customization fine-tuning。

1A has been officially informed that this time, the Fun-CosyVoice3 large model has completed several key upgrades:

(A) THE INITIAL PACKAGE DELAY REDUCTION OF 50%, WHICH SUPPORTS TWO-WAY-STREAM SYNTHESIS, AND THE REAL REALIZATION OF "INPUT-IN-SPEECH" FOR REAL-TIME SCENES SUCH AS VOICE ASSISTANTS, LIVE AUDIO-SPEECHING, BARRIER-FREE READING
56.41 TP3T, WHICH CONTAINS A COMBINATION OF PROFESSIONAL TERMS, SIZES AND WORDS, OR A TRANSLITERATED SENTENCE, CAN BE PRONUNCIATED WITH PRECISION AND NATURALITY
In the zero-shot TTS assessment, the consistency of content and the sound-synthesis increase in total and the complex scene (test-hard) character error rate (CER) decreases relatively by 26%, close to human recording levels
9 Universal languages, 18 Chinese dialects, 9 Emotional controls, with cross-linguistic transliteration -- recording in a Mandarin language gives you voice in Chinese, Japanese, English, etc., and the sound is consistent。

And open source Fun-CosyVoice 3-5B the model provides a zero-shot acoustic cloning capability that requires only reference audio for a period of three seconds or more to recaptulate the sound and synthesize the new voice, and to support local deployment and secondary development。

The Fun-ASR name allows AI to “understand”. It is based on tens of millions of hours of real voice data training and has landed on a large scale in scenes such as “AI hearing”, videoconferences, etc. Officially, the model focuses on optimizing the sound environment, multilingual free speech, Chinese dialect and accent cover, linguistic recognition, customization, and reducing the first words of the current identification model to 160 ms。

Fun-CosyVoice 3-5B Open source address:

https://github.com/FunAudioLMM/CosyVoice
https://funaudiolm.github.io/cosyvoice3/(GitHub.io)
https://www.modelscope.cn/studios/FunAudioLLM/Fun-CosyVoice3-0.5B (experience demo)
https://modelscope.cn/models/FunAudioLLM/Fun-CosyVoice3-0.5B-2512 (domestic model warehouse)
https://huggingface.co/FunAudioLLM/Fun-CosyVoice3-0.5B-2512 (Overshore Model Warehouse)

Fun-ASR-Nano-0.8B Open source address:

https://github.com/FunAudioLLM/Fun-ASR (GitHub)
https://funaudiolm.github.io/funasr/ (GitHub.io)
https://modelscope.cn/studios/FunAudioLLM/Fun-ASR-Nano/ (domestic experience demo)
https://huggingface.co/spaces/FunAudioLLM/Fun-ASR-Nano
https://modelscope.cn/models/FunAudioLLM/fun-asr-nano-2512 (domestic model warehouse)
https://huggingface.co/FunAudioLLM/Fun-ASR-Nano-2512 (Overshore Model Warehouse)

statement:The content of the source of public various media platforms, if the inclusion of the content violates your rights and interests, please contact the mailbox, this site will be the first time to deal with.

{{userData.name}}Verify

Ali Tun Yi has launched a new version of the voice model: 3 seconds to "replicate" 9 languages, 18 dialects

MORGAN CHASE CEO: IN THE AI ERA, "SOFT SKILLS" ARE MORE IMPORTANT FOR EMPLOYMENT

TRUMP'S GOVERNMENT LAUNCHED THE AMERICAN SCIENCE AND TECHNOLOGY POWER PROGRAM TO RECRUIT THOUSANDS OF ENGINEERS TO WORK ON AI INFRASTRUCTURE

AI Weibo

AI Applications

5000+ AI applications! Updated daily

1AICLUB

Highly recommended! Official brand Weibo

AI Tutorials

Tons of tutorials to read

AI Basic Training Camp

Zero-based entry, leading you to become an AI expert

1ai tiktok

1ai master

TikTok account: 1ai.net

1ai master

TikTok account: 1ai.net

1ai WeChat

Five minutes a day

Become a master in one year

Scan the QR code to follow

{{userData.name}}Verify

Related content:

MORGAN CHASE CEO: IN THE AI ERA, "SOFT SKILLS" ARE MORE IMPORTANT FOR EMPLOYMENT

TRUMP'S GOVERNMENT LAUNCHED THE AMERICAN SCIENCE AND TECHNOLOGY POWER PROGRAM TO RECRUIT THOUSANDS OF ENGINEERS TO WORK ON AI INFRASTRUCTURE

Alibaba Cloud CTO Zhou Jingren: Tongyi open source model downloads exceed 20 million, firmly embrace open source

vivo releases self-developed AI Blue Heart big model and announces open source of 7B self-developed big model

All underlying codes will be open source next year! Telecom releases a large model with hundreds of billions of parameters, "Xingchen Semantics"

The first primary-to-end mega-speech model of Mimo-Audio, with a natural, interactive and humanist dialogue

AI Applications

5000+ AI applications! Updated daily

1AICLUB

Highly recommended! Official brand Weibo

AI Tutorials

Tons of tutorials to read

AI Basic Training Camp

Zero-based entry, leading you to become an AI expert

1ai master

TikTok account: 1ai.net

1ai master

TikTok account: 1ai.net

Five minutes a day

Become a master in one year

Scan the QR code to follow