Two new AI models join Alibaba Tongyi's Qwen3-TTS family: voices can now be custom-designed, not just cloned

News on December 25: Alibaba Tongyi's Qwen3-TTS family has launched two new AI models, the voice design model Qwen3-TTS-VD-Flash and the voice cloning model Qwen3-TTS-VC-Flash. According to 1AI, the models' main features are as follows:

  • Voice design: Qwen3-TTS-VD-Flash accepts complex natural-language instructions and gives fine-grained control over timbre, prosody, emotion, persona and more, covering everything from "what to say" to "how to say it". Users can define exactly the voice they want instead of only cloning an existing voice or picking from a fixed set of presets. Its overall performance is significantly better than that of GPT-4o-mini-tts and MiMo-Audio-7B-Instruct, and it outperforms Gemini-2.5-pro-preview-tts in role-play tests.
  • Voice cloning: Qwen3-TTS-VC-Flash can clone a voice from about 3 seconds of audio and generate speech with the cloned voice in major languages including Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French and Russian. On the MiniMax TTS multilingual test set, its average word error rate (WER) is generally lower than those of MiniMax, ElevenLabs and GPT-4o-Audio-Preview.
  • High expressiveness: Qwen3-TTS-VD-Flash and Qwen3-TTS-VC-Flash produce highly expressive, human-like timbres, reliably output speech that matches the input text, and automatically adjust prosody for natural, lively delivery.
  • Robust text handling: Qwen3-TTS-VD-Flash and Qwen3-TTS-VC-Flash have strong text-parsing capabilities. They automatically process complex text structures, accurately extract key information, and remain robust across diverse and unconventional text formats. (Note: robustness is a system's ability to keep functioning stably when its internal structure or external environment changes.)

Qwen3-TTS-VD-Flash

Qwen3-TTS can generate customized voice personas from natural-language descriptions. Users can freely enter acoustic attributes, persona descriptions, background information and more to easily create the voice they want.

Controllable generation: Qwen3-TTS's overall performance is significantly better than that of GPT-4o-mini-tts and MiMo-Audio-7B-Instruct, and it outperforms Gemini-2.5-pro-preview-tts in role-play tests.

Qwen3-TTS-VC-Flash

Qwen3-TTS supports 3-second voice cloning and can generate multilingual audio from the cloned voice, with high robustness to complex text and in-the-wild audio.

Multilingual voice cloning: in Chinese, English, French, Italian and other languages, Qwen3-TTS produces more stable content than MiniMax, ElevenLabs and GPT-4o-Audio-Preview, achieving the lowest average word error rate (WER).
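For readers unfamiliar with the metric, the sketch below shows the textbook definition of word error rate: (substitutions + deletions + insertions) divided by the number of words in the reference transcript, computed with a word-level edit distance. Lower is better. This is only an illustration; the article does not say how the MiniMax test set tokenizes text or normalizes transcripts, and evaluations of Chinese, Japanese or Korean often score characters rather than whitespace-separated words. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: (S + D + I) / N via word-level Levenshtein distance.

    Assumes whitespace tokenization; real benchmarks may normalize or
    tokenize differently (e.g. character-level for Chinese).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


ref = "the quick brown fox jumps"
hyp = "the quick brown box jumps over"
# One substitution (fox -> box) plus one insertion (over), over 5 reference words.
print(word_error_rate(ref, hyp))  # 0.4
```

A hypothesis identical to the reference scores 0.0; an empty hypothesis scores 1.0 because every reference word counts as a deletion.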

Qwen3-TTS-Voice-Design API documentation:

https://www.alibabacloud.com/help/zh/model-studio/qwen-tts-voice-design?spm=a2ty_o06.30285417.0.0.56a0c9216Ey6VM

Qwen3-TTS-Voice-Clone API documentation:

https://www.alibabacloud.com/help/zh/model-studio/qwen-tts-voice-cloning?spm=a2ty_o06.30285417.0.0.56a0c921WnHNlN