{"id":47847,"date":"2025-12-25T11:07:50","date_gmt":"2025-12-25T03:07:50","guid":{"rendered":"https:\/\/www.1ai.net\/?p=47847"},"modified":"2025-12-25T11:07:50","modified_gmt":"2025-12-25T03:07:50","slug":"%e9%98%bf%e9%87%8c%e9%80%9a%e4%b9%89-qwen3-tts-%e5%ae%b6%e6%97%8f%e4%b8%8a%e6%96%b0%e4%b8%a4%e6%ac%be-ai%e6%a8%a1%e5%9e%8b%ef%bc%9a%e5%a3%b0%e9%9f%b3%e4%b8%8d%e4%bb%85%e8%83%bd%e5%a4%8d%e5%88%b6","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/47847.html","title":{"rendered":"Ali Tongyi's Qwen3-TTS family adds two new AI models: voices can now be custom-designed, not just cloned"},"content":{"rendered":"<p>December 25 news: <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e9%98%bf%e9%87%8c%e9%80%9a%e4%b9%89\" title=\"[Sees articles with [Ariton] labels]\" target=\"_blank\" >Ali Tongyi<\/a>'s Qwen3-TTS family has launched two new <a href=\"https:\/\/www.1ai.net\/en\/tag\/ai%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [AI models]]\" target=\"_blank\" >AI models<\/a>: the voice design model <strong>Qwen3-TTS-VD-Flash<\/strong>\u00a0and the voice cloning model\u00a0<strong>Qwen3-TTS-VC-Flash<\/strong>. 1AI summarizes the models' main features below:<\/p>\n<ul>\n<li><strong>Voice design<\/strong>: Qwen3-TTS-VD-Flash accepts complex natural language instructions and enables fine-grained control over timbre, prosody, emotion, persona, and more, giving full control from \u201cwhat to say\u201d to \u201chow to say it\u201d. Users can freely define the voice they want and are no longer limited to cloning an existing voice or choosing from fixed presets. 
Its overall performance is significantly better than that of GPT-4o-mini-tts and Mimo-audio-7b-instruct, and it surpasses Gemini-2.5-pro-preview-tts in role-playing tests.<\/li>\n<li><strong>Voice cloning<\/strong>: Qwen3-TTS-VC-Flash supports 3-second voice cloning and can generate speech with the cloned voice in major languages including Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, and Russian. On the MiniMax TTS Multilingual Test Set, its average word error rate (WER) is better overall than that of MiniMax, ElevenLabs, and GPT-4o-Audio-Preview.<\/li>\n<li><strong>High expressiveness<\/strong>: Qwen3-TTS-VD-Flash and Qwen3-TTS-VC-Flash produce highly expressive, human-like voices. They steadily and reliably output speech that matches the input text and automatically adjust tone and rhythm to give natural, lively delivery.<\/li>\n<li><strong>Robust text parsing<\/strong>: Qwen3-TTS-VD-Flash and Qwen3-TTS-VC-Flash have strong text parsing capabilities: they automatically handle complex text structures, extract key information accurately, and remain robust across diverse, non-standard text formats (Note: robustness is a system's ability to maintain stable function when its internal structure or external environment changes).<\/li>\n<\/ul>\n<p>Qwen3-TTS-VD-Flash<\/p>\n<p>Qwen3-TTS supports generating a <strong>customized voice persona<\/strong> from a natural language description. Users are free to enter acoustic attributes, descriptions, background information, and more 
to easily create the voice persona they want.<\/p>\n<p>Controllable generation: Qwen3-TTS's overall performance is significantly better than that of GPT-4o-mini-tts and Mimo-audio-7b-instruct, and it surpasses Gemini-2.5-pro-preview-tts in role-playing tests.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-47848\" title=\"3ea3cefdj00t7t1zp001jd000u09wp\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/12\/3ea3cefdj00t7t1zp001jd000u0009wp.jpg\" alt=\"3ea3cefdj00t7t1zp001jd000u09wp\" width=\"1080\" height=\"356\" \/><\/p>\n<p>Qwen3-TTS-VC-Flash<\/p>\n<p>Qwen3-TTS supports <strong>natural 3-second voice cloning<\/strong> and can generate multilingual audio with the cloned voice, with strong robustness to complex text and in-the-wild audio.<\/p>\n<p>Multilingual voice cloning: Qwen3-TTS produces more stable content in Chinese, English, French, Italian, and other languages than MiniMax, ElevenLabs, and GPT-4o-Audio-Preview, and achieves the best average word error rate (WER).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-47849\" title=\"d7c097b7j00t7t1zy0024d000u00096p\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/12\/d7c097b7j00t7t1zy0024d000u00096p.jpg\" alt=\"d7c097b7j00t7t1zy0024d000u00096p\" width=\"1080\" height=\"330\" \/><\/p>\n<p><strong>Qwen3-TTS-Voice-Design API documentation:<\/strong><\/p>\n<p>https:\/\/www.alibabacloud.com\/help\/zh\/model-studio\/qwen-tts-voice-design?spm=a2ty_o06.30285417.0.0.56a0c9216Ey6VM<\/p>\n<p><strong>Qwen3-TTS-Voice-Clone API documentation:<\/strong><\/p>\n<p>https:\/\/www.alibabacloud.com\/help\/zh\/model-studio\/qwen-tts-voice-cloning?spm=a2ty_o06.30285417.0.0.56a0c921WnHNlN<\/p>","protected":false},"excerpt":{"rendered":"<p>December 25 news: Ali Tongyi's Qwen3-TTS family has launched two new AI models, the voice design model Qwen3-TTS-VD-Flash and the voice cloning model Qwen3-TTS-VC-Flash. 
1AI summarizes the models' main features below: Voice design: Qwen3-TTS-VD-Flash accepts complex natural language instructions and enables fine-grained control over timbre, prosody, emotion, persona, and more, giving full control from \u201cwhat to say\u201d to \u201chow to say it\u201d. Users can freely define the voice they want and are no longer limited to cloning an existing voice or choosing from fixed presets. InstructTTS-Eval<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[167,3390],"collection":[],"class_list":["post-47847","post","type-post","status-publish","format-standard","hentry","category-news","tag-ai","tag-3390"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/47847","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=47847"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/47847\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=47847"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=47847"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=47847"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=47847"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}