World's first natively end-to-end omni-modal AI model Qwen3-Omni released as open source, unifying text, image, audio and video

September 23 news: in a now-familiar late-night release, Alibaba Cloud today released and open-sourced Qwen3-Omni, the world's first natively end-to-end omni-modal AI model. It also released Qwen3-TTS and Qwen-Image-Edit-2509, an image editing model positioned against Google's Nano Banana.


Qwen3-Omni is the industry's first natively end-to-end omni-modal AI model. It accepts text, image, audio and video input and produces real-time streaming output as both text and natural speech, addressing the long-standing problem that multimodal models have to trade one capability off against another.

Qwen3-Omni is a natively end-to-end, multilingual, omni-modal foundation model. Its core features include:

  • Leading cross-modal performance: text-first early pretraining combined with mixed multimodal training gives the model native multimodal capability. It keeps its single-modality text and image performance while delivering strong audio and video performance. Across 36 audio and audio-visual benchmarks, it achieved open-source SOTA on 32 and overall SOTA on 22, and its automatic speech recognition (ASR), audio understanding and voice conversation performance is comparable to Gemini 2.5 Pro.
  • Multilingual: supports 119 text languages, 19 voice input languages and 10 voice output languages.
  • Voice input languages: English, Chinese, Korean, Japanese, German, Russian, Italian, French, Spanish, Portuguese, Malay, Dutch, Indonesian, Turkish, Vietnamese, Cantonese, Arabic, Urdu.
  • Voice output languages: English, Chinese, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean.
  • Innovative architecture: a Thinker-Talker design based on MoE (Mixture of Experts), combined with AuT pretraining for strong general-purpose representations, and a multi-codebook design that minimizes latency.
  • Real-time audio/video interaction: low-latency streaming interaction that supports natural turn-taking and immediate text or voice responses.
  • Flexible control: behavior can be customized through system prompts, enabling fine-grained control and easy adaptation.
  • Detailed audio captioning: Qwen3-Omni-30B-A3B-Captioner has been open-sourced as a general-purpose, detailed, low-hallucination audio captioning model, filling a gap in the open-source community.

1AI has compiled the official links:

  • GitHub: https://github.com/QwenLM/Qwen3-Omni
  • Hugging Face: https://huggingface.co/collections/Qwen/qwen3-omni-68d100a86cd0906843ceccbe
  • Qwen3-Omni-867aef131e7d4f
  • Demo: https://huggingface.co/spaces/Qwen/Qwen3-Omni-Demo

TTS stands for text-to-speech. Alibaba Cloud's newly released Qwen3-TTS offers 17 voices, each supporting 10 languages: Mandarin Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese and Korean. It also supports a range of Chinese dialects, including Minnan (Hokkien), Wu, Cantonese, Sichuanese, Beijing, Nanjing, Tianjin and Shaanxi dialects.

In addition, Qwen3-TTS-Flash achieved SOTA results on several evaluation benchmarks, outperforming SeedTTS, MiniMax, GPT-4o-Audio-Preview and ElevenLabs, particularly in speech stability and timbre similarity.

Latency comparison                        Qwen3-TTS-Flash           Qwen-TTS
Concurrency                               2 cards, 12 concurrent    2 cards, 6 concurrent
First-packet latency (single request)     97 ms                     200 ms
First-packet latency (full concurrency)   420 ms                    733 ms
First-packet size                         320 ms                    190 ms
RTF (single request)                      0.30                      0.43
RTF (full concurrency)                    0.51                      0.72
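RTF (real-time factor) is conventionally defined as the wall-clock time spent synthesizing divided by the duration of the audio produced, so a value below 1 means speech is generated faster than it plays back. The minimal Python sketch below converts the RTF figures from the table into synthesis time for a clip. The helper names and the 10-second clip length are hypothetical, and reading the second RTF row as the full-concurrency figure is an assumption based on how the first-packet latency rows are split.

```python
# Minimal sketch: converting real-time factor (RTF) into synthesis time.
# Helper names and the 10 s clip length are hypothetical, for illustration only.


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock synthesis time / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(rtf: float, audio_seconds: float) -> float:
    """Invert the definition: seconds needed to synthesize a clip of this length."""
    return rtf * audio_seconds


# RTF figures reported in the table; treating the second row as
# "full concurrency" is an assumption (the source row labels are garbled).
REPORTED_RTF = {
    "Qwen3-TTS-Flash": {"single": 0.30, "full_concurrency": 0.51},
    "Qwen-TTS": {"single": 0.43, "full_concurrency": 0.72},
}

if __name__ == "__main__":
    clip_seconds = 10.0  # hypothetical clip length
    for model, by_mode in REPORTED_RTF.items():
        for mode, rtf in by_mode.items():
            seconds = synthesis_time(rtf, clip_seconds)
            speed = "faster than real time" if rtf < 1 else "not faster than real time"
            print(f"{model} ({mode}): RTF {rtf:.2f}, {seconds:.1f} s "
                  f"for {clip_seconds:.0f} s of audio, {speed}")
```

For example, at RTF 0.30 a 10-second clip takes about 3 seconds to synthesize, while at RTF 0.72 the same clip takes about 7.2 seconds.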

Official links:

  • Demo: https://huggingface.co/spaces/Qwen/Qwen3-TTS-Demo
  • Blog: https://qwen.ai/blog?id=b4264e11fb80b5e373550790121baf0a0f10daf82&fom=research.latest-advancements-list

Qwen-Image-Edit-2509 is the monthly iterative upgrade of Qwen-Image-Edit. Like ByteDance's Jimeng 4.0 (Seedream 4.0) image model released a few days earlier, its headline improvement is a significant gain in consistency.

Compared with the Qwen-Image-Edit released in August, the main improvements in Qwen-Image-Edit-2509 include:

  • Multi-image editing: for multi-image input, Qwen-Image-Edit-2509 builds on the Qwen-Image-Edit architecture and is further trained with image concatenation to enable multi-image editing. It supports combinations such as "person + person", "person + product" and "person + scene", and currently performs best with 1 to 3 input images.
  • Improved single-image consistency: for single-image input, Qwen-Image-Edit-2509 significantly improves editing consistency, particularly in the following areas:
    • Person editing: better preservation of facial identity, with support for a variety of portrait styles and pose changes.
    • Product editing: better preservation of product identity, with support for product poster editing.
    • Text editing: beyond changing text content, support for editing text fonts, colors and materials.
  • Native ControlNet support: including depth maps, edge maps, keypoint maps and more.

Official links:

  • Blog: https://qwen.ai/blog?id=7a9090115ee193ce6a7f7195271d96dd93&fom=research.latest-advancements-list
  • Qwen-Image-Edit-2509
  • Hugging Face: https://huggingface.co/Qwen/Qwen-Image-Edit-2509
  • GitHub: https://github.com/QwenLM/Qwen-Image

In addition, Qwen3-Next-80B-A3B-Instruct-FP8 and Qwen3-Next-80B-A3B-Thinking-FP8 have been open-sourced:

  • Hugging Face: https://huggingface.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eea9d
  • Qwen3-Next-c314f23bd0264a