Podcasting tool: Microsoft open-sources the VibeVoice-1.5B audio model, which supports Chinese and can generate 90 minutes of four-speaker conversational speech

August 27, 2025 - Technology media outlet MarkTechPost published a blog post on August 25 reporting that Microsoft has released VibeVoice-1.5B, an open-source text-to-speech (TTS) model. It can generate up to 90 minutes of natural speech with up to four distinct speakers in a single pass, and supports cross-lingual and singing synthesis.

In terms of architecture, VibeVoice-1.5B is built on the 1.5B-parameter Qwen2.5 language model, combined with an acoustic tokenizer and a semantic tokenizer, both operating at a low frame rate of 7.5 Hz.

The acoustic tokenizer uses a σ-VAE structure that compresses 24 kHz raw audio by a factor of 3200, while the semantic tokenizer is trained with a speech recognition (ASR) proxy task to preserve dialogue semantics. On the decoding side, a 123-million-parameter diffusion decoder is combined with classifier-free guidance and DPM-Solver to improve sound quality and detail.
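The 7.5 Hz frame rate follows directly from the 24 kHz sample rate and the 3200-fold compression. The short Python sketch below only reproduces that arithmetic; the function names are illustrative and are not part of any VibeVoice API.

```python
# Figures taken from the article: 24 kHz input audio, compressed 3200-fold.
SAMPLE_RATE_HZ = 24_000
DOWNSAMPLE_FACTOR = 3_200


def frame_rate_hz(sample_rate_hz: int, downsample_factor: int) -> float:
    """Latent frames produced per second of audio."""
    return sample_rate_hz / downsample_factor


def frames_for_duration(seconds: float, rate_hz: float) -> int:
    """Number of latent frames needed to cover `seconds` of audio."""
    return round(seconds * rate_hz)


if __name__ == "__main__":
    rate = frame_rate_hz(SAMPLE_RATE_HZ, DOWNSAMPLE_FACTOR)
    print(f"frame rate: {rate} Hz")
    print(f"frames per minute: {frames_for_duration(60, rate)}")
```

At this rate, one minute of audio corresponds to 450 latent frames, compared with 1,440,000 raw samples.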

During training, the context length is gradually expanded from 4k to 65k tokens to ensure speech coherence and speaker consistency in long conversations. The architecture supports multi-speaker turn-taking to simulate natural conversation, and it can generate long audio in streaming mode, laying the foundation for future real-time TTS.
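A rough budget check suggests why a 65k-token context is enough for 90 minutes of speech at 7.5 Hz. The sketch below rests on two assumptions that the article does not state: that each speech frame occupies one position in the language model's context, and that "65k" means 65,536 tokens. The script-token figure in the usage example is likewise hypothetical.

```python
import math

FRAME_RATE_HZ = 7.5          # frame rate reported in the article
MAX_CONTEXT_TOKENS = 65_536  # assumption: "65k" interpreted as 2**16


def speech_frames(minutes: float, rate_hz: float = FRAME_RATE_HZ) -> int:
    """Speech frames needed for `minutes` of audio (assumed one context slot each)."""
    return math.ceil(minutes * 60 * rate_hz)


def fits_in_context(minutes: float, script_tokens: int,
                    max_context: int = MAX_CONTEXT_TOKENS) -> bool:
    """True if the speech frames plus the text script fit in the context window."""
    return speech_frames(minutes) + script_tokens <= max_context


if __name__ == "__main__":
    frames = speech_frames(90)
    print(f"90 minutes of speech: {frames} frames")
    print(f"room left for the script: {MAX_CONTEXT_TOKENS - frames} tokens")
    print(f"fits with a 20,000-token script: {fits_in_context(90, 20_000)}")
```

Under these assumptions, 90 minutes of audio takes 40,500 positions, which leaves about 25,000 tokens for the text script and speaker prompts.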

VibeVoice-1.5B also has limitations. It currently supports only English and Chinese, and other languages may produce inaccurate or inappropriate output. It does not support overlapping speech between speakers, and it cannot generate background sound effects or music. Microsoft explicitly prohibits using the model for voice impersonation, disinformation, or bypassing authentication, and reminds users to comply with the law and to disclose when content is AI-generated.

Microsoft says the model is aimed at the research and developer community and is suited to podcast production, conversational AI, speech content generation, and similar fields. A larger 7B-parameter version is to be released in the future to support low-latency interaction and higher-fidelity real-time synthesis, further expanding the application scenarios.
