Inworld AI releases Realtime TTS-2 voice model: senses user emotion and keeps the same voice across more than 100 languages

May 7 news: yesterday Inworld AI launched its new-generation voice model, Realtime TTS-2, which is open to developers as a research preview through the Inworld API and the Inworld Realtime API.

The core change in TTS-2 is the shift from one-way text-to-speech to a closed-loop, real-time dialogue architecture: the model directly receives the actual audio of the conversation, so it can pick up the user's tone, rhythm, and emotional state and adjust its own delivery accordingly. The new version adds four capabilities:

Voice Direction: the voice style is steered by a natural-language description, such as "tired but gentle, like someone just home from work."

Conversational Awareness: in Realtime sessions the model automatically receives the audio of preceding turns, so tone and rhythm carry over from one turn to the next.

Cross-language support: a single voice identity can switch seamlessly among more than 100 languages while the voice stays consistent with the character, and several languages can be mixed within one generated passage.

Advanced Voice Design: without any reference audio, reusable voice characters can be generated from a text description, with three modes offered, ranging from "stable" to "lively."
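To make the combination of these capabilities concrete, here is a minimal Python sketch of how a client might describe one utterance that keeps a single voice, carries a natural-language style direction, and mixes languages in the same passage. It is an illustration only: the class names and every field (`voice_id`, `direction`, `segments`, `language`) are hypothetical and are not taken from the Inworld API, whose actual request format is not described in this report.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One stretch of text in a single language (hypothetical structure)."""
    text: str
    language: str  # language code such as "en" or "ja"


@dataclass
class SpeechRequest:
    """Hypothetical request object; NOT the real Inworld schema."""
    voice_id: str   # one voice identity reused for every segment
    direction: str  # free-form style description
    segments: list[Segment] = field(default_factory=list)

    def languages(self) -> list[str]:
        """Distinct languages, in order of first appearance."""
        seen: list[str] = []
        for segment in self.segments:
            if segment.language not in seen:
                seen.append(segment.language)
        return seen

    def to_payload(self) -> dict:
        """Flatten the request into a plain dict ready for JSON encoding."""
        return {
            "voice_id": self.voice_id,
            "direction": self.direction,
            "segments": [
                {"text": s.text, "language": s.language} for s in self.segments
            ],
        }


request = SpeechRequest(
    voice_id="narrator-01",  # made-up identifier
    direction="tired but gentle, like someone just home from work",
    segments=[
        Segment("Welcome back.", "en"),
        Segment("おかえりなさい。", "ja"),
        Segment("Dinner is ready.", "en"),
    ],
)
payload = request.to_payload()
```

The point of the sketch is that the voice identity and the style direction are set once per request, while the language is a property of each segment, which mirrors the claim that one voice can span several languages within a single passage.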

In addition, the model supports inline non-verbal tags (e.g. [laugh], [sighs]), voice cloning (just upload a 5-to-15-second audio sample), and a time-to-first-audio latency of under 200 ms at the TTS layer.
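The three figures in that paragraph lend themselves to simple client-side checks before anything is sent to a speech service. The Python sketch below is a hedged illustration, not part of any Inworld SDK: the function names are invented here, the tag syntax is assumed to be a single word in square brackets as in the examples above, and the thresholds are simply the numbers quoted in this report.

```python
import re

# Assumed tag syntax: one word in square brackets, optional inner spaces.
TAG_PATTERN = re.compile(r"\[\s*([A-Za-z_]+)\s*\]")


def split_inline_tags(text: str) -> tuple[str, list[str]]:
    """Separate bracketed non-verbal tags from the words to be spoken."""
    tags = [match.group(1).lower() for match in TAG_PATTERN.finditer(text)]
    spoken = TAG_PATTERN.sub(" ", text)
    spoken = " ".join(spoken.split())  # collapse the gaps left by removed tags
    return spoken, tags


def clone_sample_ok(duration_seconds: float,
                    minimum: float = 5.0,
                    maximum: float = 15.0) -> bool:
    """Check a cloning sample against the 5-to-15-second range quoted above."""
    return minimum <= duration_seconds <= maximum


def within_latency_budget(first_audio_ms: float,
                          budget_ms: float = 200.0) -> bool:
    """Check a measured time-to-first-audio against the quoted 200 ms figure."""
    return first_audio_ms < budget_ms


spoken, tags = split_inline_tags("That is great [laugh] but I am tired [ sighs ].")
```

A check like `clone_sample_ok` lets an application reject an unsuitable recording locally instead of waiting for the service to refuse it, and `within_latency_budget` can be applied to timings the application measures itself.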

