Voice Model

OpenAI releases three real-time voice models

On May 8, OpenAI released three real-time voice models covering voice reasoning, real-time translation, and streaming. (a) GPT-Realtime-2: voice intelligence built for production environments. (b) GPT-Realtime-Translate: real-time translation from more than 70 input languages into 13 output languages, to break language barriers and help people communicate more naturally...
Information
- 1.9k
5/8
Inworld AI releases Realtime TTS-2 voice model: perceives user emotions and keeps the same voice across 100 languages

May 7 news: yesterday, Inworld AI launched its new-generation voice model, Realtime TTS-2, which is available to developers as a research preview through the Inworld API and the Inworld Realtime API. The core change in TTS-2 is the shift from one-way text-to-speech conversion to a closed-loop real-time dialogue architecture: the model directly receives the actual audio of the conversation, so it can pick up the user's tone, rhythm, and emotional state and adjust its response accordingly. The new version adds four capabilities: Voice D..
Information
- 3.2k
5/7
Alibaba Tongyi launches new voice models: 3 seconds to "clone" a voice across 9 languages and 18 dialects

December 16 news: the Tongyi large-model team announced on its official public account that two "Bailing" (literally "hundred-hear") voice models have been officially opened up and two models upgraded. According to the announcement, a three-second sample is enough to carry your voice seamlessly across languages, dialects, and emotions: 9 languages and 18 dialects, including Mandarin Chinese, Japanese, and English, with emotions such as happy and angry. Fun-CosyVoice3 upgrade: first-packet latency reduced by 50%, Chinese and English accuracy doubled, and support for 9 languages and 18 dialects, cross-lingual cloning, and emotion control; ..
Information
- 7.6k
25/12/16
ModelBest (Facewall Intelligence) releases 0.5B-parameter voice model with human-like sound

September 19 news: yesterday afternoon, ModelBest announced a new member of its "Small Steel Cannon" series, VoxCPM, a 0.5B-parameter base model for voice generation. The model was officially released jointly with the Human-Computer Speech Interaction Laboratory at Tsinghua University's Shenzhen International Graduate School. With 0.5B parameters, it reaches industry SOTA levels in speech naturalness, timbre similarity, and prosody. Performance: RTF ≈ 0.17, with support for streaming output. VoxCPM in Seed-TTS-EV..
Information
- 5.3k
25/9/19
Microsoft launches its first self-developed AI models: MAI-Voice-1 generates audio in seconds, MAI-1-preview targets Copilot text scenarios

On Thursday, August 29, Microsoft's artificial intelligence division officially launched its first two homegrown AI models: the MAI-Voice-1 voice model and the MAI-1-preview general-purpose model. According to Microsoft, MAI-Voice-1 needs only a single GPU to generate a minute of audio in less than a second, while MAI-1-preview "gives users a glimpse of Copilot's future functionality". Currently, Microsoft has made the MAI-Voice-...
Information
- 2.1k
25/8/29
OpenAI releases new generation of speech models to help AI agents speak more naturally

March 21 news: in a blog post published yesterday (March 20), OpenAI announced new speech-to-text and text-to-speech models intended to improve voice processing, help developers build more accurate and customizable voice interaction systems, and further the commercial application of AI voice technology. On the speech-to-text side, OpenAI has launched gpt-4o-transcribe and gpt-4o-mini-transcribe...
Information
- 4.2k
25/3/21
MiniMax launches Hailuo voice AI product: supports 17 languages and up to 10,000 characters

January 21 news: MiniMax announced yesterday that it has released the newly upgraded T2A-01 series of voice models and launched its Hailuo AI voice product globally. According to the announcement, the T2A-01 models let users generate natural, fluent, highly human-like speech by entering text in Hailuo AI, with a maximum input length of 10,000 characters. Users can also freely configure the emotion, speech rate, and pitch of the output voice, and even adjust its timbre, to meet the fine-grained needs of complex scenarios. 1AI notes that Hailuo Voice supports Chinese,...
Information
- 5.5k
25/1/21
Zhipu Qingyan launches emotional speech model GLM-4-Voice: understands emotions, expresses emotion, and empathizes

Zhipu announced the launch of GLM-4-Voice, an end-to-end emotional voice model. According to the company, GLM-4-Voice can understand emotions, express and resonate with them, adjust its own speech rate, and handle multiple languages and dialects; it also has lower latency and can be interrupted at any time. Users can try it in the "Zhipu Qingyan" app starting now. According to the announcement, GLM-4-Voice has the following features: Emotional expression and emotional resonance: the voice carries different emotions and subtle variations, such as happy, sad, angry, and scared. Adjustable speech rate: within the same round of conversation, you can ask it to speak faster or slower...
Information
- 12.9k
24/10/26
Alibaba releases new voice model Qwen2-Audio, surpassing OpenAI Whisper

Recently, Alibaba launched Qwen2-Audio, a new open-source voice model built on its Qwen-Audio. The model performs well in speech recognition, translation, and audio analysis, and improves significantly on its predecessor in both features and performance. Qwen2-Audio comes in a base version and an instruction-tuned version. Users can put questions to the model by voice and have it recognize and analyze audio content. For example, given a clip of a woman speaking, Qwen2-Audio can estimate her age or analyze her emotions; if a noisy voice is input…
Information
- 16.4k
24/8/11
Claiming to be better than XTTS! VoiceCraft: A voice model that supports voice cloning and modifying original audio text

Recently, a voice model called VoiceCraft has attracted widespread attention in the industry. According to official announcements, the model outperforms XTTS. Project address: https://github.com/jasonppy/VoiceCraft The biggest highlight of VoiceCraft is its voice cloning ability. Users only need to provide a piece of original audio, and VoiceCraft uses deep learning to generate new audio that closely resembles it.
Information
- 5.8k
24/3/26
