Amazon launches new AI voice model Nova Sonic, calls out OpenAI and Google

April 9 news: Amazon has released Nova Sonic, a next-generation generative AI model that processes speech natively and generates natural, fluent speech. According to Amazon, Nova Sonic's performance on key metrics such as speed, speech recognition, and conversation quality is comparable to that of OpenAI's and Google's cutting-edge speech models.


Nova Sonic is Amazon's answer to a newer generation of AI speech models, such as the one behind ChatGPT's voice mode. These models sound more natural in voice interactions than earlier, more rigid systems such as Amazon's original Alexa.

Nova Sonic is available through Bedrock, Amazon's developer platform for building enterprise-grade AI applications, and is accessed through a new two-way streaming API. In a press release, Amazon called Nova Sonic "the most cost-effective" AI voice model on the market, saying it is about 80% cheaper than OpenAI's GPT-4o.

According to Rohit Prasad, Senior Vice President and Chief Scientist of Amazon's Artificial General Intelligence (AGI) division, some components of Nova Sonic already power Alexa+, Amazon's upgraded digital voice assistant.

Prasad said Nova Sonic is better than competitors' AI speech models at routing user requests to different APIs. This capability lets Nova Sonic recognize when it needs to fetch real-time information from the internet, parse proprietary data sources, or take action in an external application, and then use the right tool to accomplish the task.

In two-way conversations, Nova Sonic waits for the "right moment" to speak, taking into account the speaker's pauses and interruptions. It also generates a text transcript of the user's speech, which developers can use in a variety of applications.

According to Prasad, Nova Sonic makes fewer speech recognition errors than other AI speech models, meaning it is relatively good at understanding a user's intent even when the user mumbles, misspeaks, or is in a noisy environment. On Multilingual LibriSpeech, a benchmark that measures speech recognition across languages and dialects, Amazon says Nova Sonic achieves an average Word Error Rate (WER) of just 4.2% across English, French, Italian, German, and Spanish. That is, in these languages the model's output differs from a human transcription in about 4 out of every 100 words.
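To make the WER figure concrete, here is a minimal sketch of how word error rate is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model's output, divided by the number of words in the reference. The function name and the sample sentences are illustrative assumptions, not Amazon's benchmark code, and real evaluations typically also normalize punctuation and numerals before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word and one dropped word.
reference = "turn on the kitchen lights please"
hypothesis = "turn on the chicken lights"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")  # prints "WER: 33.3%"
```

Under this definition, a WER of 4.2% corresponds to roughly 4 word-level edits per 100 reference words.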

1AI notes that on another benchmark, Augmented Multi-Party Interaction, which measures loud interactions with multiple participants, Amazon says Nova Sonic is 46.7% more accurate in terms of word error rate than OpenAI's GPT-4o-transcribe model. Nova Sonic also boasts industry-leading speed, with an average perceived latency of 1.09 seconds, Amazon said. That is faster than the GPT-4o model powering OpenAI's Realtime API, which responds in 1.18 seconds, according to benchmarks from Artificial Analysis.
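As a rough illustration of how such head-to-head figures translate into a relative difference, the sketch below computes the relative reduction between a baseline and an improved measurement. The latency values are the ones quoted above; the helper name is an assumption made for this example. The same formula is one plausible reading of a relative word error rate comparison, though Amazon's exact methodology is not described here.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# Latency figures quoted in the article, in seconds.
nova_sonic_latency_s = 1.09
gpt4o_realtime_latency_s = 1.18

latency_gain = relative_reduction(gpt4o_realtime_latency_s, nova_sonic_latency_s)
print(f"Nova Sonic latency is {latency_gain:.1%} lower")  # prints "Nova Sonic latency is 7.6% lower"
```

In absolute terms the gap is 0.09 seconds, so the reported latency advantage is real but modest.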

Prasad said Nova Sonic is part of Amazon's broader strategy to build Artificial General Intelligence (AGI), which the company defines as "AI systems that can do everything a human can do on a computer." Looking ahead, Prasad said Amazon plans to introduce more AI models that can understand different modalities, including image, video, and speech, as well as "other sensory data that's relevant when bringing things into the physical world."

Amazon's AGI division, headed by Prasad, seems to be playing an increasingly important role in the company's product strategy. Just last week, Amazon launched a preview of Nova Act, an AI model that can operate a web browser and appears to power elements of Alexa+ and Amazon's "Buy for Me" feature. Prasad said that, starting with Nova Sonic, the company wants to make more of its internal AI models available to developers to help them build apps.
