On August 29th, OpenAI moved its "Realtime API" out of beta and into general availability for production use.

According to 1AI, the API is aimed at enterprises and developers building voice assistants for real-world scenarios such as customer support, education, and personal productivity. Its core component, the "gpt-realtime" model, uses an end-to-end speech-to-speech architecture that processes and generates audio directly, eliminating the intermediate text conversion step. According to OpenAI, the model responds faster and sounds more natural than its predecessor, and is better at handling complex instructions.
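For orientation, here is a minimal sketch of what opening a speech-to-speech session with gpt-realtime can look like over a raw WebSocket. The endpoint path, query parameter, and event names follow the publicly documented beta-era Realtime API and may have shifted at GA, so treat them as assumptions rather than the confirmed interface.

```python
# Minimal sketch: connect to the Realtime API and request a spoken response.
# Endpoint, query parameter, and event names follow the beta-era conventions
# and are assumptions for the GA release.
import asyncio
import json
import os

import websockets  # pip install websockets (>=14; older versions use extra_headers)


async def main() -> None:
    url = "wss://api.openai.com/v1/realtime?model=gpt-realtime"
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

    async with websockets.connect(url, additional_headers=headers) as ws:
        # Ask the server to generate a response; event shape mirrors the
        # published beta event schema.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"instructions": "Greet the caller and offer help."},
        }))
        # Print the types of the first few server events
        # (e.g. session.created, response audio deltas).
        for _ in range(5):
            print(json.loads(await ws.recv())["type"])


asyncio.run(main())
```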
OpenAI says the gpt-realtime model can now pick up non-verbal signals such as laughter, switch languages mid-conversation, and adjust its tone of voice - for example, producing "a friendly tone with a French accent" or "a faster, more professional tone". In addition, the model adds two new voices, "Cedar" and "Marin", and refines the eight existing voices.
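A hedged illustration of how voice and delivery might be steered: the session.update event below selects the new "marin" voice and requests a specific speaking style through instructions. The field names mirror the beta session schema and are assumptions here.

```python
# Hypothetical session configuration choosing a voice and speaking style;
# field names follow the beta-era schema and are not confirmed for GA.
import json

session_update = {
    "type": "session.update",
    "session": {
        "voice": "marin",  # or "cedar", the other newly added voice
        "instructions": (
            "Speak with a friendly tone and a light French accent. "
            "Switch languages if the caller does."
        ),
    },
}
print(json.dumps(session_update, indent=2))
```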
In performance benchmarks, the gpt-realtime model shows significant improvements: accuracy rises from 65.6% to 82.8% on the Big Bench Audio benchmark, from 20.6% to 30.5% on MultiChallenge, and from 49.7% to 66.5% on ComplexFuncBench.
The update also improves tool integration. OpenAI says the model makes function calls more reliably by selecting the right tool, triggering it at the right moment, and filling in its parameters correctly. Developers can connect to external tools and services through Session Initiation Protocol (SIP) and Model Context Protocol (MCP) servers. Meanwhile, a reusable prompts feature lets developers save configurations and tool settings for different usage scenarios, further improving development efficiency.
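To make the tool-calling flow concrete, the sketch below declares a single function tool in the session configuration so the model can decide when to invoke it. The JSON-Schema style follows OpenAI's standard function-calling convention, but the exact GA field layout and the get_order_status function itself are assumptions for illustration.

```python
# Sketch of registering a function tool with the session; the model chooses
# when to call it. The tool and its parameters are hypothetical examples.
tool_config = {
    "type": "session.update",
    "session": {
        "tools": [
            {
                "type": "function",
                "name": "get_order_status",
                "description": "Look up the shipping status of a customer order.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "order_id": {"type": "string"},
                    },
                    "required": ["order_id"],
                },
            }
        ],
        "tool_choice": "auto",  # let the model decide when to call the tool
    },
}
```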
The API now supports image input. Users can send screenshots or photos during a conversation, and the model can work with the image content - for example, reading text in the image or answering questions about it. Developers control which images the model is allowed to access.
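A rough sketch of how an image might be attached to the conversation as a user message follows; the input_image content type and data-URL encoding mirror the pattern used in OpenAI's other APIs and are assumptions for the Realtime API's GA schema.

```python
# Hedged sketch: attach a local screenshot to the conversation so the model
# can answer questions about it. Content-type names are assumptions.
import base64
import json

with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

image_item = {
    "type": "conversation.item.create",
    "item": {
        "type": "message",
        "role": "user",
        "content": [
            {"type": "input_text", "text": "What error does this screenshot show?"},
            {"type": "input_image", "image_url": f"data:image/png;base64,{image_b64}"},
        ],
    },
}
print(json.dumps(image_item)[:120], "...")
```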
In addition, the API adds two useful features: developers can set a token usage limit and trim the context of multi-turn conversations, both of which help control costs in longer sessions. On pricing, the cost of using the gpt-realtime model has been cut by 20%: audio input tokens now cost $32 per million (IT Home note: about RMB 229 at the current exchange rate), audio output tokens $64 per million (about RMB 457.9), and cached input tokens $0.40 per million (about RMB 2.9).
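Using the per-million-token prices quoted above, a back-of-the-envelope cost estimate for a session is straightforward; the token counts in the example are made up purely for illustration.

```python
# Rough cost estimate from the listed per-million-token prices.
AUDIO_IN_PER_M = 32.00    # USD per 1M audio input tokens
AUDIO_OUT_PER_M = 64.00   # USD per 1M audio output tokens
CACHED_IN_PER_M = 0.40    # USD per 1M cached input tokens


def session_cost(audio_in: int, audio_out: int, cached_in: int = 0) -> float:
    """Return the estimated USD cost for the given token counts."""
    return (audio_in * AUDIO_IN_PER_M
            + audio_out * AUDIO_OUT_PER_M
            + cached_in * CACHED_IN_PER_M) / 1_000_000


# Example: 50k audio-in, 80k audio-out, 20k cached tokens -> about $6.73
print(f"${session_cost(50_000, 80_000, 20_000):.2f}")
```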
OpenAI says the API can detect problematic content and automatically terminate a conversation that violates the platform's policies. However, judging from how safety in language models has evolved, this should not be the only safeguard, and developers still need to layer their own application-specific safety measures on top.
For EU users, the API offers data residency options and dedicated privacy terms for business customers, in order to comply with the region's data protection regulations.