{"id":31216,"date":"2025-03-21T20:38:27","date_gmt":"2025-03-21T12:38:27","guid":{"rendered":"https:\/\/www.1ai.net\/?p=31216"},"modified":"2025-03-21T20:38:27","modified_gmt":"2025-03-21T12:38:27","slug":"openai-%e5%8f%91%e5%b8%83%e6%96%b0%e4%b8%80%e4%bb%a3%e8%af%ad%e9%9f%b3%e6%a8%a1%e5%9e%8b%ef%bc%8c%e8%ae%a9-ai-%e6%99%ba%e8%83%bd%e4%bd%93%e8%af%ad%e9%9f%b3%e8%a1%a8%e8%be%be%e6%9b%b4%e8%87%aa%e7%84%b6","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/31216.html","title":{"rendered":"OpenAI Releases New Generation of Speech Models to Enable AI Agents to Speak More Naturally"},"content":{"rendered":"<p>On March 21st, it was reported that <a href=\"https:\/\/www.1ai.net\/en\/tag\/openai\" title=\"[View articles tagged with [OpenAI]]\" target=\"_blank\" >OpenAI<\/a> had announced, in a blog post published yesterday (March 20), the launch of new speech-to-text and text-to-speech models to enhance its speech processing capabilities.<strong>The models support developers in building more accurate and customizable voice interaction systems, further promoting the commercial application of AI voice technology.<\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-31217\" title=\"cd2fcaa7j00sth4er00f7d000v900hkp\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/03\/cd2fcaa7j00sth4er00f7d000v900hkp.jpg\" alt=\"cd2fcaa7j00sth4er00f7d000v900hkp\" width=\"1125\" height=\"632\" \/><\/p>\n<p>For speech-to-text, OpenAI launched two models, gpt-4o-transcribe and gpt-4o-mini-transcribe, which it says outperform the existing Whisper series in word error rate (WER), language recognition, and accuracy.<\/p>\n<p>Both models support more than 100 languages and were trained primarily with reinforcement learning on diverse, high-quality audio datasets, which helps them capture subtle speech features and reduce misrecognition, especially with background noise, accents, and different speech 
speeds, where they deliver more stable performance.<\/p>\n<p>For text-to-speech, OpenAI's newest model, gpt-4o-mini-tts, lets developers control voice style through instructions such as \"simulate patient customer service\" or \"vivid storytelling.\" This can be applied to customer service (synthesizing more empathetic voices to improve the user experience) and to creative content (designing personalized voices for audiobooks or game characters).<\/p>\n<p>Citing the blog post, 1AI lists the costs of the three models below:<\/p>\n<ul>\n<li>gpt-4o-transcribe: $6 per million tokens for audio input, $2.50 per million tokens for text input, and $10 per million tokens for output, working out to about 0.6 cents per minute.<\/li>\n<li>gpt-4o-mini-transcribe: $3 per million tokens for audio input, $1.25 per million tokens for text input, and $5 per million tokens for output, about 0.3 cents per minute.<\/li>\n<li>gpt-4o-mini-tts: $0.60 per million tokens for input and $12 per million tokens for output, about 1.5 cents per minute.<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>On March 21st, OpenAI announced in a blog post published yesterday (March 20) the launch of speech-to-text and text-to-speech models to enhance speech processing capabilities, supporting developers in building more accurate and customizable voice interaction systems and furthering the commercialization of AI voice technology. For speech-to-text, OpenAI introduced two models, gpt-4o-transcribe and gpt-4o-mini-transcribe, which it says outperform the existing Whisper series in word error rate, language recognition, and accuracy. 
These two models support over one<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[190,1875],"collection":[],"class_list":["post-31216","post","type-post","status-publish","format-standard","hentry","category-news","tag-openai","tag-1875"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/31216","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=31216"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/31216\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=31216"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=31216"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=31216"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=31216"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
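The "voice style through instructions" behavior described for gpt-4o-mini-tts corresponds to a request parameter in OpenAI's speech API. The sketch below only assembles the request parameters and leaves the actual network call commented out (it requires an API key); the parameter names follow OpenAI's speech endpoint, while the voice name and instruction text are illustrative assumptions, not values from the article.

```python
# Sketch: requesting a styled voice from gpt-4o-mini-tts.
# The voice name ("coral") and instruction text are illustrative
# assumptions; only the model name comes from the article.

def build_tts_request(text: str, style: str, voice: str = "coral") -> dict:
    """Assemble keyword arguments for a text-to-speech request."""
    return {
        "model": "gpt-4o-mini-tts",
        "voice": voice,
        "input": text,
        # The new model accepts free-form style directions, e.g. the
        # article's "simulate patient customer service" example:
        "instructions": style,
    }

params = build_tts_request(
    "Thanks for waiting. Let's sort this out together.",
    "Speak like a patient, empathetic customer service agent.",
)
print(params["model"])  # gpt-4o-mini-tts

# With an API key configured, the request would be sent roughly like this:
# from openai import OpenAI
# client = OpenAI()
# with client.audio.speech.with_streaming_response.create(**params) as resp:
#     resp.stream_to_file("reply.mp3")
```

Keeping the parameter assembly separate from the call makes the style instruction easy to swap per use case (customer service, audiobooks, game characters) without touching the transport code.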
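The per-minute figures in the pricing list follow from the per-token prices once you back out how many audio tokens one minute corresponds to (about 1,000 for the transcribe models, about 1,250 for gpt-4o-mini-tts output). A minimal sketch of that arithmetic, assuming the quoted numbers are exact rather than rounded:

```python
# Sanity-check the quoted per-minute costs against the per-token prices.
# Assumption: the per-minute figures are derived directly from the audio
# token prices; OpenAI's actual billing may round differently.

def implied_audio_tokens_per_minute(usd_per_million_tokens: float,
                                    cents_per_minute: float) -> float:
    """Back out how many audio tokens one minute of audio implies."""
    usd_per_minute = cents_per_minute / 100
    return usd_per_minute / usd_per_million_tokens * 1_000_000

def cost_usd(minutes: float, usd_per_million_tokens: float,
             tokens_per_minute: float) -> float:
    """Estimated cost of processing `minutes` of audio."""
    return minutes * tokens_per_minute / 1_000_000 * usd_per_million_tokens

# gpt-4o-transcribe: $6 per 1M audio-input tokens, quoted at 0.6 cents/min
rate = implied_audio_tokens_per_minute(6.00, 0.6)
print(rate)                      # -> 1000.0 tokens per minute
print(cost_usd(10, 6.00, rate))  # 10 minutes of audio -> $0.06

# gpt-4o-mini-tts output implies a different rate:
print(implied_audio_tokens_per_minute(12.00, 1.5))  # -> 1250.0
```

The same check on gpt-4o-mini-transcribe ($3 per 1M tokens, 0.3 cents/min) also yields 1,000 tokens per minute, so the two transcribe models are internally consistent.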