{"id":10198,"date":"2024-05-13T09:10:23","date_gmt":"2024-05-13T01:10:23","guid":{"rendered":"https:\/\/www.1ai.net\/?p=10198"},"modified":"2024-05-13T09:10:23","modified_gmt":"2024-05-13T01:10:23","slug":"%e6%b6%88%e6%81%af%e7%a7%b0-openai-%e5%b0%86%e6%8e%a8%e5%87%ba%e5%a4%9a%e6%a8%a1%e6%80%81%e4%ba%ba%e5%b7%a5%e6%99%ba%e8%83%bd%e6%95%b0%e5%ad%97%e5%8a%a9%e7%90%86%ef%bc%9a%e5%8f%af%e8%af%ad%e9%9f%b3","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/10198.html","title":{"rendered":"OpenAI to launch multimodal AI digital assistant: can talk by voice, recognize objects, sources say"},"content":{"rendered":"<p data-vmark=\"f07f\">According to The Information, <a href=\"https:\/\/www.1ai.net\/en\/tag\/openai\" title=\"View articles tagged with OpenAI\" target=\"_blank\">OpenAI<\/a> recently demonstrated to some customers a new multimodal AI model capable of voice conversation and object recognition. Sources say it may be among the official releases OpenAI has planned for May 13th.<\/p>\n<p data-vmark=\"1759\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-10199\" title=\"50747126-7fdb-444e-8409-7ed8a4badb31\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/05\/50747126-7fdb-444e-8409-7ed8a4badb31.jpg\" alt=\"50747126-7fdb-444e-8409-7ed8a4badb31\" width=\"940\" height=\"645\" \/><\/p>\n<p>Image source: Pexels<\/p>\n<p data-vmark=\"d766\">According to the report, the new model can process image and audio information faster and more accurately than OpenAI's existing standalone image recognition and text-to-speech models. 
For example, it could help customer service agents \"better understand a caller's tone of voice and determine whether they are being sarcastic.\" Theoretically, the model could also assist students in learning math or translating real-world sign language.<\/p>\n<p data-vmark=\"5502\">However, the source also noted that while the model outperforms GPT-4 Turbo on certain questions, it can still confidently give wrong answers.<\/p>\n<p data-vmark=\"693d\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-10200\" title=\"f2254b1a-9518-4dd7-980c-10b2858667dc\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/05\/f2254b1a-9518-4dd7-980c-10b2858667dc.jpg\" alt=\"f2254b1a-9518-4dd7-980c-10b2858667dc\" width=\"1292\" height=\"264\" \/><\/p>\n<p data-vmark=\"7e0c\">Developer Ananay Arora posted a screenshot containing code related to phone calls, suggesting that OpenAI may be adding the ability to make phone calls to ChatGPT. Arora also found evidence that OpenAI is configuring servers for real-time audio and video communication.<\/p>\n<p data-vmark=\"dadf\">OpenAI CEO Sam Altman has categorically denied that the upcoming release is the large language model codenamed GPT-5 (which is said to be significantly better than GPT-4); The Information says GPT-5 could be officially unveiled before the end of the year. Altman also said that OpenAI will not release a new AI search engine.<\/p>\n<p data-vmark=\"d959\">If The Information's report is accurate, OpenAI's new release could still have some impact on the upcoming Google I\/O developer conference. Google is also known to be testing technology that uses AI to make phone calls. 
Additionally, Google is rumored to be working on a project codenamed \"Pixie,\" a multimodal Google Assistant replacement that recognizes objects through the device's camera and provides users with information such as where to buy an item or how to use it.<\/p>","protected":false},"excerpt":{"rendered":"<p>According to The Information, OpenAI has recently shown some clients a new multimodal AI model capable of voice conversation and object recognition. Sources say this may be among the official releases OpenAI has planned for May 13th. Image source: Pexels. The report says the new model can process image and audio information more quickly and accurately than OpenAI\u2019s existing standalone image recognition and text-to-speech models. It could, for example, help customer service agents \u201cbetter understand the tone of callers and judge whether they are using sarcasm\u201d. 
Theoretically, the model could also help students learn math or translate real-world sign language<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[190,2567],"collection":[],"class_list":["post-10198","post","type-post","status-publish","format-standard","hentry","category-news","tag-openai","tag-2567"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/10198","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=10198"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/10198\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=10198"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=10198"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=10198"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=10198"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}