{"id":31998,"date":"2025-03-31T20:06:59","date_gmt":"2025-03-31T12:06:59","guid":{"rendered":"https:\/\/www.1ai.net\/?p=31998"},"modified":"2025-03-31T20:06:59","modified_gmt":"2025-03-31T12:06:59","slug":"%e7%99%be%e5%ba%a6%e7%ab%af%e5%88%b0%e7%ab%af%e8%af%ad%e9%9f%b3%e8%af%ad%e8%a8%80%e5%a4%a7%e6%a8%a1%e5%9e%8b%e5%8f%91%e5%b8%83%ef%bc%8c%e6%88%90%e6%9c%ac%e5%ae%a3%e7%a7%b0%e6%9c%80%e9%ab%98%e9%99%8d-9","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/31998.html","title":{"rendered":"Baidu's End-to-End Speech-Language Big Model Released, Costs Claimed to Drop by Up to 90%"},"content":{"rendered":"<p>March 31, 2011 - In today's<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e7%99%be%e5%ba%a6\" title=\"[Sees articles containing [100 degrees] labels]\" target=\"_blank\" >Baidu<\/a> AI DAY.<strong>Baidu Releases First Cross-Attention Based<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e7%ab%af%e5%88%b0%e7%ab%af\" title=\"[View articles tagged with [end-to-end]]\" target=\"_blank\" >end-to-end<\/a><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%af%ad%e9%9f%b3%e8%af%ad%e8%a8%80%e5%a4%a7%e6%a8%a1%e5%9e%8b\" title=\"[Sees articles with tags on [Voice Language Large Model]]\" target=\"_blank\" >phonetic language macromodel<\/a><\/strong>, announced the realization of ultra-low latency and ultra-low cost, with call costs dropping by about 50%-90% compared to industry averages in voice Q&amp;A scenarios on telephone voice channels.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-31999\" title=\"f9fcdb0ej00stzlm600bkd000v900cxp\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/03\/f9fcdb0ej00stzlm600bkd000v900cxp.jpg\" alt=\"f9fcdb0ej00stzlm600bkd000v900cxp\" width=\"1125\" height=\"465\" \/><\/p>\n<p>On that day.<strong>Wen Xiaoyin Announces Brand Refresh, First to Access the Model<\/strong>It also brings upgraded functions such as multi-model fusion scheduling and picture Q&amp;A. After accessing the model, Wen Xiaoyan can not only support more simulated language chat effect, but also support Chongqing, Guangxi, Henan, Guangdong, Shandong and other special dialects. According to reports, the voice model has very low training and use costs, very fast reasoning response speed, voice interaction, can reduce the user waiting time from the industry's common 3-5 seconds to about 1 second.<\/p>\n<p><strong>The updated Wen Xiaoyan also supports \"multi-model fusion scheduling\".<\/strong>It integrates Baidu's self-developed models such as Wenshin X1 and Wenshin 4.5, and accesses third-party quality models such as DeepSeek-R1, realizing intelligent collaboration between multiple models. Users can choose \"automatic mode\" to call the optimal model combination with one click, or select a single model to complete a specific task according to demand, improving response speed and task processing capability.<\/p>\n<p>1AI learned from the event that<strong>Wen Xiaoyan has also enhanced the photo quiz feature<\/strong>The user shoots or uploads a picture and asks a question in text or voice to get an in-depth analysis directly. 
For example, photographing a math problem can generate real-time solutions and video walkthroughs, and uploading multiple product images can compare parameters and prices to support shopping decisions.

In addition, Wen Xiaoyan added a persona-preset feature: users can set perspectives such as "history scholar" or "science and technology expert" to get multi-dimensional interpretations of the same picture. For example, asked about the mystery of why cats love sitting by windows, Wen Xiaoyan can answer from the perspectives of hunting instinct, energy acquisition, territorial awareness, and more.

Jia Lei, chief architect of Baidu Speech, said the model is the industry's first end-to-end speech-language large model built on the new Cross-Attention approach. "In voice scenarios that meet certain interaction metrics, the cost of large-model calls is 50%-90% lower than the industry average. Inference is also extremely fast, compressing voice-interaction wait time to about 1 second and greatly improving fluency. At the same time, backed by the large model, streaming word-by-word LLM-driven multi-emotion speech synthesis delivers full, realistic, humanlike emotion, substantially improving the listening experience."
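Baidu has not published implementation details of the model, but as a rough illustration of what "Cross-Attention based" fusion between a speech encoder and a language model typically looks like, here is a minimal PyTorch sketch. All module names, dimensions, and the overall structure are assumptions for illustration only, not Baidu's architecture.

```python
# Minimal sketch of cross-attention fusion between a speech encoder and a
# text decoder. This is NOT Baidu's implementation; every name and dimension
# here is an illustrative assumption.
import torch
import torch.nn as nn

class SpeechCrossAttentionBlock(nn.Module):
    """One decoder block in which text tokens attend to speech-encoder states."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        # Self-attention over the text tokens generated so far
        # (causal masking omitted for brevity).
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Cross-attention: queries come from text, keys/values from speech.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, text: torch.Tensor, speech: torch.Tensor) -> torch.Tensor:
        # text:   (batch, text_len, d_model)   decoder token states
        # speech: (batch, speech_len, d_model) acoustic encoder outputs
        h = self.norm1(text)
        text = text + self.self_attn(h, h, h, need_weights=False)[0]
        h = self.norm2(text)
        # The decoder reads acoustic states directly rather than a transcript,
        # which is what makes the pipeline "end-to-end".
        text = text + self.cross_attn(h, speech, speech, need_weights=False)[0]
        return text + self.ffn(self.norm3(text))

# Toy usage: one utterance of 200 speech frames, 16 generated text tokens.
block = SpeechCrossAttentionBlock()
speech_states = torch.randn(1, 200, 512)
text_states = torch.randn(1, 16, 512)
print(block(text_states, speech_states).shape)  # torch.Size([1, 16, 512])
```

In a real decoder the self-attention would be causally masked and the speech states would arrive in a streaming fashion; the sketch only shows the core idea that the text decoder attends directly to acoustic states instead of an intermediate transcript.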