{"id":18608,"date":"2024-08-25T09:23:56","date_gmt":"2024-08-25T01:23:56","guid":{"rendered":"https:\/\/www.1ai.net\/?p=18608"},"modified":"2024-08-25T09:23:56","modified_gmt":"2024-08-25T01:23:56","slug":"%e5%9b%bd%e5%86%85%e9%a6%96%e4%b8%aa%e8%83%bd%e5%8a%9b%e8%bf%bd%e9%bd%90-gpt-4o-%e8%af%ad%e9%9f%b3%e8%83%bd%e5%8a%9b%e7%9a%84%e6%a8%a1%e5%9e%8b%ef%bc%8c%e5%bf%83%e8%be%b0-lingo","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/18608.html","title":{"rendered":"&quot;The first model in China with voice capabilities comparable to GPT-4o&quot;, Lingo voice AI model opens for internal testing"},"content":{"rendered":"<p>Xihu <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bf%83%e8%be%b0\" title=\"[Sees articles with [heart] labels]\" target=\"_blank\" >Xinchen<\/a>, backed by Jinke Tomcat, launched the Xinchen <a href=\"https:\/\/www.1ai.net\/en\/tag\/lingo\" title=\"[See articles with [Lingo] labels]\" target=\"_blank\" >Lingo<\/a> <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%af%ad%e9%9f%b3%e5%a4%a7%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [Voice Megamodel]]\" target=\"_blank\" >voice large model<\/a> in August this year. Described as the first end-to-end speech large model in China, it opened <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%86%85%e6%b5%8b\" title=\"_Other Organiser\" target=\"_blank\" >closed beta<\/a> reservations today (August 24).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-18609\" title=\"ac99bf53j00sir2ig002dd000j40081m\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/08\/ac99bf53j00sir2ig002dd000j40081m.jpg\" alt=\"ac99bf53j00sir2ig002dd000j40081m\" width=\"688\" height=\"289\" \/><\/p>\n<p>According to the official announcement released on August 21, an end-to-end speech large model is a more comprehensive technology than traditional TTS. 
It not only recognizes speech but also integrates natural language processing, intent recognition, dialogue management, and speech synthesis, covering the complete interaction from speech input to speech response and greatly enriching the depth and breadth of human-computer interaction.<\/p>\n<p><strong>The Xinchen Lingo voice model is described as the first model in China with voice capabilities comparable to GPT-4o<\/strong>. Its technical capabilities have three notable characteristics:<\/p>\n<ul>\n<li><strong>Native speech understanding:<\/strong> As an end-to-end model, Lingo not only recognizes the text content of speech but also captures other important features such as emotion, tone, pitch, and even ambient sound. This helps the model understand speech more fully and provide a more natural and vivid interactive experience.<\/li>\n<li><strong>Multiple voice styles:<\/strong> Lingo can adaptively adjust the speed, pitch, and noise intensity of speech according to context and user instructions, and can generate voice responses in a variety of styles such as conversation, singing, and crosstalk, which improves the model's flexibility across application scenarios.<\/li>\n<li><strong>Ultra-high speech compression:<\/strong> Lingo uses a speech codec with a compression ratio in the hundreds, which compresses speech into extremely short sequences, significantly reducing compute and storage costs while helping the model generate high-quality speech.<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>Xihu Xinchen, backed by Jinke Tomcat, launched the Xinchen Lingo speech large model in August this year. Described as the first end-to-end speech large model in China, it opened closed beta reservations today (August 24). 
According to the official announcement released on August 21, an end-to-end speech model is a more comprehensive technology than traditional TTS: it not only recognizes speech but also integrates natural language processing, intent recognition, dialogue management, and speech synthesis, covering the complete interaction from speech input to speech response and greatly enriching the depth and breadth of human-computer interaction. The Xinchen Lingo speech model is described as the first model in China to match GPT-4o's speech capabilities, and its technical capabilities have three notable features:<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[4130,1863,4129,4061],"collection":[],"class_list":["post-18608","post","type-post","status-publish","format-standard","hentry","category-news","tag-lingo","tag-1863","tag-4129","tag-4061"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/18608","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=18608"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/18608\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=18608"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=18608"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=18608"},{"taxonomy
":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=18608"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}