{"id":18684,"date":"2024-08-27T09:00:51","date_gmt":"2024-08-27T01:00:51","guid":{"rendered":"https:\/\/www.1ai.net\/?p=18684"},"modified":"2024-08-27T09:00:51","modified_gmt":"2024-08-27T01:00:51","slug":"%e4%ba%91%e7%9f%a5%e5%a3%b0%e6%8e%a8%e5%87%ba%e5%b1%b1%e6%b5%b7%e5%a4%9a%e6%a8%a1%e6%80%81%e5%a4%a7%e6%a8%a1%e5%9e%8b%ef%bc%9a%e5%ae%9e%e6%97%b6%e7%94%9f%e6%88%90%e6%96%87%e6%9c%ac%e3%80%81%e9%9f%b3","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/18684.html","title":{"rendered":"Unisound launches Shanhai multimodal large model: real-time generation of text, audio and images"},"content":{"rendered":"<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e4%ba%91%e7%9f%a5%e5%a3%b0\" title=\"[Sees the article with the tag]\" target=\"_blank\" >Unisound<\/a>, a well-known Chinese artificial intelligence company, announced its latest research and development achievement, the Shanhai <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%9a%e6%a8%a1%e6%80%81%e5%a4%a7%e6%a8%a1%e5%9e%8b\" title=\"[Sees articles with [Multimodal Large Model] labels]\" target=\"_blank\" >multimodal large model<\/a>, in Beijing on August 23, 2024.<\/p>\n<p>By integrating cross-modal information, the Shanhai multimodal large model can accept text, audio, and images as input and generate any combination of text, audio, and image output in real time.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-18685\" title=\"7936ba0fj00siuqrg00gpd000lx00rfp\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/08\/7936ba0fj00siuqrg00gpd000lx00rfp.jpg\" alt=\"7936ba0fj00siuqrg00gpd000lx00rfp\" width=\"789\" height=\"987\" \/><\/p>\n<p>The Shanhai multimodal large model has the following characteristics:<\/p>\n<ul>\n<li><strong>Real-time reply, free interruption:<\/strong> Response time is close to that of a human in a real conversation, and users can interrupt the model at any time.<\/li>\n<li><strong>Perceiving and expressing emotions:<\/strong> The model judges the user&#039;s emotions from the spoken text and also captures subtle changes in the tone, rhythm, and pitch of the user&#039;s voice to perceive their emotional state.<\/li>\n<li><strong>Free timbre switching:<\/strong> The model switches timbres according to the user&#039;s personalized needs, and can learn the user&#039;s timbre and style to replicate the user&#039;s voice.<\/li>\n<li><strong>Visual scene understanding:<\/strong> The model &quot;sees&quot; the surrounding environment and combines images and text to provide easy-to-understand summaries.<\/li>\n<li><strong>Image generation, building personalized art:<\/strong> The model creates visual content from user instructions and provides customized images that meet individual needs.<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>Unisound, a well-known Chinese artificial intelligence company, announced its latest research and development achievement, the Shanhai multimodal large model, in Beijing on August 23, 2024. By integrating cross-modal information, the Shanhai multimodal large model can accept text, audio, and images as input and generate any combination of text, audio, and image output in real time. 
The Shanhai multimodal large model has the following features. Real-time response with free interruption: response time is close to that of a human in a real conversation, and users can interrupt at any time. Perceiving and expressing emotions: the model judges the user's emotions from the spoken text and also captures subtle changes in tone, rhythm, and pitch to sense the user's emotional state. Timbre switching: the timbre can be switched freely according to the user's personalized needs.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[1575,602],"collection":[],"class_list":["post-18684","post","type-post","status-publish","format-standard","hentry","category-news","tag-1575","tag-602"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/18684","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=18684"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/18684\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=18684"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=18684"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=18684"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=18684"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}