{"id":34903,"date":"2025-05-11T09:26:21","date_gmt":"2025-05-11T01:26:21","guid":{"rendered":"https:\/\/www.1ai.net\/?p=34903"},"modified":"2025-05-09T21:32:23","modified_gmt":"2025-05-09T13:32:23","slug":"sonic%ef%bc%9a%e9%9d%99%e6%80%81%e5%9b%be%e7%94%9f%e6%88%90%e5%8a%a8%e6%80%81%e8%a7%86%e9%a2%91%ef%bc%8c%e8%85%be%e8%ae%af%e5%bc%80%e6%ba%90%e5%9b%be%e7%89%87%e5%94%b1%e6%ad%8c%e8%af%b4%e8%af%9dai","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/34903.html","title":{"rendered":"Sonic: static pictures to generate dynamic video, Tencent open source picture singing and talking AI digital person project"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-34904\" title=\"8fa1b542j00svzwcr004zd000u000bam\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/05\/8fa1b542j00svzwcr004zd000u000bam.jpg\" alt=\"8fa1b542j00svzwcr004zd000u000bam\" width=\"1080\" height=\"406\" \/><\/p>\n<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/sonic\" title=\"_Other Organiser\" target=\"_blank\" >Sonic<\/a>Sonic is an audio-driven portrait animation framework from Tencent and Zhejiang University that generates realistic facial expressions and movements based on global audio perception.Sonic is based on context-enhanced audio learning and motion decoupling controllers, which extract long-term temporal audio knowledge within an audio clip and independently control the head and expression movements, respectively, to enhance local audio perception.Sonic uses a temporal-aware positional offset fusion mechanism to extend local audio perception to the global level, solving the problem of jitter and mutation in long video generation. 
Sonic uses a time-aware positional offset fusion mechanism to extend local audio perception to the global level, solving the problem of jitter and mutation in long video generation.Sonic outperforms existing state-of-the-art methods in terms of video quality, lip-synchronization accuracy, motion diversity, and temporal coherence, and dramatically improves the naturalness and coherence of portrait animations, supporting fine-grained adjustments to the animations by the user.<\/p>\n<h2><span id=\"lwptoc2\"><strong>Sonic Features<\/strong><\/span><\/h2>\n<ol>\n<li>Realistic Lip Synchronization: Precise alignment of audio with lip movements ensures a high degree of consistency between what is spoken and the shape of the mouth.<\/li>\n<li>Rich expressions and head movements: Generate diverse and natural facial expressions and head movements for more vivid and expressive animations.<\/li>\n<li>Stable generation over long periods of time: When processing long videos, it can maintain a stable output, avoid jitter and sudden changes, and ensure overall coherence.<\/li>\n<li>User adjustability: Supports user control of head movement, expression intensity and lip synchronization effects based on parameter adjustments, providing a high degree of customizability.<\/li>\n<\/ol>\n<p>Official website link:<a href=\"https:\/\/github.com\/jixiaozhong\/Sonic\">https:\/\/github.com\/jixiaozhong\/Sonic\u00a0<\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>Sonic is an audio-driven portrait animation framework from Tencent and Zhejiang University that generates realistic facial expressions and movements based on global audio perception.Sonic is based on context-enhanced audio learning and motion decoupling controllers, which extract long-term temporal audio knowledge within the audio clip and independently control the head and expression movements, respectively, to enhance local audio perception.Sonic uses a temporal-aware positional offset fusion mechanism to extend local audio 
perception to the global level, addressing jitter and abrupt changes in long video generation. Sonic outperforms existing state-of-the-art approaches in video quality, lip-synchronization accuracy, motion diversity, and temporal coherence, dramatically improving the naturalness and coherence of portrait animations and supporting fine-grained user adjustments to the animations. Sonic Features Realistic lip synchronization: accurately synchronizes audio with the<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[138,140,145],"tags":[2481,165,6545,219],"collection":[],"class_list":{"0":"post-34903","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"hentry","6":"category-product","7":"category-qita","8":"category-shipin","9":"tag-ai","11":"tag-sonic","12":"tag-219"},"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/34903","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=34903"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/34903\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=34903"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=34903"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=34903"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=34903"}],"curies":[{"name":"wp","href":"https:
\/\/api.w.org\/{rel}","templated":true}]}}