{"id":34901,"date":"2025-05-10T09:20:38","date_gmt":"2025-05-10T01:20:38","guid":{"rendered":"https:\/\/www.1ai.net\/?p=34901"},"modified":"2025-05-09T21:25:36","modified_gmt":"2025-05-09T13:25:36","slug":"latentsync%ef%bc%9a%e5%bc%80%e6%ba%90%e8%a7%86%e9%a2%91%e5%af%b9%e5%8f%a3%e5%9e%8bai%e6%a8%a1%e5%9e%8b%ef%bc%8c%e5%ad%97%e8%8a%82%e8%b7%b3%e5%8a%a8%e5%bc%80%e6%ba%90%e7%9a%84%e6%95%b0%e5%ad%97","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/34901.html","title":{"rendered":"LatentSync: open-source video lip-sync AI model, ByteDance's open-source digital human project"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-34902\" title=\"38ad39ccj00svzwcr0082d000u000hem\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/05\/38ad39ccj00svzwcr0082d000u000hem.jpg\" alt=\"38ad39ccj00svzwcr0082d000u000hem\" width=\"1080\" height=\"626\" \/><\/p>\n<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/latentsync\" title=\"[See article with [LatentSync] label]\" target=\"_blank\" >LatentSync<\/a>It is an end-to-end lip-synchronization framework jointly launched by ByteDance and Beijing Jiaotong University. It is based on audio-driven latent diffusion models (audio-driven latent diffusion models) and aims to achieve seamless temporal consistency and generate high-quality, realistic speaking videos. The framework is suitable for a wide range of application scenarios such as voice-over, virtual avatars, game development, and more.<\/p>\n<h2><span id=\"lwptoc2\"><strong>LatentSync Features<\/strong><\/span><\/h2>\n<ol>\n<li>End-to-End Lip Synchronization: Latent Sync models complex audio-video relationships directly in latent space without any intermediate motion representation. 
It accurately generates lip movements that match the input audio, enabling precise synchronization of lip shape with speech.<\/li>\n<li>High-resolution video generation: by operating in latent space, LatentSync avoids the heavy hardware requirements that traditional diffusion models face when diffusing in pixel space, and can generate high-resolution video.<\/li>\n<li>Realistic dynamics: the generated video captures subtle expressions tied to the emotional tone of the audio, making the character's speech more natural and vivid.<\/li>\n<li>Temporal consistency enhancement: LatentSync introduces Temporal REPresentation Alignment (TREPA), which uses temporal representations extracted by a large-scale self-supervised video model to align generated frames with real frames, reducing video flicker and making playback smoother.<\/li>\n<li>Multi-language support: LatentSync handles multiple languages, supporting international content localization.<\/li>\n<\/ol>\n<p>Official website: <a href=\"https:\/\/www.latentsync.org\">https:\/\/www.latentsync.org<\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>LatentSync is an end-to-end lip-synchronization framework jointly developed by ByteDance and Beijing Jiaotong University. It is based on audio-driven latent diffusion models and aims to achieve seamless temporal consistency and generate high-quality, realistic talking videos. The framework suits a wide range of applications, including voice-over, virtual avatars, and game development. LatentSync Features End-to-end lip synchronization: LatentSync models complex audio-video relationships directly in latent space without any intermediate motion representation. It accurately generates matching lip movements based on the input audio, enabling precise synchronization of lip shape with speech. 
High Resolution Video Generation: L<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[138,140,145],"tags":[170,6544,2284,1252,1896,1027],"collection":[],"class_list":["post-34901","post","type-post","status-publish","format-standard","hentry","category-product","category-qita","category-shipin","tag-ai","tag-latentsync","tag-2284","tag-1252","tag-1896","tag-1027"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/34901","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=34901"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/34901\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=34901"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=34901"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=34901"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=34901"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}