LatentSync: open-source video lip-sync AI model, ByteDance's open-source digital human project

LatentSync is an end-to-end lip-synchronization framework jointly released by ByteDance and Beijing Jiaotong University. It is built on audio-driven latent diffusion models and aims to generate high-quality, realistic talking videos with seamless temporal consistency. The framework suits a wide range of applications, including voice-over and dubbing, virtual avatars, and game development.

LatentSync Features

  1. End-to-end lip synchronization: LatentSync models the complex relationship between audio and video directly in latent space, without any intermediate motion representation. It generates lip movements that match the input audio, so lip shapes stay precisely synchronized with speech.
  2. High-resolution video generation: By running diffusion in latent space rather than pixel space, LatentSync avoids the heavy hardware requirements of pixel-space diffusion models and can generate high-resolution video.
  3. Dynamic, realistic results: The generated video captures subtle expressions tied to emotional tone, making the character's speech look more natural and vivid.
  4. Temporal consistency enhancement: LatentSync introduces the Temporal REPresentation Alignment (TREPA) method, which uses a large-scale self-supervised video model to extract temporal representations and aligns those of the generated frames with those of the real frames. This reduces flicker and makes playback smoother.
  5. Multi-language support: LatentSync handles multiple languages, which makes it suitable for localizing content for international audiences.
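
The idea behind TREPA in item 4 can be illustrated with a toy sketch. This is not LatentSync's implementation: the real method uses a large self-supervised video model to extract temporal representations, whereas the `toy_temporal_encoder` below is a hypothetical stand-in that only measures frame-to-frame change, and `temporal_alignment_loss` is an assumed name for a simple mean squared distance between the two representations.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]   # one frame, flattened to pixel intensities
Clip = Sequence[Frame]    # an ordered sequence of frames


def toy_temporal_encoder(clip: Clip) -> List[float]:
    """Hypothetical stand-in for a pretrained video model.

    Returns one value per frame transition: the mean absolute
    pixel change between consecutive frames.
    """
    feats = []
    for prev, curr in zip(clip, clip[1:]):
        diff = sum(abs(c - p) for p, c in zip(prev, curr)) / len(curr)
        feats.append(diff)
    return feats


def temporal_alignment_loss(
    generated: Clip,
    real: Clip,
    encoder: Callable[[Clip], List[float]] = toy_temporal_encoder,
) -> float:
    """Mean squared distance between the temporal representations
    of a generated clip and the matching real clip."""
    if len(generated) != len(real):
        raise ValueError("clips must have the same number of frames")
    if len(real) < 2:
        raise ValueError("clips need at least two frames")
    g = encoder(generated)
    r = encoder(real)
    return sum((a - b) ** 2 for a, b in zip(g, r)) / len(r)


# Toy clips: three frames of two pixels each.
real_clip = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]]
steady_clip = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]]
flicker_clip = [[0.0, 0.0], [0.5, 0.5], [0.0, 0.0]]

steady_loss = temporal_alignment_loss(steady_clip, real_clip)
flicker_loss = temporal_alignment_loss(flicker_clip, real_clip)
```

In this toy setup the flickering clip gets a larger loss than the steady one, which is the behavior a temporal alignment term is meant to penalize during training.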

Official website: https://www.latentsync.org
