On December 19, according to a post from the official "LongCat" account, Meituan's LongCat team officially released and open-sourced LongCat-Video-Avatar, an SOTA-level virtual-human video generation model.
Built on the LongCat-Video base model, it carries forward the core "one model, many tasks" design, natively supporting Audio-Text-to-Video, Audio-Text-Image-to-Video, and video continuation, while fully upgrading the underlying architecture to deliver breakthroughs in three dimensions: motion completeness, long-video stability, and identity consistency.

According to the official presentation, the model's technical highlights are as follows.
"Goodbye, stiffness, for life":It's not just the mouth, it's the mouthSynchronize eye, face and body movementsTo achieve a rich emotional expression。
Natural even when silent: the team trains the model with Disentangled Unconditional Guidance so that "quiet" does not mean "frozen"; when the avatar is not speaking it still blinks, shifts its posture, and relaxes like a real person (a generic guidance sketch follows these highlights).
LongCat-Video-Avatar is described as the first "all-rounder" to support all three generation modes listed above (text-, image-, and video-driven), giving the virtual human a genuine sense of life.
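The announcement names Disentangled Unconditional Guidance but does not spell out its formulation. For orientation only, the sketch below shows a generic multi-condition classifier-free guidance step in which the unconditional branch is kept separate per condition (audio vs. reference identity); the function names, guidance weights, and tensor shapes are hypothetical stand-ins, not the LongCat implementation.

```python
import torch

def guided_eps(eps_fn, x_t, t, audio, ref, w_audio=3.5, w_ref=2.0):
    # During training, models of this kind typically drop the audio / reference
    # conditions at random, so the unconditional branch itself learns plausible
    # "idle" behavior (blinking, small posture shifts) instead of freezing.
    eps_uncond = eps_fn(x_t, t, None, None)   # no audio, no reference identity
    eps_audio = eps_fn(x_t, t, audio, None)   # audio-driven branch
    eps_full = eps_fn(x_t, t, audio, ref)     # audio + identity reference
    # Combine the branches, steering separately toward audio-synchronized
    # motion and toward adherence to the reference identity.
    return (eps_uncond
            + w_audio * (eps_audio - eps_uncond)
            + w_ref * (eps_full - eps_audio))

# Toy stand-in for a denoiser, only to show the call shape.
def dummy_eps(x_t, t, audio, ref):
    return torch.zeros_like(x_t)

x_t = torch.randn(1, 4, 16, 32, 32)  # (batch, channels, frames, height, width) latent
eps = guided_eps(dummy_eps, x_t, t=500,
                 audio=torch.randn(1, 128), ref=torch.randn(1, 512))
print(eps.shape)
```

Keeping a dedicated unconditional branch is what lets such a model produce plausible behavior when the audio condition is absent, which is what the "quiet does not mean frozen" claim refers to.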
Quantitative evaluations on authoritative public datasets such as HDTF, CelebV-HQ, EEMTD, and EvalTalker show that LongCat-Video-Avatar achieves SOTA results on a number of core metrics.
Project links:
- GitHub: https://github.com/meituan-longcat/LongCat-Video
- Hugging Face: https://huggingface.co/meituan-longcat/LongCat-Video-Avatar
- Project: https://meigen-ai.github.io/LongCat-Video-Avatar/
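As a minimal sketch, the released weights can be fetched from the Hugging Face repository listed above with huggingface_hub; the inference entry points themselves live in the GitHub repository and are not reproduced here. The local directory name is an arbitrary choice.

```python
# Download the released LongCat-Video-Avatar checkpoint from Hugging Face.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="meituan-longcat/LongCat-Video-Avatar",
    local_dir="./LongCat-Video-Avatar",  # where the checkpoint files will be placed
)
print(f"Model files downloaded to: {local_dir}")
# Audio-Text-to-Video, Audio-Text-Image-to-Video, and video continuation are
# driven by the scripts in the GitHub repository linked above.
```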