October 27 news: this morning, the Chinese company Meituan's LongCat team released and open-sourced the LongCat-Video video generation model. According to the official presentation, it reaches open-source SOTA (the most advanced level) on the foundational text-to-video and image-to-video tasks with a single unified model, and, because it is natively pre-trained on the video-continuation task, it can generate coherent minute-long videos while maintaining cross-frame temporal consistency and physically plausible motion, giving it a significant advantage in long-video generation.

According to the presentation, the "world model" has in recent years come to be seen as the core engine on the road to the next generation of intelligence, because it allows artificial intelligence to truly understand, predict, and even reconstruct the real world. As an intelligent system capable of modelling physical laws, temporal evolution, and scene logic, a world model gives AI the ability to "see" how the world actually operates. Video generation models are becoming a key path to building world models: by compressing many forms of knowledge, such as geometry, semantics, and physics, into the video-generation task, they enable AI to simulate, reason about, and even preview the workings of the real world in digital space.
As a multi-task unified video generation foundation model built on the Diffusion Transformer (DiT) architecture, LongCat-Video's key design is to separate tasks by the "number of condition frames": text-to-video uses no condition frames, image-to-video conditions on a single reference image, and video continuation conditions on multiple preceding frames. One model natively supports all three core tasks without additional task-specific adaptation, forming a closed loop over text-to-video, image-to-video, and video continuation (a minimal sketch of this conditioning scheme follows the list below).
- Text-to-video: generates 720p, 30 fps high-resolution video, accurately rendering the objects, characters, scenes, and styles described in the text, with semantic understanding and visual quality at the open-source SOTA level.
- Image-to-video: strictly preserves the main subject's attributes, background relationships, and overall style of the reference image; the generated motion follows physical laws; it supports multiple input types, such as detailed instructions, concise descriptions, and even empty prompts, with strong content consistency and natural dynamics.
- Video continuation: the core differentiating capability of LongCat-Video; it conditions on multiple preceding frames and provides the primary technical foundation for long-video generation.
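To make the "unified by number of condition frames" idea concrete, here is a minimal sketch, not the official implementation: the class `UnifiedVideoGenerator`, its `condition_frames` argument, and the denoising loop below are all assumptions about how one DiT-style backbone could serve all three tasks simply by varying how many frames are held fixed as conditioning.

```python
import torch

class UnifiedVideoGenerator:
    """Illustrative wrapper: one backbone, three tasks, selected by condition-frame count."""

    def __init__(self, dit_backbone, num_total_frames=64):
        self.backbone = dit_backbone          # DiT-style denoiser (assumed interface)
        self.num_total_frames = num_total_frames

    def generate(self, text_emb, condition_frames=None, steps=50):
        """condition_frames: (B, C, T_cond, H, W) latent tensor or None.
        T_cond == 0 -> text-to-video, T_cond == 1 -> image-to-video,
        T_cond  > 1 -> video continuation."""
        if condition_frames is None:
            b, c, t_cond, h, w = 1, 16, 0, 45, 80   # illustrative latent shape
        else:
            b, c, t_cond, h, w = condition_frames.shape
        t_gen = self.num_total_frames - t_cond

        # Generated frames start from pure noise; condition frames stay clean.
        latents = torch.randn(b, c, t_gen, h, w)
        for step in reversed(range(steps)):
            if t_cond > 0:
                # Prepend clean condition frames so the backbone sees the full
                # sequence; only the noisy tail gets denoised each step.
                full = torch.cat([condition_frames, latents], dim=2)
            else:
                full = latents
            pred = self.backbone(full, text_emb, timestep=step)  # assumed call signature
            latents = pred[:, :, t_cond:]       # keep only the generated tail
        return latents

# Usage (illustrative):
# gen.generate(text_emb)                               # text-to-video: 0 condition frames
# gen.generate(text_emb, condition_frames=ref_image)   # image-to-video: 1 condition frame
# gen.generate(text_emb, condition_frames=prev_clip)   # continuation: N condition frames
```

The point of the sketch is only the routing: nothing about the backbone changes across tasks, so no per-task adapters are needed.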
Thanks to pre-training on the video-continuation task, a Block-Causal Attention mechanism, and GRPO-based post-training, LongCat-Video can reportedly generate 5-minute-long videos steadily with no loss of quality, which the team describes as an industry-leading ("top") level.
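The block-causal attention idea can be illustrated with a small mask-construction sketch. This is an assumption about the general technique, not LongCat-Video's exact implementation: frames within the same temporal block attend to each other bidirectionally, while across blocks attention only flows from later blocks to earlier ones, which is what lets a model extend a video chunk by chunk.

```python
import torch

def block_causal_mask(num_frames: int, block_size: int) -> torch.Tensor:
    """Build a (num_frames x num_frames) boolean attention mask.

    True = attention allowed. Frames inside the same block attend to each other
    freely; across blocks, a frame may only attend to earlier blocks.
    Generic block-causal attention sketch, not the official code.
    """
    block_id = torch.arange(num_frames) // block_size   # block index per frame
    # Query's block must be >= key's block: bidirectional within a block,
    # strictly causal across blocks.
    return block_id.unsqueeze(1) >= block_id.unsqueeze(0)

# Example: 8 frames in blocks of 4.
m = block_causal_mask(8, 4)
# m[0:4, 0:4] is all True  (full attention inside the first block)
# m[0:4, 4:8] is all False (the first block cannot see the future block)
# m[4:8, 0:4] is all True  (the second block attends to everything before it)
```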
1AI attaches the relevant links below:
- GitHub: https://github.com/meituan-longcat/LongCat-Video
- Hugging Face: https://huggingface.co/meituan-longcat/LongCat-Video
- Project Page: https://meituan-longcat.github.io/LongCat-Video/