On April 17, Alibaba announced the open-sourcing of the Tongyi Wanxiang first-and-last-frame video generation model. The model has 14B parameters and is claimed to be the industry's first open-source first-and-last-frame video model at the 10-billion-parameter scale.

It generates a 720p HD video that connects a start frame and an end frame specified by the user. This upgrade is meant to meet users' need for more controllable and customized video generation.
Users can try the model for free on the Tongyi Wanxiang official website, or download it from GitHub, Hugging Face, or the ModelScope community for local deployment and secondary development.
Technical Introduction
First-and-last-frame video generation is more controllable than text-to-video and single-image-to-video generation, but training such a model is harder, because the generated video must satisfy all of the following at once:
1. The content is consistent with the two images provided by the user
2. The video follows the user's text prompt
3. The video transitions naturally and smoothly from the given first frame to the last frame
4. The video as a whole is coherent and natural
Training and inference optimization
Building on the existing Wan2.1 text-to-video base model architecture, the Tongyi Wanxiang first-and-last-frame video model introduces an additional conditioning mechanism through which smooth and accurate transitions between the first and last frames can be realized.
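The announcement does not detail this mechanism. As one plausible illustration, a latent video diffusion model can be conditioned on boundary frames by concatenating their encoded latents and a visibility mask onto the noisy latent fed to the diffusion transformer. The sketch below shows that idea in PyTorch; the function names and tensor shapes are assumptions, not the published Wan2.1-FLF2V implementation.

```python
import torch

# Minimal, illustrative sketch of boundary-frame conditioning for a latent
# video diffusion model (an assumption about how such conditioning can be
# wired up, not the actual Wan2.1-FLF2V code).
def build_condition(first_latent: torch.Tensor, last_latent: torch.Tensor, num_frames: int) -> torch.Tensor:
    """Pack the two given frame latents plus a visibility mask; latents are (C, H, W)."""
    C, H, W = first_latent.shape
    cond = torch.zeros(num_frames, C, H, W)   # zeros for the unknown middle frames
    mask = torch.zeros(num_frames, 1, H, W)   # 1 marks frames the user supplied
    cond[0], mask[0] = first_latent, 1.0
    cond[-1], mask[-1] = last_latent, 1.0
    return torch.cat([cond, mask], dim=1)     # (num_frames, C + 1, H, W)

def dit_input(noisy_latent: torch.Tensor, condition: torch.Tensor) -> torch.Tensor:
    # Concatenate along channels so every denoising step sees the fixed
    # first and last frames alongside the noisy latent it is refining.
    return torch.cat([noisy_latent, condition], dim=1)
```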
In the training phase, the team also built training data specifically for the first-and-last-frame mode, and applied parallel strategies to both the text/video encoding modules and the diffusion transformer module. This improves the efficiency of training and generation while preserving the model's high-resolution video generation quality.
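The announcement does not say which parallel strategies were used. One common choice for a model of this size is to shard the large diffusion transformer across GPUs while the frozen encoders are simply replicated. The sketch below illustrates that option with placeholder module names, assuming a torch.distributed process group is already initialized; it is not necessarily the strategy used for Wan2.1-FLF2V.

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Illustrative sketch only: "dit", "text_encoder", and "video_vae" are
# placeholder modules, and FSDP sharding is one possible parallel strategy.
def parallelize(dit: torch.nn.Module,
                text_encoder: torch.nn.Module,
                video_vae: torch.nn.Module,
                device: torch.device):
    # Frozen encoders: no gradients, so plain per-rank replication is enough.
    text_encoder.requires_grad_(False)
    video_vae.requires_grad_(False)
    text_encoder.to(device)
    video_vae.to(device)
    # Shard the diffusion transformer's parameters, gradients, and optimizer
    # state across ranks so a 14B-parameter model fits in GPU memory.
    dit = FSDP(dit.to(device))
    return dit, text_encoder, video_vae
```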
In the inference stage, to support HD video generation under limited GPU memory, the Wanxiang first-and-last-frame model adopts a model slicing strategy and a sequence parallelism strategy, which significantly shorten inference time while ensuring that output quality is not compromised.
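Neither strategy is spelled out in the announcement. As a rough illustration of sequence parallelism, each GPU can hold a slice of the video token sequence and gather the other ranks' keys and values so attention still spans the full sequence. The function below is a simplified sketch under that assumption (even split, initialized torch.distributed group), not the actual inference code.

```python
import torch
import torch.distributed as dist

def sharded_attention(q_local: torch.Tensor,
                      k_local: torch.Tensor,
                      v_local: torch.Tensor) -> torch.Tensor:
    """One attention layer under sequence parallelism (illustrative sketch).

    Each rank owns a contiguous slice of the token sequence, shaped
    (batch, local_seq, dim). Keys/values are all-gathered so every query
    can attend over the full sequence while activations stay sharded.
    """
    world = dist.get_world_size()
    k_parts = [torch.empty_like(k_local) for _ in range(world)]
    v_parts = [torch.empty_like(v_local) for _ in range(world)]
    dist.all_gather(k_parts, k_local)
    dist.all_gather(v_parts, v_local)
    k_full = torch.cat(k_parts, dim=1)   # (batch, full_seq, dim)
    v_full = torch.cat(v_parts, dim=1)
    # Standard attention, but only for this rank's slice of queries.
    return torch.nn.functional.scaled_dot_product_attention(q_local, k_full, v_full)
```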
Functionality Upgrade
Based on this model, users can complete more complex and more personalized video generation tasks, such as applying special-effect transformations to the same subject or using camera-movement control across different scenes.
For example, a user can upload two pictures of the same location taken at different times and enter a prompt; the Tongyi Wanxiang first-and-last-frame model then generates a time-lapse-style video of changing seasons or a day-to-night transition. A user can also upload two images of different scenes and bridge them with camera moves such as rotating, panning, and pushing in, keeping the video consistent with the preset frames while giving the footage richer motion.
Open-source addresses:
- HuggingFace: https://huggingface.co/Wan-AI/Wan2.1-FLF2V-14B-720P
- ModelScope: https://www.modelscope.cn/models/Wan-AI/Wan2.1-FLF2V-14B-720P
- Direct experience: https://tongyi.aliyun.com/wanxiang/videoCreation
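For local deployment, the weights can be pulled straight from the Hugging Face repository above. Below is a minimal download sketch using huggingface_hub; the actual inference commands and entry points come from the model's own documentation.

```python
# Minimal download sketch; assumes the huggingface_hub package is installed.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="Wan-AI/Wan2.1-FLF2V-14B-720P",   # repository linked above
    local_dir="./Wan2.1-FLF2V-14B-720P",      # 14B params, so expect tens of GB on disk
)
print("Model files downloaded to:", local_dir)
```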