{"id":5087,"date":"2024-03-08T09:31:09","date_gmt":"2024-03-08T01:31:09","guid":{"rendered":"https:\/\/www.1ai.net\/?p=5087"},"modified":"2024-03-08T09:31:09","modified_gmt":"2024-03-08T01:31:09","slug":"%e9%98%bf%e9%87%8c%e5%b7%b4%e5%b7%b4%e6%8e%a8%e5%87%ba-atomovideo-%e9%ab%98%e4%bf%9d%e7%9c%9f%e5%9b%be%e7%94%9f%e8%a7%86%e9%a2%91%e6%a1%86%e6%9e%b6%ef%bc%8c%e5%85%bc%e5%ae%b9%e5%a4%9a%e7%a7%8d","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/5087.html","title":{"rendered":"Alibaba launches AtomoVideo, a high-fidelity image-to-video framework compatible with multiple text-to-image models"},"content":{"rendered":"<p data-vmark=\"76c5\"><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e9%98%bf%e9%87%8c%e5%b7%b4%e5%b7%b4\" title=\"See articles with the Alibaba tag\" target=\"_blank\" >Alibaba<\/a>&#039;s research team recently launched the <a href=\"https:\/\/www.1ai.net\/en\/tag\/atomovideo\" title=\"See articles with the AtomoVideo tag\" target=\"_blank\" >AtomoVideo<\/a> high-fidelity image-to-video (I2V) framework, which aims to <span class=\"accentTextColor\">generate high-quality video content from static images<\/span> and is compatible with various text-to-image (T2I) models.<\/p>\n<p data-vmark=\"f17b\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-5088\" title=\"5e42f446-9d71-4cce-8901-85c3f9859e6d\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/03\/5e42f446-9d71-4cce-8901-85c3f9859e6d.png\" alt=\"5e42f446-9d71-4cce-8901-85c3f9859e6d\" width=\"1440\" height=\"807\" \/><\/p>\n<p data-vmark=\"76c5\">\u25b2 Image source: AtomoVideo team paper<\/p>\n<p data-vmark=\"b0e4\">AtomoVideo&#039;s features include:<\/p>\n<ul class=\"list-paddingleft-2\">\n<li>\n<p data-vmark=\"9f6d\">High fidelity: The generated video is highly consistent with the input image in terms of detail and style<\/p>\n<\/li>\n<li>\n<p data-vmark=\"0181\">Motion consistency: The video moves smoothly, maintaining temporal consistency without abrupt jumps<\/p>\n<\/li>\n<li>\n<p 
data-vmark=\"ae26\">Video frame prediction: Supports the generation of long video sequences by iteratively predicting subsequent frames<\/p>\n<\/li>\n<li>\n<p data-vmark=\"4a39\">Compatibility: Works with existing T2I models<\/p>\n<\/li>\n<li>\n<p data-vmark=\"7033\">High semantic controllability: Able to generate customized video content based on the user&#039;s specific needs<\/p>\n<\/li>\n<\/ul>\n<p data-vmark=\"f2f7\">AtomoVideo builds on a pre-trained T2I model, adding a 1D temporal convolution and a temporal attention module after each spatial convolution layer and spatial attention layer. The parameters of the T2I model are frozen, and only the newly added temporal layers are trained. Because the concatenated input image is encoded only by the VAE, it carries low-level information, which helps <span class=\"accentTextColor\">enhance the fidelity of the video relative to the input image<\/span>. At the same time, the team injects high-level image semantics via cross-attention to <span class=\"accentTextColor\">achieve stronger semantic controllability<\/span>.<\/p>\n<p data-vmark=\"d5af\">So far, the team has released only AtomoVideo&#039;s paper and demonstration videos, with no online demo available. An official GitHub repository has also been opened, but it is currently used only for hosting the project page and no code has been uploaded.<\/p>","protected":false},"excerpt":{"rendered":"<p>Alibaba's research team recently launched AtomoVideo, a high-fidelity image-to-video (I2V) framework that aims to generate high-quality video content from still images and is compatible with various text-to-image (T2I) models. 
\u25b2 Source: AtomoVideo team paper. AtomoVideo features the following: High fidelity: the generated video is highly consistent with the input image in terms of detail and style. Motion consistency: the video motion is smooth and maintains temporal consistency without abrupt jumps. Video frame prediction: supports the generation of long video sequences by iteratively predicting subsequent frames. Compatibility: compatible with existing T2I models. High semantic controllability: able to generate customized video content according to the user's specific needs.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[1550,1245,390],"collection":[],"class_list":["post-5087","post","type-post","status-publish","format-standard","hentry","category-news","tag-atomovideo","tag-1245","tag-390"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/5087","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=5087"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/5087\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=5087"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=5087"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=5087"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=5
087"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
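The adapter recipe the article describes (keep the pre-trained T2I layers frozen, insert new temporal layers after each spatial layer, and train only the additions) can be sketched in NumPy. The tensor shapes, the channel-shared 1D kernel, and the zero-initialisation of the temporal layer are illustrative assumptions for this sketch, not details confirmed by the AtomoVideo paper:

```python
import numpy as np

def temporal_conv1d(x, kernel):
    """Apply a 1D convolution along the frame axis.

    x:      (T, C) per-frame feature vectors (spatial dims flattened
            away for brevity).
    kernel: (K,) filter shared across channels, zero-padded at the
            sequence borders ("same" padding).
    """
    T, _ = x.shape
    K = len(kernel)
    pad = K // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad along time only
    out = np.zeros_like(x)
    for t in range(T):
        # weighted sum over the temporal neighbourhood of frame t
        out[t] = sum(kernel[k] * xp[t + k] for k in range(K))
    return out

# Frozen "spatial" path: stands in for a pre-trained T2I layer applied
# to each frame independently; here just a fixed linear map.
rng = np.random.default_rng(0)
W_spatial = rng.standard_normal((4, 4))  # frozen, never updated

def block(x, temporal_kernel):
    h = x @ W_spatial.T                             # per-frame, no frame mixing
    return h + temporal_conv1d(h, temporal_kernel)  # added temporal residual

frames = rng.standard_normal((8, 4))  # 8 frames, 4 channels each

# Zero-initialising the new temporal layer (a common trick so training
# starts from the unmodified T2I behaviour) makes the block reduce to
# the frozen per-frame path.
out_zero = block(frames, np.zeros(3))
assert np.allclose(out_zero, frames @ W_spatial.T)

# A non-zero kernel mixes information across neighbouring frames.
out_mix = block(frames, np.array([0.25, 0.5, 0.25]))
```

In this setup only `temporal_kernel` (and, in a full model, the temporal attention weights) would receive gradients, which is why the approach stays compatible with any T2I backbone: the pre-trained spatial weights are untouched.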