{"id":8656,"date":"2024-04-22T09:16:01","date_gmt":"2024-04-22T01:16:01","guid":{"rendered":"https:\/\/www.1ai.net\/?p=8656"},"modified":"2024-04-22T09:17:18","modified_gmt":"2024-04-22T01:17:18","slug":"%e5%be%ae%e8%bd%af%e6%8e%a8%e5%87%ba-vasa-1-ai-%e6%a1%86%e6%9e%b6%ef%bc%8c%e5%8f%af%e5%8d%b3%e6%97%b6%e7%94%9f%e6%88%90-512x512-40fps-%e9%80%bc%e7%9c%9f%e5%af%b9%e5%8f%a3%e5%9e%8b%e4%ba%ba%e5%83%8f","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/8656.html","title":{"rendered":"Microsoft launches VASA-1 AI framework that can convert photos into videos, realistic lip-sync portrait videos"},"content":{"rendered":"<p data-vmark=\"a713\">according to<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%be%ae%e8%bd%af\" title=\"[View articles tagged with [Microsoft]]\" target=\"_blank\" >Microsoft<\/a>Official press release, Microsoft today announced a VASA-1 framework for image-generated videos. This AI framework only needs a real-life portrait photo and a personal voice audio clip.<span class=\"accentTextColor\">It can generate accurate and realistic<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%af%b9%e5%8f%a3%e5%9e%8b%e8%a7%86%e9%a2%91\" title=\"[Sees the article with the [Portal Video] label]\" target=\"_blank\" >Lip Sync Videos<\/a>(Generates a scripted video), which is said to be particularly natural in terms of facial expressions and head movements<\/span>.<\/p>\n<p data-vmark=\"58cb\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-8657\" title=\"65edef1e-d9cc-4125-9556-c59261375e2d\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/04\/65edef1e-d9cc-4125-9556-c59261375e2d.png\" alt=\"65edef1e-d9cc-4125-9556-c59261375e2d\" width=\"1440\" height=\"765\" \/><\/p>\n<p data-vmark=\"e41e\">Currently, many related research in the industry focuses on lip syncing, while facial dynamic behavior and head movement are usually ignored. As a result, the generated faces will appear stiff, unconvincing and have the uncanny valley phenomenon.<\/p>\n<p data-vmark=\"f0c2\">Microsoft&#039;s VASA-1 framework overcomes the limitations of previous facial generation technology. Researchers used the diffusion Transformer model to train on overall facial dynamics and head movements. 
Microsoft also used 3D techniques to help annotate facial features and designed an additional loss function, claiming that VASA-1 can not only generate high-quality facial videos but also effectively capture and reproduce the 3D structure of the face.
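The press release does not detail this loss, so the snippet below is only a hedged guess at what such a 3D-aware term could look like: a mean-squared error between 3D facial landmarks of the generated frame and landmarks annotated on the reference frame, added to the main diffusion objective with a small weight. The function name, shapes, and weighting are assumptions for illustration.

```python
# Hypothetical auxiliary loss: penalize the distance between 3D facial
# landmarks of the generated frame and of the ground-truth frame, on top of
# the main diffusion (noise-prediction) loss. Purely illustrative.
import torch
import torch.nn.functional as F


def total_loss(pred_noise, true_noise, pred_landmarks_3d, gt_landmarks_3d,
               landmark_weight=0.1):
    # Standard diffusion objective: predict the noise added to the latents.
    diffusion_loss = F.mse_loss(pred_noise, true_noise)
    # Auxiliary 3D term: keep the generated face's 3D structure consistent
    # with landmarks annotated (e.g. by a 3D face tracker) on real frames.
    landmark_loss = F.mse_loss(pred_landmarks_3d, gt_landmarks_3d)
    return diffusion_loss + landmark_weight * landmark_loss


# Toy shapes: 40 frames, 68 landmarks with (x, y, z) coordinates each.
loss = total_loss(torch.randn(1, 40, 256), torch.randn(1, 40, 256),
                  torch.randn(1, 40, 68, 3), torch.randn(1, 40, 68, 3))
print(loss.item())
```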