{"id":16036,"date":"2024-07-21T09:15:37","date_gmt":"2024-07-21T01:15:37","guid":{"rendered":"https:\/\/www.1ai.net\/?p=16036"},"modified":"2024-07-21T09:15:50","modified_gmt":"2024-07-21T01:15:50","slug":"%e9%9b%b6%e5%9f%ba%e7%a1%80%e5%b0%8f%e7%99%bd%e5%bf%ab%e9%80%9f%e5%85%8d%e8%b4%b9%e9%83%a8%e7%bd%b2ai%e6%95%b0%e5%ad%97%e4%ba%ba%ef%bc%8cmusetalk%e7%a6%bb%e7%ba%bf%e5%8f%a3%e5%9e%8b%e5%90%8c%e6%ad%a5","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/16036.html","title":{"rendered":"MuseTalk, an offline lip-sync digital human tool, lets beginners deploy AI digital humans quickly and for free"},"content":{"rendered":"<p data-pm-slice=\"0 0 []\"><a href=\"https:\/\/www.1ai.net\/en\/tag\/musetalk\" title=\"[See articles with [MuseTalk] label]\" target=\"_blank\" >MuseTalk<\/a> is a real-time, high-quality, audio-driven lip-sync model developed by Tencent Music Entertainment's Lyra Lab (Tianqin Lab), designed specifically for generating mouth shapes for virtual characters. Given an input audio signal, the model automatically adjusts the face of the digital character so that its lip movements stay closely synchronized with the audio. 
MuseTalk generates accurate lip shapes with good visual consistency, and it performs especially well on videos of real people.<\/p>\n<p data-track=\"61\">The main features of MuseTalk include:<\/p>\n<ol>\n<li data-track=\"62\">Real-time performance: inference runs at more than 30 frames per second on an NVIDIA Tesla V100.<\/li>\n<li data-track=\"63\">Multi-language support: it accepts audio input in multiple languages, such as Chinese, English, and Japanese, so it can serve users in different countries and regions.<\/li>\n<li data-track=\"64\">High-precision lip sync: using latent space inpainting, it modifies the lips with high precision within a 256 x 256 pixel face region.<\/li>\n<li data-track=\"65\">High visual consistency: the generated lip shapes match the audio accurately and the picture stays consistent.<\/li>\n<li data-track=\"66\">Wide range of applications: suitable for a variety of video content needs, such as self-media production and virtual anchors.<\/li>\n<\/ol>\n<p data-track=\"67\">However, deploying MuseTalk is cumbersome and difficult for novice users, and it places high demands on the graphics card and memory. Fortunately, Google Colab lets us deploy MuseTalk quickly, easily, and for free. Google Colab (also known as Colaboratory) is a free cloud development environment provided by Google, mainly used for tasks such as data analysis, machine learning, and deep learning. 
It is based on Jupyter Notebook: users can write and run Python code directly in the browser, and can share notebooks and edit them collaboratively with others.<\/p>\n<p data-track=\"68\">First, open this address:<\/p>\n<p data-track=\"69\">https:\/\/colab.research.google.com\/github\/camenduru\/MuseTalk-jupyter\/blob\/main\/MuseTalk_jupyter.ipynb<\/p>\n<p data-track=\"70\">In the upper right corner, click to change the runtime type and select T4 GPU.<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-16037\" title=\"get-598\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/07\/get-598.jpg\" alt=\"get-598\" width=\"946\" height=\"572\" \/><\/div>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-16038\" title=\"get-599\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/07\/get-599.jpg\" alt=\"get-599\" width=\"705\" height=\"239\" \/><\/div>\n<p data-track=\"71\">You can see that Google Colab has allocated us, for free, 12 GB of memory, 78 GB of disk space, and GPU computing resources.<\/p>\n<p data-track=\"72\">Click the small triangle to run the code:<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-16040\" title=\"get-601\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/07\/get-601.jpg\" alt=\"get-601\" width=\"720\" height=\"317\" \/><\/div>\n<p data-track=\"73\">After about 3 minutes, the run completes successfully.<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-16039\" title=\"get-600\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/07\/get-600.jpg\" alt=\"get-600\" width=\"721\" height=\"321\" \/><\/div>\n<p data-track=\"74\">When you see the line Running on public URL, MuseTalk has been deployed successfully. Click this URL:<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" 
class=\"alignnone size-full wp-image-16041\" title=\"get-602\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/07\/get-602.jpg\" alt=\"get-602\" width=\"721\" height=\"340\" \/><\/div>\n<p data-track=\"75\">Upload an audio file and a reference video:<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-16042\" title=\"get-603\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/07\/get-603.jpg\" alt=\"get-603\" width=\"719\" height=\"407\" \/><\/div>\n<p data-track=\"76\">After the video is uploaded, it takes more than 10 seconds to process.<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-16043\" title=\"get-604\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/07\/get-604.jpg\" alt=\"get-604\" width=\"719\" height=\"476\" \/><\/div>\n<p data-track=\"77\">Then click Generate.<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-16044\" title=\"get-605\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/07\/get-605.jpg\" alt=\"get-605\" width=\"720\" height=\"334\" \/><\/div>\n<p data-track=\"78\">If an error appears saying Connection errored out:<\/p>\n<p data-track=\"79\">Shorten the video and audio to about 20 seconds, then run it again.<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-16045\" title=\"get-606\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/07\/get-606.jpg\" alt=\"get-606\" width=\"720\" height=\"408\" \/><\/div>\n<p data-track=\"80\">The last step takes longer, usually more than 20 minutes.<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-16046\" title=\"get-607\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/07\/get-607.jpg\" alt=\"get-607\" width=\"1057\" height=\"628\" \/><\/div>\n<div class=\"pgc-img\"><img 
loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-16047\" title=\"get-608\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/07\/get-608.jpg\" alt=\"get-608\" width=\"720\" height=\"376\" \/><\/div>\n<p data-track=\"81\">When the video appears on the right, the processing is complete:<\/p>\n<div class=\"pgc-img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-16048\" title=\"get-609\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/07\/get-609.jpg\" alt=\"get-609\" width=\"720\" height=\"368\" \/><\/div>\n<p data-track=\"82\">Then click the download button in the upper right corner to save the processed video.<\/p>\n<p>&nbsp;<\/p>","protected":false},"excerpt":{"rendered":"<p>MuseTalk is a real-time, high-quality, audio-driven lip-sync model developed by Tencent Music Entertainment's Lyra Lab (Tianqin Lab), designed specifically for generating mouth shapes for virtual characters. Given an input audio signal, the model automatically adjusts the face of the digital character so that its lip movements stay closely synchronized with the audio. MuseTalk generates accurate lip shapes with good visual consistency, and it performs especially well on videos of real people. MuseTalk's key features include: Real-time performance: inference runs at more than 30 frames per second on an NVIDIA Tesla V100. 
Multi-language support: Support for audio input in Chinese, English, and Japanese makes it possible to provide a wide range of audio inputs for the<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[149,144],"tags":[165,2480,1896],"collection":[],"class_list":["post-16036","post","type-post","status-publish","format-standard","hentry","category-jiaocheng","category-baike","tag-ai","tag-musetalk","tag-1896"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/16036","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=16036"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/16036\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=16036"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=16036"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=16036"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=16036"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}