A couple of days ago, a user sent me a YouTube channel and asked whether its videos were made with AI.
When I looked at it, the account's content turned out to be very simple: just a pretty lady in a cool outfit, playing jazz at the mixing desk.

It started 3 months ago, has posted 26 videos, and already has 12.3K followers!
Everything in the picture was made by AI, including the DJ lady. Each video is about an hour long.
The point is, the traffic is good: 10,000 to 100,000 views per video.

The user also asked for advice on exactly how to make it.
Of course I want to meet my fans' needs! Here's my replica of the internet-celebrity DJ:

The original account's beauty was obviously generated with a natively run Stable Diffusion model. That takes a graphics card costing at least several thousand dollars, so it demands not only a high-end hardware configuration but also a locally deployed model.
That's just not newbie friendly!
So in this installment of the tutorial, I'll walk through the full process: creating the still image, making the beautiful woman sway, and generating the music.
No skills required, and all the software used is free!
AI video generation falls into two categories: text-to-video and image-to-video.
Image-to-video is currently the more mature approach, so to make a video you first need an image, and to make an image you need an image prompt.
First, generate the image prompt
Find a benchmark account and have AI analyze one of its highest-traffic images to generate an image prompt.
For image analysis, the most powerful tool I have found is Google's Gemini Pro.
(If you are not comfortable with Gemini, you can use Kimi instead, which was recently upgraded to analyze images.)
Open aistudio.google.com and select Gemini 1.5 Flash.
You can now also use the just-released 2.0 Flash.
Make a request: "Analyze this photo."

Gemini's analysis, in Chinese, is as follows:

Because we need to use foreign AI drawing software, whose Chinese support is not great, I asked Gemini to translate its analysis into English and got the following prompt:
"The image shows a young Asian woman, lost in the rhythm, her headphones obscuring her ears as she works the DJ decks. Clad in a dazzling rainbow low V dress, low V, her concentration is palpable. A stylish spiral staircase and strategically placed speakers complete the scene."
DIRECT TRANSLATION: This image shows a young Asian woman, immersed in the beat, her headphones covering her ears as she operates the DJ booth. Her concentration is evident as she wears a dazzling rainbow low V-neck dress with a low V-neckline. A sleek spiral staircase and strategically placed speakers complete the scene.
Now let's generate the image.
I'm using Google's latest image-generation AI, Imagen 3.
With the same prompt, Flux (via SiliconFlow), Jimeng, or Kling will all work for generating the image.
The actual results are as follows. I think the Flux result is overdone.


Imagen 3 was just released and is available at labs.google/fx/tools/image-fx
Next, make the video.
Here we use the latest Kling 2.0, the strongest image-to-video AI currently available.
Previously, Hailuo AI was used more often, but after Kling 2.0 came out, Hailuo suddenly lost its appeal.
The 1.6 model costs 30 points for 5 seconds. The 2.0 model has better quality but a higher price: 100 points for 5 seconds of HD and 200 points for 10 seconds.
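To budget points before generating, the prices above can be put into a small lookup. This is only a sketch: the table holds just the three prices quoted in this article, and the function name `clip_cost` is my own, not part of any official tool. Any combination not quoted here (for example a 10-second clip on the 1.6 model) raises an error rather than guessing a price.

```python
# Point prices quoted in this article: (model, clip length in seconds) -> points.
# Other model/length combinations are deliberately left out.
PRICES = {
    ("1.6", 5): 30,
    ("2.0", 5): 100,
    ("2.0", 10): 200,
}


def clip_cost(model: str, seconds: int, clips: int = 1) -> int:
    """Return the total points for generating `clips` clips of the given length."""
    key = (model, seconds)
    if key not in PRICES:
        raise KeyError(f"no price quoted for model {model} at {seconds} seconds")
    return PRICES[key] * clips


print(clip_cost("2.0", 10))          # one 10-second clip on the 2.0 model
print(clip_cost("1.6", 5, clips=3))  # three 5-second drafts on the 1.6 model
```

A cheap way to work is to draft a few 5-second clips on the 1.6 model and spend the 2.0 points only on the final 10-second clip.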

Plus sound effects:

Since the DJ's movements are small, a 10-second clip on repeat is enough.
Next, make the music with AI.
A DJ needs music, after all.
Log in to Suno.
Suno has been upgraded to V4.
Select V4, select Instrumental, and enter the prompt "JAZZ."

Of course, you can also use Jimeng's music generation.
Finally, combine everything in editing software such as Jianying (CapCut) or Tencent Zhiying (zenvideo.qq.com). Done!
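If you prefer the command line to an editor, the same "repeat the 10-second clip under an hour of music" step can be sketched with ffmpeg. This is an alternative to the editing software named above, not what I used for this video. The file names are placeholders, and the script only builds and prints the command, so you can inspect it before running it yourself. It assumes ffmpeg is installed and that the clip loops cleanly with stream copy.

```python
import math
import shlex


def loops_needed(target_seconds: int, clip_seconds: int) -> int:
    """How many times the clip must play to cover the target duration."""
    if target_seconds <= 0 or clip_seconds <= 0:
        raise ValueError("durations must be positive")
    return math.ceil(target_seconds / clip_seconds)


def build_ffmpeg_command(clip, music, output, target_seconds, clip_seconds):
    """Build an ffmpeg argument list that loops the clip under the music."""
    plays = loops_needed(target_seconds, clip_seconds)
    return [
        "ffmpeg",
        "-stream_loop", str(plays - 1),  # extra repeats after the first play
        "-i", clip,
        "-stream_loop", "-1",            # loop the music for as long as needed
        "-i", music,
        "-map", "0:v:0",                 # video from the looped clip
        "-map", "1:a:0",                 # audio from the music file
        "-c:v", "copy",                  # no video re-encode
        "-c:a", "aac",
        "-t", str(target_seconds),       # cut the output at the target length
        output,
    ]


# Placeholder file names; one hour of output from a 10-second clip.
cmd = build_ffmpeg_command("dj_clip.mp4", "jazz.mp3", "dj_hour.mp4", 3600, 10)
print(shlex.join(cmd))
```

For an hour-long video, the 10-second clip plays 360 times, which is why one short generation is enough.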
Here's the finished result:
You may ask: how do you dress the DJ beauty in different outfits?
Simple: AI try-on.

Ten seconds, and the clothes are changed.

You can also change the clothes and poses by adjusting the prompt from the first step.
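One way to organize those prompt adjustments is to turn the English prompt from the first step into a template with slots for the outfit and the pose. The sketch below does that. Only the first outfit and the first pose phrase come from the original prompt; the other entries are made-up examples for illustration, so swap in your own.

```python
from itertools import product

# The prompt from the first step, with the outfit and pose turned into slots.
TEMPLATE = (
    "The image shows a young Asian woman, lost in the rhythm, her headphones "
    "obscuring her ears as she works the DJ decks. Clad in {outfit}, {pose}. "
    "A stylish spiral staircase and strategically placed speakers complete the scene."
)

# First entry of each list is from the original prompt; the rest are hypothetical.
OUTFITS = [
    "a dazzling rainbow low V dress",
    "a black leather jacket",
    "a white summer dress",
]
POSES = [
    "her concentration is palpable",
    "one hand raised in the air",
]

# One prompt per outfit/pose combination.
prompts = [
    TEMPLATE.format(outfit=outfit, pose=pose)
    for outfit, pose in product(OUTFITS, POSES)
]

for prompt in prompts:
    print(prompt)
    print()
```

Each printed prompt can be pasted into the image tool as-is, which keeps the scene constant while only the outfit and pose change.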
Beyond beautiful DJs, we could try other things, or take it a step further with lip-syncing.
Have you noticed?
Many of the viral short videos you've seen use exactly this traffic formula.
First, demand: beauty, education, health, nationalism, reading.
Second, technology. Technology iterates, and new features that work better bring changes to the scene. As I said in "High Speed Rail Mask Beauty", the girls are sexy and so are the clothes, but the style is monotonous. With AI you can change outfits, draw on global fashion, and have a masked beauty wear whatever you want. That's how you differentiate yourself.
Third, the scene. A pretty face alone is not enough; there also has to be a scene, such as today's DJ, or the "temple socialites" and "kitchen socialites" that became popular online. The scene is where the demand gets realized.
Beyond the beautiful DJ, you can also do an AI car-show model series. Look at what Imagen 3 produces for car models:

New media is a fast-paced field; only constant innovation can keep up with the ever-changing market.