A particularly interesting workflow has been making the rounds recently: have GPT generate an annotated action storyboard first, then feed it to Seedance 2.0 to produce a consistent short film directly.
I've been testing an AI video workflow along these lines lately.
What used to be the biggest headache when making AI short films?
It's not that the picture isn't pretty enough. It's that the motion drifts, the camera goes out of control, and a character looks one way for a second and like a different actor the next. You finally manage to generate a clip, and it's nothing like what you wanted. But recently I found a very useful combination:
GPT + GPT Image + Seedance 2.0. First, let GPT design the action shots.
Then have GPT draw the storyboard with arrows. Finally, hand it to Seedance to generate the video.
The key is that Seedance can actually understand the action directions, camera trajectories, and character movement shown in the storyboard.
The resulting video follows the storyboard almost 1:1.
AI video finally goes from "gacha pulls" to "directing".
Today I'll break the whole process down for you, so you can use it as is.
Step 1: Have GPT write a text storyboard script
A lot of people jump straight to drawing the storyboard, and halfway through they find that the action logic doesn't hold and the shots don't connect.
The right approach is to write a text version of the storyboard first, then make it visual.
How to write the prompt
You need to put GPT in the role of a film action designer:
Prompt:
You are a film-grade action choreographer who excels at designing coherent, powerful action shots.
Please generate an action storyboard script of about 15 seconds, including:
- Number and duration of each shot
- Shot size (close-up / medium shot / long shot / overhead, etc.)
- Character position and movement path in the frame
- Camera movement (push / pull / pan / track)
- Description of the key action rhythm
Subject:
Character: [insert character description]
Style: [insert style keywords]
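The template above can be filled in programmatically before it is sent to GPT. This is a minimal sketch using plain string formatting only; the names `build_script_prompt`, `character`, and `style`, and the sample values, are hypothetical and not part of the original workflow. How the prompt reaches the model is deliberately left out.

```python
# Step 1 prompt template. The placeholders {seconds}, {character} and {style}
# are illustrative assumptions; the original only shows bracketed blanks.
SCRIPT_PROMPT = """You are a film-grade action choreographer who excels at designing coherent, powerful action shots.
Please generate an action storyboard script of about {seconds} seconds, including:
- Number and duration of each shot
- Shot size (close-up / medium shot / long shot / overhead, etc.)
- Character position and movement path in the frame
- Camera movement (push / pull / pan / track)
- Description of the key action rhythm
Character: {character}
Style: {style}"""


def build_script_prompt(character: str, style: str, seconds: int = 15) -> str:
    """Fill the step 1 template, refusing empty character or style fields."""
    if not character.strip() or not style.strip():
        raise ValueError("character and style must both be non-empty")
    return SCRIPT_PROMPT.format(
        seconds=seconds, character=character.strip(), style=style.strip()
    )


# Sample values are made up for illustration.
prompt = build_script_prompt("young swordsman in white robes", "ancient-style fantasy")
print(prompt)
```

Keeping the template in one place makes it easy to reuse the same wording while only swapping the character and style.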
An example
Say I want a 15-second short film set at an ancient-style martial sect. GPT returns a text storyboard like this:
- Shot 1 (2s): Long shot / static, panorama of the mountain gate, morning fog, the character stands on the steps
- Shot 2 (2s): Medium shot / follow, the character climbs the steps, robes billowing
- Shot 3 (1.5s): Close-up, the character's pupils contract as dark fog rises in the distance
- Shot 4 (2s): Long shot / push in, the black fog swallows the mountain gate, a heavy sense of oppression
- Shot 5 (1.5s): Medium shot / reverse angle, the character draws the sword and turns, the blade's light slashes through the darkness
- Shot 6 (2s): Close-up / slow motion, the blade cleaves the black fog, fragments scatter
- Shot 7 (2s): Long shot / orbit, the character stands alone amid the debris as the camera circles
- Shot 8 (2s): Close-up, the character's eyes flash red, freeze frame
With this text script in hand, step 2 has something to build on.
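Before moving on, it can help to check that the shot durations in the example script really add up to the 15-second target. The sketch below is an assumption-laden illustration: the `Shot` class, `check_script`, and the half-second tolerance are hypothetical helpers, and only the shot numbers, durations, and shot sizes are taken from the example.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    number: int
    seconds: float
    size: str  # shot size only; camera moves are omitted in this sketch


# Numbers, durations and shot sizes copied from the example script.
shots = [
    Shot(1, 2.0, "long shot"),
    Shot(2, 2.0, "medium shot"),
    Shot(3, 1.5, "close-up"),
    Shot(4, 2.0, "long shot"),
    Shot(5, 1.5, "medium shot"),
    Shot(6, 2.0, "close-up"),
    Shot(7, 2.0, "long shot"),
    Shot(8, 2.0, "close-up"),
]


def check_script(shots, target=15.0, tolerance=0.5):
    """Return a list of problems; an empty list means the script looks sane."""
    problems = []
    numbers = [s.number for s in shots]
    if numbers != list(range(1, len(shots) + 1)):
        problems.append("shot numbers are not consecutive from 1")
    total = sum(s.seconds for s in shots)
    if abs(total - target) > tolerance:
        problems.append(f"total duration {total}s is far from {target}s")
    return problems


total = sum(s.seconds for s in shots)
print(total)                # 15.0
print(check_script(shots))  # []
```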

Step 2: Generate an annotated storyboard with GPT Image 2
You can read a text-only storyboard, but Seedance can't.
So have GPT Image 2 "draw" the text out. The key is to include the trajectories.
Prompt template
Prompt:
Please generate a 3x4 film storyboard grid (12 panels in total), where each panel contains:
- Picture content (stick figures / simple sketches, focusing on action pose and position)
- A red arrow indicating the character's direction of movement
- A blue arrow indicating the camera's path
- The shot number and duration in each panel
Requirements:
- All characters look the same across panels (build, clothing, identity)
- Movement directions must be clear and arrows must not overlap
- The overall picture is clean, with simple, strong lines
- Style: film previsualization storyboard
Text script:
[Paste the text storyboard generated in step 1]
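To see how a script maps onto the 3x4 grid requested in the prompt, the sketch below assigns each shot a panel. It assumes row-major ordering (left to right, then top to bottom), which the original does not specify; `panel_position` and `layout` are hypothetical helper names.

```python
def panel_position(shot_number: int, cols: int = 4) -> tuple[int, int]:
    """Return the 1-based (row, column) of a shot in a row-major grid."""
    if shot_number < 1:
        raise ValueError("shot numbers start at 1")
    row, col = divmod(shot_number - 1, cols)
    return row + 1, col + 1


def layout(shot_count: int, rows: int = 3, cols: int = 4) -> dict[int, tuple[int, int]]:
    """Map every shot number to its panel, failing if the grid is too small."""
    if shot_count > rows * cols:
        raise ValueError("more shots than panels in the grid")
    return {n: panel_position(n, cols) for n in range(1, shot_count + 1)}


grid = layout(8)                  # the example script has 8 shots
print(grid[1], grid[5], grid[8])  # (1, 1) (2, 1) (2, 4)
print(3 * 4 - len(grid))          # 4
```

With the 8-shot example, the shots fill the first two rows and four of the twelve panels stay unused, so it may be worth matching the grid size in the prompt to the number of shots in the script.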
Why stick figures / simple sketches
Because the point of the storyboard is not to look good.
A finely drawn style actually interferes with Seedance's recognition of the action: it focuses on the details of the picture rather than on the trajectories. Stick figures plus arrows are easier for Seedance to understand.
Actual result
GPT Image 2 produces a storyboard roughly like this:

- 12 panels, one shot each
- Characters drawn as simple stick figures
- Red arrows mark the character's movement and direction
- Blue arrows mark camera pushes and pulls
- Duration marked in the lower-right corner of each panel
This is the input for step 3.
Step 3: Feed Seedance 2.0 and turn the storyboard directly into video
This is the most satisfying step.
Hand the storyboard from the previous step, together with the character reference sheet, to Seedance 2.0 with a precise prompt, and it generates a consistent video that follows the arrow directions and action logic in the storyboard.
Prompt template
Prompt:
This is the character reference sheet {Portrait}. Please follow the shot-by-shot action in this storyboard.
Here is the text version of the action storyboard script:
[Paste the text storyboard from step 1]
Requirements:
- Strictly follow the movement directions and camera paths indicated in the storyboard
- Keep the character's appearance and design consistent
- Transitions between shots are natural, with no frame jumps or abrupt changes
- The visual style matches the storyboard
Key tips
- Always provide the character reference sheet: Seedance needs to know what the character looks like, otherwise the face changes with every shot
- Paste the text storyboard too: the storyboard image alone may lose details, and the text locks in the action logic
- Test with a short clip first: don't run the full 15 seconds in one go; try 3 to 5 seconds first
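For the short test run suggested in the last tip, one simple approach is to take the leading shots that fit inside a small time budget. This is a sketch under assumptions: `leading_shots_within` is a hypothetical helper, and the durations are the ones from the step 1 example script.

```python
def leading_shots_within(durations, budget_seconds):
    """Pick consecutive shots from the start until the budget would be exceeded."""
    picked, used = [], 0.0
    for number, seconds in enumerate(durations, start=1):
        if used + seconds > budget_seconds:
            break
        picked.append(number)
        used += seconds
    return picked, used


# Durations of the eight shots in the example script, in seconds.
durations = [2, 2, 1.5, 2, 1.5, 2, 2, 2]

print(leading_shots_within(durations, 5))  # ([1, 2], 4.0)
```

With a 5-second budget, only the first two shots fit, giving a 4-second test clip before committing to the full 15 seconds.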


Full workflow overview
The three steps, in brief:
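| Step | Tool | Input | Output |
|---|---|---|---|
| 1 | GPT | Character and style description | Text storyboard script |
| 2 | GPT Image 2 | Text storyboard script | Stick-figure storyboard grid with red and blue arrows |
| 3 | Seedance 2.0 | Storyboard grid, character reference sheet, text script | Consistent video |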
Common pitfalls for beginners
- Skipping step 1: the action logic gets muddled and the shots don't connect
- Drawing the storyboard too finely: Seedance's attention is pulled to the details of the picture and the motion gets lost
- No character reference sheet: the character looks different in every shot, like a change of actor
- Prompt too short: just writing "generate a video from this storyboard" is not enough; lock in the action logic again with the text script
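Several of these pitfalls can be caught with a quick pre-flight check before anything is submitted. The sketch below is illustrative only: `preflight` and its arguments are hypothetical, and it cannot judge whether a storyboard image is too detailed, so that pitfall still needs a visual check.

```python
def preflight(text_script: str, character_sheet_path: str, prompt: str) -> list[str]:
    """Return the pitfalls detected in the inputs for step 3."""
    issues = []
    script = text_script.strip()
    if not script:
        issues.append("no text storyboard script (step 1 skipped)")
    if not character_sheet_path.strip():
        issues.append("no character reference sheet")
    if script and script not in prompt:
        issues.append("prompt does not include the text script")
    return issues


# A deliberately incomplete submission: no character sheet, bare prompt.
issues = preflight(
    text_script="- Shot 1 (2s): long shot",
    character_sheet_path="",
    prompt="Generate a video from this storyboard",
)
print(issues)
```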
Final Thoughts
The most exciting thing about this workflow is that the storyboard is no longer just a reference for people, but a "construction drawing" that the AI video tool can read directly.
Seedance can genuinely recognize and execute the arrows and trajectories drawn by GPT Image 2, and that is the key step from "gacha pulls" to "precise control".
The tools have a low barrier to entry and the process is not complicated, yet the improvement in video quality is clear.
If you're making AI shorts, AI comics, AI ads, or AI film previews, this workflow is worth a try.