Testing and deploying OmniGen, a new image generation model that keeps characters and objects consistent

Today I'm sharing a very interesting AI tool: OmniGen.

I. What is OmniGen

OmniGen is a "unified image generation model". Without installing plug-ins such as ControlNet, IP-Adapter, or Reference-Net, it automatically recognizes the features in the input images (for example a particular object, a pose, or a depth map) that the text prompt refers to.

What's the point?

For example, suppose you want to place the characters from the two pictures below against the same background. This used to be cumbersome, but now a single line of instruction is enough.

Ta-da!

The prompt used above:
The little girl and the man were standing in the street. The girl is left in <img><|image_1|></img>. The man is middle in <img><|image_2|></img>.

There are other uses too, such as dressing the girl in red (below left) in the white dress (below right):

Prompt:
A girl wear a white dress. The girl is left in <img><|image_1|></img>. The white dress is in <img><|image_1|></img>.

II. Introduction to OmniGen application scenarios

The project officially showcases a number of scenarios. In the first, the input is an image of a girl, and a text prompt changes her pose (beaming):

With two people in the picture, select the person on the right and change his clothes, action, and scene:

Take two people from two separate images and have them count money in a room:

Even when a picture contains more than two people, the model can pick out the right one from the prompt wording, for example "the man in the middle" in the left image and "the oldest woman" in the right image, who are then shown chatting on the road:

The generated image retains the person's basic recognizable features:

Place a bouquet of flowers in a vase of the specified color and set it on a glass tabletop:

Remove the girl's earrings while replacing the cup in the background with a Coke:

Extract the pose skeleton of the person in the image (this usually requires the ControlNet plug-in):

It is also possible to generate a new image directly from a pose skeleton:

III. OmniGen Local Deployment

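Deployment is not complicated. First, make sure your network access is unrestricted (the steps below clone from GitHub and fetch the model on first run) and that basic tools such as Python and Git are installed. The commands below also use conda.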
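
Before running the install commands, a quick check can confirm that the tools are reachable from the command window. The following is only a minimal sketch using the Python standard library. The tool names are assumptions: on some systems the interpreter is called python3, and conda may only be on PATH inside an Anaconda prompt.

```python
import shutil


def missing_tools(tools=("python", "git", "conda")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        # Install these (or open the right prompt) before continuing.
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

If anything is reported as missing, install it or switch to a prompt where it is available before moving on.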

Open a command window and run the following commands in order (using an NVIDIA GPU as an example):

conda create -n omnigen python=3.10

conda activate omnigen

conda install pytorch=2.3.1 torchvision=0.18.1 torchaudio=2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia

git clone https://github.com/staoxiao/OmniGen.git

cd OmniGen

pip install -e .

pip install gradio spaces

python app.py

To avoid activating the environment by hand every time, you can create a batch file in the OmniGen folder with the following content:

@echo off
call conda activate omnigen
python app.py
pause

On the first run, the app automatically downloads the required model, which needs more than 15 GB of disk space:
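
Since the first run needs that much room, it can be worth checking free disk space beforehand. This is a rough sketch using only the standard library. The 15 GB threshold comes from the figure above, while the path to check is an assumption: pass whichever drive the model download actually lands on, which may not be the OmniGen folder.

```python
import shutil

REQUIRED_GB = 15  # figure quoted in this article; treat it as a lower bound


def free_gb(path="."):
    """Free space on the drive that holds `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def has_room(path=".", required_gb=REQUIRED_GB):
    """True if the drive holding `path` has more than `required_gb` free."""
    return free_gb(path) > required_gb


if __name__ == "__main__":
    print(f"Free space: {free_gb():.1f} GiB")
    if not has_room():
        print(f"Less than {REQUIRED_GB} GiB free; the model download may fail.")
```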

IV. How to use it

Prompts follow ordinary everyday phrasing. The only thing to watch is how an image is referenced, which must follow this format: <img><|image_i|></img>, where "i" is a number from 1 to 3.

For example, suppose you upload three images: image 1 is a man, image 2 is a woman, and image 3 is a street. To generate the man and the woman against that background, the prompt would be:
A man in middle in <img><|image_1|></img> and a woman in middle in <img><|image_2|></img> holding hands in the street like <img><|image_3|></img>.
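
The placeholder format is easy to mistype, so a small helper can build the tags and check that a prompt refers to every uploaded image. The sketch below is illustrative only. `image_tag`, `referenced_images`, and `check_prompt` are hypothetical helper names, not part of OmniGen. The `generate` function is an assumption based on the pipeline usage in the project README (`OmniGenPipeline.from_pretrained`, `input_images`, `img_guidance_scale`). It is not what was run for this article, which used the web UI started by app.py, so verify it against your installed version.

```python
import re

_TAG = re.compile(r"<img><\|image_(\d+)\|></img>")


def image_tag(i):
    """Placeholder for the i-th uploaded image (1-based)."""
    return f"<img><|image_{i}|></img>"


def referenced_images(prompt):
    """Sorted list of the distinct image numbers a prompt refers to."""
    return sorted({int(n) for n in _TAG.findall(prompt)})


def check_prompt(prompt, n_images):
    """Raise ValueError unless the prompt references exactly images 1..n_images."""
    found = referenced_images(prompt)
    expected = list(range(1, n_images + 1))
    if found != expected:
        raise ValueError(f"prompt references images {found}, expected {expected}")
    return prompt


def generate(prompt, image_paths, out_path="output.png"):
    """Assumed script-side equivalent of the web UI; parameter values are README defaults."""
    from OmniGen import OmniGenPipeline  # imported lazily so the helpers work without it

    check_prompt(prompt, len(image_paths))
    pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
    images = pipe(
        prompt=prompt,
        input_images=list(image_paths),
        height=1024,
        width=1024,
        guidance_scale=2.5,
        img_guidance_scale=1.6,
        seed=0,
    )
    images[0].save(out_path)  # the pipeline returns PIL images
    return out_path


# Rebuild the three-image prompt from this section and validate it.
prompt = (
    f"A man in middle in {image_tag(1)} and a woman in middle in {image_tag(2)} "
    f"holding hands in the street like {image_tag(3)}."
)
check_prompt(prompt, 3)
```

Calling `generate(prompt, ["man.png", "woman.png", "street.png"])` (file names are placeholders) would then load the model and write the result to output.png, provided the assumed API matches your install.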

Finally, let's test combining celebrity images by having Black Widow and Master Ma pose together for a picture:

V. Conclusion

1. OmniGen can recognize the gender, age, position, and clothing (including color) of the people in an image, so prompts can stay close to everyday language.

2. OmniGen is useful for scenarios that require two specific characters to appear in the same image.

3. OmniGen's output is not yet perfect, but all-in-one processing with no additional plug-ins fits the direction in which AIGC is heading.

4. Generation is slow (about a minute and a half on an RTX 4090, and 4 to 5 minutes on an RTX 4060), so efficiency still needs to be improved.

Links mentioned in this article:

OmniGen's code page:
https://github.com/VectorSpaceLab/OmniGen
