{"id":24743,"date":"2024-12-09T10:07:48","date_gmt":"2024-12-09T02:07:48","guid":{"rendered":"https:\/\/www.1ai.net\/?p=24743"},"modified":"2024-12-09T10:07:48","modified_gmt":"2024-12-09T02:07:48","slug":"%e5%85%a8%e6%96%b0%e5%9b%be%e5%83%8f%e4%b8%80%e8%87%b4%e6%80%a7%e7%94%9f%e6%88%90%e6%a8%a1%e5%9e%8bomnigen%e6%b5%8b%e8%af%95%e5%8f%8a%e9%83%a8%e7%bd%b2%ef%bc%8c%e4%bf%9d%e6%8c%81%e4%ba%ba%e7%89%a9","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/24743.html","title":{"rendered":"New image-consistency generation model OmniGen, tested and deployed: keeping characters and objects consistent"},"content":{"rendered":"<p>Today we're sharing a very interesting AI tool: <a href=\"https:\/\/www.1ai.net\/en\/tag\/omnigen\" title=\"_Other Organiser\" target=\"_blank\" >OmniGen<\/a><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-24744\" title=\"ca82819ej00so7enu002td000u000bfm\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/ca82819ej00so7enu002td000u000bfm.jpg\" alt=\"ca82819ej00so7enu002td000u000bfm\" width=\"1080\" height=\"411\" \/><\/p>\n<p><strong>I. 
What is OmniGen<\/strong><\/p>\n<p>OmniGen is a \"Unified <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%9b%be%e5%83%8f%e7%94%9f%e6%88%90%e6%a8%a1%e5%9e%8b\" title=\"[Sees articles with tags]\" target=\"_blank\" >Image Generation Model<\/a>\": without installing plug-ins such as ControlNet, IP-Adapter, or Reference-Net, it can automatically recognize features (e.g., a certain object, pose, or mapping) in the input images based on the text prompt.<\/p>\n<p>Why does that matter?<\/p>\n<p>For example, putting the characters from the two pictures below into the same scene used to be fairly cumbersome; now a single line of instruction does it.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-24745\" title=\"38bb939dj00so7enu002md000u000jwm\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/38bb939dj00so7enu002md000u000jwm.jpg\" alt=\"38bb939dj00so7enu002md000u000jwm\" width=\"1080\" height=\"716\" \/><\/p>\n<p>Ta-da!<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-24746\" title=\"353914ebj00so7ent0026d000u000gcm\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/353914ebj00so7ent0026d000u000gcm.jpg\" alt=\"353914ebj00so7ent0026d000u000gcm\" width=\"1080\" height=\"588\" \/><\/p>\n<p>The prompt used above:<br \/>\n<em>The little girl and the man were standing in the street. 
the girl is left in<\/em>&lt;img&gt;&lt;|image_1|&gt;&lt;\/img&gt;The man is middle in &lt;img&gt;&lt;|image_2|&gt;&lt;\/img&gt;.<\/p>\n<p>&nbsp;<\/p>\n<p>There are other uses, such as having the girl in red (pictured below left) wear the white dress (pictured below right):<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-24748\" title=\"caee82ccj00so7enu0035d000u000jwm\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/caee82ccj00so7enu0035d000u000jwm.jpg\" alt=\"caee82ccj00so7enu0035d000u000jwm\" width=\"1080\" height=\"716\" \/><\/p>\n<p>Prompt:<br \/>\n<em>a girl wear a white dress. the girl is left in<\/em>&lt;img&gt;&lt;|image_1|&gt;&lt;\/img&gt;The white dress is in&lt;img&gt;&lt;|image_2|&gt;&lt;\/img&gt;.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-24747\" title=\"9d29bd65j00so7enu002cd000u000fkm\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/9d29bd65j00so7enu002cd000u000fkm.jpg\" alt=\"9d29bd65j00so7enu002cd000u000fkm\" width=\"1080\" height=\"560\" \/><\/p>\n<p><strong><br \/>\nII. 
Introduction to OmniGen application scenarios<\/strong><\/p>\n<p>The official demos cover a number of scenarios. In this one, an image of a girl is edited with a text prompt to change her expression (beaming):<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-24749\" title=\"5b970237j00so7ent001rd000u000h3m\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/5b970237j00so7ent001rd000u000h3m.jpg\" alt=\"5b970237j00so7ent001rd000u000h3m\" width=\"1080\" height=\"615\" \/><\/p>\n<p>Here there are two people in the picture; the person on the right is selected and his clothes, actions, and scene are changed:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-24750\" title=\"ac5948a8j00so7ent001od000u000fim\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/ac5948a8j00so7ent001od000u000fim.jpg\" alt=\"ac5948a8j00so7ent001od000u000fim\" width=\"1080\" height=\"558\" \/><\/p>\n<p>Take one person from each of the two images and have them count money in a room:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-24751\" title=\"e8e0d8d2j00so7enu0023d000u000fem\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/e8e0d8d2j00so7enu0023d000u000fem.jpg\" alt=\"e8e0d8d2j00so7enu0023d000u000fem\" width=\"1080\" height=\"554\" \/><\/p>\n<p>Even when there are more than two people in a picture, the AI can pick them out from descriptive prompts, such as \"the man in the middle\" in the left image and \"the oldest woman\" in the right image, shown here chatting on the road:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-24752\" title=\"c88ec0caj00so7ent0025d000u000f8m\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/c88ec0caj00so7ent0025d000u000f8m.jpg\" alt=\"c88ec0caj00so7ent0025d000u000f8m\" width=\"1080\" height=\"548\" \/><\/p>\n<p>The generated image retains the person's basic recognizable features:<\/p>\n<p><img 
loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-24753\" title=\"038c0729j00so7ent0023d000u000fcm\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/038c0729j00so7ent0023d000u000fcm.jpg\" alt=\"038c0729j00so7ent0023d000u000fcm\" width=\"1080\" height=\"552\" \/><\/p>\n<p>Place the bouquet of flowers in a vase of the specified color and set it on a glass tabletop:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-24754\" title=\"7e52099dj00so7ent001rd000u000gpm\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/7e52099dj00so7ent001rd000u000gpm.jpg\" alt=\"7e52099dj00so7ent001rd000u000gpm\" width=\"1080\" height=\"601\" \/><\/p>\n<p>Remove the girl's earrings while replacing the cup in the background with a Coke:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-24755\" title=\"4039a1dbj00so7enu001vd000u000fgm\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/4039a1dbj00so7enu001vd000u000fgm.jpg\" alt=\"4039a1dbj00so7enu001vd000u000fgm\" width=\"1080\" height=\"556\" \/><\/p>\n<p>Extract the characters' pose frames from the image (normally this requires the ControlNet plugin):<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-24756\" title=\"ed737ad6j00so7ent001fd000u000f7m\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/ed737ad6j00so7ent001fd000u000f7m.jpg\" alt=\"ed737ad6j00so7ent001fd000u000f7m\" width=\"1080\" height=\"547\" \/><\/p>\n<p>It is also possible to generate new images directly from those pose frames:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-24757\" title=\"fa6a16d4j00so7enu001zd000u000fym\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/fa6a16d4j00so7enu001zd000u000fym.jpg\" alt=\"fa6a16d4j00so7enu001zd000u000fym\" width=\"1080\" height=\"574\" \/><\/p>\n<p><strong>III. 
OmniGen Local Deployment<\/strong><\/p>\n<p>The process is not complicated. First, <strong>make sure your network is unrestricted (able to reach sites such as GitHub) and that basic tools such as Python and Git are installed<\/strong>.<\/p>\n<p>Open a command window and run the following commands in order (using an NVIDIA GPU as the example):<\/p>\n<p>conda create -n omnigen python=3.10<\/p>\n<p>conda activate omnigen<\/p>\n<p>conda install pytorch=2.3.1 torchvision=0.18.1 torchaudio=2.3.1 pytorch-cuda=11.8 -c pytorch -c nvidia<\/p>\n<p>git clone https:\/\/github.com\/staoxiao\/OmniGen.git<\/p>\n<p>cd OmniGen<\/p>\n<p>pip install -e .<\/p>\n<p>pip install gradio spaces<\/p>\n<p>python app.py<\/p>\n<p>To avoid activating the environment by hand before each use, <strong>you can create a batch file<\/strong> with the following content:<\/p>\n<p>@echo off<br \/>\ncall conda activate omnigen<br \/>\npython app.py<br \/>\npause<\/p>\n<p>The first run automatically downloads the required models, which take up more than 15 GB of disk space:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-24758\" title=\"ad9917ffj00so7ent003qd000u000gim\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/ad9917ffj00so7ent003qd000u000gim.jpg\" alt=\"ad9917ffj00so7ent003qd000u000gim\" width=\"1080\" height=\"594\" \/><\/p>\n<p><strong>IV. 
How to use it<\/strong><\/p>\n<p>Prompts largely follow everyday syntax; the only special thing is referencing an uploaded image, which must follow the format &lt;img&gt;&lt;|image_i|&gt;&lt;\/img&gt;, where \"i\" is a number from 1 to 3.<\/p>\n<p>&nbsp;<\/p>\n<p>For example, if you upload three images, where image 1 is a man, image 2 is a woman, and image 3 is a street, and you want to generate man + woman + background, the prompt would be:<br \/>\n<em>A man in middle in\u00a0<\/em>&lt;img&gt;&lt;|image_1|&gt;&lt;\/img&gt;and a woman in middle in&lt;img&gt;&lt;|image_2|&gt;&lt;\/img&gt; Holding hands in the street like&lt;img&gt;&lt;|image_3|&gt;&lt;\/img&gt;.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-24759\" title=\"cb4c1277j00so7enu0028d000u000g0m\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/cb4c1277j00so7enu0028d000u000g0m.jpg\" alt=\"cb4c1277j00so7enu0028d000u000g0m\" width=\"1080\" height=\"576\" \/><\/p>\n<p>Finally, we test combining celebrity images by having Black Widow and Master Ma pose for a photo together:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-24760\" title=\"e7044d4bj00so7ent001pd000u000glm\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/12\/e7044d4bj00so7ent001pd000u000glm.jpg\" alt=\"e7044d4bj00so7ent001pd000u000glm\" width=\"1080\" height=\"597\" \/><\/p>\n<p><strong>V. Conclusion<\/strong><\/p>\n<p>1. OmniGen can recognize the gender, age, position, clothing (and its color), etc. of the people in an image, so prompts can stay close to everyday language.<\/p>\n<p>2. 
For application scenarios that require two specific characters to appear in the same image, OmniGen comes in handy.<\/p>\n<p>3. OmniGen's results are still not perfect, but its all-in-one processing without extra plug-ins is in line with where AIGC is heading.<\/p>\n<p>4. Generating an image takes a long time (about a minute and a half on an RTX 4090, 4 to 5 minutes on an RTX 4060), so efficiency still needs optimization.<\/p>\n<p>Links mentioned in this article:<\/p>\n<p>OmniGen's code page:<br \/>\nhttps:\/\/github.com\/VectorSpaceLab\/OmniGen<\/p>","protected":false},"excerpt":{"rendered":"<p>Today we're sharing a very interesting AI tool: OmniGen. OmniGen is a \"unified image generation model\" that automatically recognizes features (e.g., a certain object, pose, or mapping) in the input images based on text prompts, without needing plug-ins such as ControlNet, IP-Adapter, or Reference-Net. Why does that matter? For example, putting the characters from two pictures into the same scene used to be fairly cumbersome; now a single line of instruction does it. 
The little girl and the man were standing in the street<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[144],"tags":[4880,4881],"collection":[],"class_list":["post-24743","post","type-post","status-publish","format-standard","hentry","category-baike","tag-omnigen","tag-4881"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/24743","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=24743"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/24743\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=24743"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=24743"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=24743"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=24743"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}