{"id":3809,"date":"2024-02-09T12:03:53","date_gmt":"2024-02-09T04:03:53","guid":{"rendered":"https:\/\/www.1ai.net\/?p=3809"},"modified":"2024-02-09T12:03:53","modified_gmt":"2024-02-09T04:03:53","slug":"%e8%8b%b9%e6%9e%9c%e5%b1%95%e7%a4%ba-ai-%e6%96%b0%e6%a8%a1%e5%9e%8b-mgie%ef%bc%8c%e5%8f%af%e4%b8%80%e5%8f%a5%e8%af%9d%e7%b2%be%e4%bf%ae%e5%9b%be%e7%89%87","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/3809.html","title":{"rendered":"Apple shows off new AI model MGIE, which can retouch photos with just one sentence"},"content":{"rendered":"<p data-vmark=\"54c7\">Compared with Microsoft&#039;s booming business, <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%8b%b9%e6%9e%9c%e5%85%ac%e5%8f%b8\" title=\"[Sees articles with labels]\" target=\"_blank\" >Apple<\/a>&#039;s presence in the field of AI seems much more low-key, but this does not mean that Apple has no achievements in this field. <strong>Apple recently released <a href=\"https:\/\/www.1ai.net\/en\/tag\/mgie\" title=\"[SEE ARTICLE WITH [MGIE] LABEL]\" target=\"_blank\" >MGIE<\/a>, a new open source AI model that can edit images based on natural language instructions.<\/strong><\/p>\n<p data-vmark=\"a15f\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-3810\" title=\"fc3ac60b-3b50-4723-81e7-6d43bc2df108\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/02\/fc3ac60b-3b50-4723-81e7-6d43bc2df108.jpg\" alt=\"fc3ac60b-3b50-4723-81e7-6d43bc2df108\" width=\"750\" height=\"420\" \/><\/p>\n<p><span class=\"wp-caption-text\">Image source: VentureBeat and Midjourney<\/span><\/p>\n<p data-vmark=\"a186\">The full name of MGIE is <a href=\"https:\/\/www.1ai.net\/en\/tag\/mllm\" title=\"[SEE ARTICLES WITH [MLLM] LABELS]\" target=\"_blank\" >MLLM<\/a>-Guided Image Editing, using <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%9a%e6%a8%a1%e6%80%81%e5%a4%a7%e5%9e%8b%e8%af%ad%e8%a8%80%e6%a8%a1%e5%9e%8b\" title=\"[Sees articles with [Multimodal Large 
Language Model] labels]\" target=\"_blank\" >Multimodal Large Language Models<\/a> (MLLMs) to interpret user instructions and perform pixel-level operations. MGIE can understand natural language commands issued by users and perform operations such as Photoshop-style modification, global photo optimization, and local editing.<\/p>\n<p data-vmark=\"b688\">Apple developed MGIE in collaboration with researchers from the University of California, Santa Barbara, and the research is being presented at the 2024 International Conference on Learning Representations (ICLR), one of the top conferences for artificial intelligence research.<\/p>\n<p data-vmark=\"8e3d\">Before looking at MGIE, it helps to introduce MLLMs. An MLLM is a powerful AI model that can process text and images together, which can be used to enhance instruction-based image editing. MLLMs have shown excellent capabilities in cross-modal understanding and visually aware response generation, but have not yet been widely used in image editing tasks.<\/p>\n<p data-vmark=\"f9dc\">MGIE integrates MLLMs into the image editing process in two ways. First, it uses MLLMs to derive expressive instructions from user inputs. These instructions are concise and provide clear guidance for the editing process.<\/p>\n<p data-vmark=\"774d\">For example, given the input \u201c<strong>Make the sky bluer<\/strong>\u201d, MGIE can generate the instruction \u201c<strong>Increase the saturation of the sky area by 20%<\/strong>\u201d.<\/p>\n<p data-vmark=\"2ce1\">Second, it uses the MLLM to generate a visual imagination, a latent representation of the desired edits. This representation captures the essence of the edits and can be used to guide pixel-level operations. MGIE adopts a novel end-to-end training scheme that jointly optimizes the instruction derivation, visual imagination, and image editing modules.<\/p>\n<p data-vmark=\"d514\">MGIE can handle a wide range of editing situations, from simple color adjustments to complex object manipulations. 
The model can also perform global and local editing based on the user&#039;s preferences. Some of the features and capabilities of MGIE include:<\/p>\n<ul class=\"list-paddingleft-2\">\n<li>\n<p data-vmark=\"bad7\"><strong>Expressive instruction-based editing:<\/strong> MGIE can generate clear and concise instructions to effectively guide the editing process, which not only improves the quality of editing but also enhances the overall user experience.<\/p>\n<\/li>\n<li>\n<p data-vmark=\"637c\"><strong>Photoshop-style modification:<\/strong> MGIE can perform common Photoshop-style edits, such as cropping, resizing, rotating, flipping, and adding filters. The model can also apply more advanced edits, such as changing backgrounds, adding or removing objects, and blending images.<\/p>\n<\/li>\n<li>\n<p data-vmark=\"8b2d\"><strong>Global photo optimization:<\/strong> MGIE can optimize the overall quality of a photo, such as brightness, contrast, sharpness, and color balance. The model can also apply artistic effects such as sketch, painting, and comic styles.<\/p>\n<\/li>\n<li>\n<p data-vmark=\"ee23\"><strong>Local editing:<\/strong> MGIE can edit specific regions or objects in an image, such as faces, eyes, hair, clothing, and accessories. The model can also modify the properties of these regions or objects, such as shape, size, color, texture, and style.<\/p>\n<\/li>\n<\/ul>\n<p data-vmark=\"96e6\">MGIE is available as an open source project on GitHub, where users can find the code, data, and pre-trained models. The project also provides a demo notebook showing how to use MGIE to complete various editing tasks.<\/p>","protected":false},"excerpt":{"rendered":"<p>Compared to Microsoft, Apple's presence in the field of AI seems far more low-key, but that doesn't mean that Apple has made no progress in the field. Apple recently unveiled a new open-source AI model called MGIE, which can edit images based on natural language commands. 
MGIE stands for MLLM-Guided Image Editing, which uses a multimodal large language model (MLLM) to interpret user commands and perform pixel-level operations. MGIE understands natural language commands given by the user and performs operations such as Photoshop-style modification, global photo optimization, and local editing. Apple Inc.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[1212,1213,1214,1211],"collection":[],"class_list":["post-3809","post","type-post","status-publish","format-standard","hentry","category-news","tag-mgie","tag-mllm","tag-1214","tag-1211"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/3809","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=3809"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/3809\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=3809"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=3809"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=3809"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=3809"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}