{"id":4968,"date":"2024-03-06T09:49:03","date_gmt":"2024-03-06T01:49:03","guid":{"rendered":"https:\/\/www.1ai.net\/?p=4968"},"modified":"2024-03-06T09:49:03","modified_gmt":"2024-03-06T01:49:03","slug":"stability-ai%e5%8f%91%e5%b8%83sd3%e6%8a%80%e6%9c%af%e6%8a%a5%e5%91%8a-%e6%8a%ab%e9%9c%b2sd3%e6%9b%b4%e5%a4%9a%e7%bb%86%e8%8a%82","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/4968.html","title":{"rendered":"Stability AI releases SD3 technical report to reveal more details of SD3"},"content":{"rendered":"<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/stability\" title=\"View articles tagged [Stability]\" target=\"_blank\" >Stability<\/a> AI recently released the technical report for its strongest image generation model, <a href=\"https:\/\/www.1ai.net\/en\/tag\/stable-diffusion3\" title=\"View articles tagged [Stable Diffusion 3]\" target=\"_blank\" >Stable Diffusion 3<\/a> (SD3), revealing more details about SD3. According to Stability AI, SD3 outperforms all current open-source and commercial models in typography, aesthetic quality, and prompt understanding, and is currently the strongest image generation model.<\/p>\n<p class=\"article-content__img\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-4969\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/03\/6384524806887817758522152.jpg\" alt=\"\" width=\"534\" height=\"430\" \/><\/p>\n<p>Highlights of the technical report are as follows.<\/p>\n<p>Based on human preference evaluations, SD3 outperforms current state-of-the-art text-to-image systems such as DALL-E 3, Midjourney v6, and Ideogram v1.<\/p>\n<p>The report presents a new Multimodal Diffusion Transformer (MMDiT) architecture that uses separate sets of weights for image and language. 
This architecture improves the system's text comprehension and spelling capabilities compared to previous versions of Stable Diffusion.<\/p>\n<p>The 8B-parameter SD3 model can run on an RTX 4090 with 24GB of memory. In addition, SD3 will be released in several parameter sizes, ranging from 800M to 8B, to run on consumer hardware.<\/p>\n<p>The SD3 architecture is based on the Diffusion Transformer (DiT; see Peebles &amp; Xie, 2023). Given the large conceptual differences between text and image embeddings, SD3 uses a separate set of weights for each modality. Information still flows between image tokens and text tokens, improving the overall comprehension and typography of the model's outputs.<\/p>\n<p>SD3 employs the Rectified Flow (RF) formulation, in which data and noise are connected along a linear trajectory during training. This yields a straighter inference path that can be sampled in fewer steps.<\/p>\n<p>They also studied scaling the rectified flow Transformer model, using a reweighted RF formulation and the MMDiT backbone network to train a series of models ranging in size from 15 Transformer blocks (450 million parameters) to 38 blocks (8 billion parameters).<\/p>\n<p>SD3 also introduces flexible text encoders: by removing the memory-intensive T5 text encoder (up to 4.7 billion parameters) at inference time, SD3's memory footprint can be significantly reduced with little performance loss.<\/p>\n<p>Overall, this technical report from Stability AI reveals the strengths and details of SD3, demonstrating its leadership in image generation.<\/p>\n<p>Details here: https:\/\/stability.ai\/news\/stable-diffusion-3-research-paper<\/p>","protected":false},"excerpt":{"rendered":"<p>Stability AI recently released a technical report on its strongest image generation model, Stable Diffusion 3 (SD3), revealing more details about SD3. 
According to Stability AI, SD3 outperforms all current open-source and commercial models in typography, aesthetic quality, and prompt understanding, and is currently the strongest image generation model. The main points of the technical report are as follows: based on human preference evaluations, SD3 offers better typography and prompt understanding than current state-of-the-art text-to-image systems such as DALL-E 3, Midjourney v6, and Ideogram v1. The report presents a new Multimodal Diffusion<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[388,1323],"collection":[],"class_list":["post-4968","post","type-post","status-publish","format-standard","hentry","category-news","tag-stability","tag-stable-diffusion3"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/4968","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=4968"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/4968\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=4968"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=4968"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=4968"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=4968"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templa
ted":true}]}}