November 29 news: yesterday afternoon, the official Alibaba Tongyi Large Model public account announced that its recently released image generation model, Z-Image, topped the Hugging Face trending list after launch and was downloaded 500,000 times on its first day.

According to the official post, Z-Image achieves photo-level realism close to that of considerably larger models with only 6B parameters. Skin texture, hair strands, natural light and material surfaces are all finely reproduced, and both composition and atmosphere are aesthetically pleasing.
In addition, Z-Image-Turbo can accurately render mixed Chinese and English text. Even in difficult cases such as small fonts, complex layouts or poster designs, it keeps the text clear and the layout natural without sacrificing the realism of faces or the overall beauty of the image, matching the current leading closed-source models.
Z-Image also has broad knowledge of the real world. This lets it accurately generate well-known landmarks (e.g. the Eiffel Tower, the Forbidden City), well-known figures and specific cultural elements (e.g. Spring Festival window decorations, British telephone booths), so that the picture is realistic in detail, scale and context.
Through its Prompt Enhancer, Z-Image can handle complex tasks such as the logic of the classic "chickens and rabbits in the same cage" arithmetic puzzle and the visualization of the ancient poem line about a "little bridge", so that the AI is not just "drawing" but "understanding".
Z-Image-Edit can carry out complex editing instructions with precision, such as "make the person smile + turn around + change the background to cherry blossoms + add a Chinese slogan", while keeping identity, lighting and style highly consistent and avoiding the misalignment and distortion that common editing models show under large modifications.
This release includes two specialized models.
Z-Image-Turbo: a distilled and optimized version of Z-Image that needs only 8 inference steps to generate high-quality images, and performs well in photo-level realism and in rendering bilingual Chinese and English text. Whether for everyday creation, poster design or quick prototyping, it runs smoothly on a graphics card with 16GB of video memory, so that "what you imagine is what you get".
Z-Image-Edit: an editing-specific model obtained by continued training of Z-Image. It responds accurately to complex compound instructions, modifying several elements at once (expression, pose, background, text and so on) while keeping identity, lighting and style consistent even under large changes, in order to achieve a truly "logically interpretable smart editor".
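The 16GB figure quoted for Z-Image-Turbo can be sanity-checked with simple arithmetic. The sketch below is only a rough estimate under an assumption that does not come from the article: that the 6B weights are stored in a 16-bit format (2 bytes per parameter). It counts weights only; activations and any auxiliary components need additional memory.

```python
# Back-of-the-envelope estimate of the memory needed to hold model weights.
# Assumption (not from the article): weights are stored in 16 bits (2 bytes).
# Activations and auxiliary components are NOT included in this estimate.

def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed for the weights, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

PARAMS = 6e9  # "6B parameters", as stated in the article

fp32_gb = weight_memory_gb(PARAMS, 4)  # 32-bit weights
bf16_gb = weight_memory_gb(PARAMS, 2)  # 16-bit weights (assumed)

print(f"32-bit weights: {fp32_gb:.0f} GB")
print(f"16-bit weights: {bf16_gb:.0f} GB")
print("16-bit weights fit in 16 GB:", bf16_gb < 16)
```

Under this assumption the weights take about 12 GB, which leaves some headroom on a 16GB card, whereas 32-bit weights (about 24 GB) would not fit.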
The official post describes Z-Image's technical approach at four levels:
- Data: an efficient data ecosystem built from data profiling, a cross-modal vector engine, a world knowledge graph and an active labelling system, replacing "more data" with "the right data" to improve training efficiency at the source.
- Architecture: a single-stream diffusion Transformer (S3-DiT) that arranges text, image latent variables and time-step conditions into a single input sequence, achieving deep cross-modal fusion and significantly higher parameter utilization.
- Training: a three-stage progressive strategy (low-resolution pre-training, full-task training, RLHF alignment) that systematically injects world knowledge and aligns the model closely with human preferences.
- Inference: building on the above, Z-Image-Turbo uses distillation and reinforcement-learning regularization to achieve real-time, high-quality, general-purpose generation in only eight inference steps.
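To make the single-stream idea concrete, here is a minimal toy sketch in plain Python. It is not the S3-DiT implementation: the embedding width, token counts, random stand-in embeddings and the attention with identity projections are all illustrative assumptions. It only shows the point made above, that text, image latents and the time-step condition are joined into one sequence and processed by the same attention layer.

```python
import math
import random

DIM = 8  # hypothetical shared embedding width

def random_tokens(count, dim, rng):
    """Stand-in embeddings; a real model would use learned encoders."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]

def self_attention(seq):
    """Toy single-head self-attention with identity Q/K/V projections."""
    scale = 1.0 / math.sqrt(len(seq[0]))
    out = []
    for q in seq:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in seq]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[d] for w, v in zip(weights, seq))
                    for d in range(len(q))])
    return out

rng = random.Random(0)
text_tokens = random_tokens(4, DIM, rng)     # stand-in for prompt tokens
image_latents = random_tokens(6, DIM, rng)   # stand-in for image latent patches
timestep_token = random_tokens(1, DIM, rng)  # stand-in for the time-step condition

# Single stream: one sequence, so every token attends to every modality.
sequence = text_tokens + image_latents + timestep_token
mixed = self_attention(sequence)

# In this sketch only the image positions are read back out.
start = len(text_tokens)
image_out = mixed[start:start + len(image_latents)]
print(len(sequence), len(image_out))
```

Because all modalities share one sequence and one set of layers, no separate per-modality branches are needed, which is one plausible reading of the article's claim about higher parameter utilization.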
Open-source links to the platforms are as follows:
- GitHub: https://github.com/Tongyi-MAI/Z-Image
- Hugging Face: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
- ModelScope: https://www.modelscope.cn/models/Tongyi-MAI/Z-Image-Turbo