September 10 news: Yesterday, Tencent released and open-sourced Hunyuan Image Model 2.1, which supports native 2K generation and native Chinese and English input.

The PromptEnhancer text rewrite model was open-sourced at the same time. Given the input "Draw a cute cat", it automatically expands the prompt to "An orange cat lying on a plaid tablecloth, with cookies at its paws, in watercolor style". It supports two-way conversion between Chinese and English, and a Chinese prompt such as "dreamy starry-sky cake" can also be rendered precisely, avoiding "mixed expression".
Hunyuan Image Model 2.1 supports ultra-long prompts with complex semantics, up to 1k tokens, including multi-subject descriptions with precise generation.
Hunyuan Image Model 2.1 has more stable control over the rendering of text and scene details in the image, reducing common text errors and misinterpretations.
Hunyuan Image Model 2.1 also supports generation in a variety of styles, such as photorealistic, comic, and vinyl figure.
Hunyuan Image Model 2.1 also has the following highlights:
- Dual text encoders, combining a general-purpose encoder with a character-level encoder:
  - A vision-language multimodal encoder that better understands scene descriptions, character actions, and detail requirements.
  - A multilingual ByT5 text encoder that strengthens the model's text rendering capability.
- Caption:
  - Structured captions provide multi-level semantic information, significantly improving the model's ability to respond to complex semantics.
  - An OCR agent and IP RAG are introduced to compensate for the weaknesses of general-purpose VLM captioners in dense-text and world-knowledge descriptions.
- Two-stage model structure:
  - Text-to-image model: a network combining single-stream and dual-stream structures, with 17B parameters.
  - Refiner model: introduces a conditional generation structure similar to image-to-image, which significantly reduces distortions while further improving image quality and sharpness.
- Two-stage reinforcement post-training: SFT and RL are applied in two successive training stages. A self-developed Reward Distribution reinforcement learning algorithm innovatively uses high-quality images as chosen samples, which makes training more stable.
- High-compression-rate VAE, significantly improving training and inference efficiency:
  - 32x compression VAE: the number of tokens fed to the DiT model is greatly reduced, and aligning the VAE with the DINOv2 feature space lowers training difficulty. Generating a 2K image takes about as long as generating a 1K image with comparable models.
  - Multi-resolution REPA loss: used to speed up model convergence.
  - MeanFlow inference acceleration: the first successful run of MeanFlow on an industrial-grade model, cutting the number of inference steps from 100 to 8 and significantly improving distillation results.
- Hunyuan text rewrite model (PromptEnhancer): the first systematic industrial-grade rewrite model. Trained with SFT and GRPO, it significantly improves the semantic fidelity of text-to-image generation. An AlignEvaluator reward model is also proposed, covering 24 fine-grained key points across six broad categories. PromptEnhancer supports rewriting in both Chinese and English.
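
The claim in the VAE item above, that 2K generation costs about as much as 1K generation elsewhere, can be checked with simple token arithmetic. The sketch below is illustrative only. It assumes square images, that the 32x VAE feeds the DiT with no extra patchify step, and that a "comparable model" uses an 8x VAE followed by 2x2 patchify; none of these baseline settings come from the article.

```python
def latent_tokens(image_size: int, vae_factor: int, patch_size: int = 1) -> int:
    """Number of tokens a DiT sees for a square image of side `image_size`.

    `vae_factor` is the spatial downsampling of the VAE; `patch_size` is an
    optional patchify step applied to the latent before the transformer.
    """
    side = image_size // (vae_factor * patch_size)
    return side * side


# Assumed configuration for this model: 32x VAE, no extra patchify.
hunyuan_2k = latent_tokens(2048, vae_factor=32, patch_size=1)

# Assumed baseline (hypothetical): 8x VAE plus 2x2 patchify, i.e. 16x overall.
baseline_1k = latent_tokens(1024, vae_factor=8, patch_size=2)
baseline_2k = latent_tokens(2048, vae_factor=8, patch_size=2)

print("32x VAE at 2K:", hunyuan_2k)
print("16x baseline at 1K:", baseline_1k)
print("16x baseline at 2K:", baseline_2k)
```

Under these assumptions the 2K sequence length matches the baseline's 1K sequence length and is a quarter of the baseline's 2K length. Since attention cost grows with sequence length, this is consistent with the stated parity in generation time.
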
The open-source address for Hunyuan Image Model 2.1 is as follows:
https://github.com/Tencent-Hunyuan/HunyuanImage-2.1
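
As a footnote to the item above on cutting inference from 100 steps to 8: the sketch below shows the general shape of a few-step sampler driven by an average velocity, following the update rule z_r = z_t - (t - r) * u(z_t, r, t) from the published MeanFlow formulation. It is a toy, not Tencent's sampler. The function `toy_average_velocity` is a hypothetical stand-in for a trained network, and it assumes all data sits at a single point so that the average velocity has a closed form.

```python
import random


def toy_average_velocity(z, r, t, target):
    """Closed-form average velocity for a toy case (assumption, not a model).

    With z_t = (1 - t) * x + t * noise and all data at one point `target`,
    the velocity along each path is constant and equals (z_t - target) / t,
    so the average over [r, t] is the same value and `r` is unused.
    """
    return [(zi - xi) / t for zi, xi in zip(z, target)]


def meanflow_sample(u, z1, num_steps):
    """Integrate from t=1 (noise) to t=0 (data) in `num_steps` jumps."""
    z = list(z1)
    for k in range(num_steps, 0, -1):
        t = k / num_steps          # current time
        r = (k - 1) / num_steps    # time to jump to
        velocity = u(z, r, t)
        z = [zi - (t - r) * vi for zi, vi in zip(z, velocity)]
    return z


random.seed(0)
target = [2.0, -1.0]
noise = [random.gauss(0.0, 1.0) for _ in target]
sample = meanflow_sample(
    lambda z, r, t: toy_average_velocity(z, r, t, target), noise, num_steps=8
)
print(sample)
```

In this toy the average velocity is exact, so any number of steps lands on the target. A real distilled network only approximates it, which is why a handful of steps, such as the 8 mentioned in the article, is used rather than one.
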