October 14, 2024 - The Zhipu technology team today announced the open-sourcing of its text-to-image models CogView3 and CogView3-Plus-3B. The capabilities of this model series are already live in the "Zhipu Qingyan" app.

CogView3 is described as a text-to-image model based on cascaded diffusion, consisting of the following three stages:
- Stage 1: Generates a 512×512 low-resolution image using a standard diffusion process.
- Stage 2: Performs 2× super-resolution using a relay diffusion process, producing a 1024×1024 image from the 512×512 input.
- Stage 3: Applies relay diffusion once more to the generated result, producing a 2048×2048 high-resolution image.
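
The three stages above can be sketched as a simple resolution plan. This is only an illustration of the cascade's shape, not the CogView3 implementation: the names `CascadeStage` and `build_cascade` are hypothetical, and no diffusion is actually run.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CascadeStage:
    """One stage of the cascade (hypothetical helper type)."""
    name: str
    process: str             # "standard diffusion" or "relay diffusion"
    in_side: Optional[int]   # input side length in pixels; None for the text-only first stage
    out_side: int            # output side length in pixels


def build_cascade(base_side: int = 512, upscale: int = 2, n_relay: int = 2) -> List[CascadeStage]:
    """Lay out a base stage followed by n_relay super-resolution stages."""
    stages = [CascadeStage("Stage 1", "standard diffusion", None, base_side)]
    side = base_side
    for i in range(n_relay):
        # Each relay stage takes the previous output and doubles its side length.
        stages.append(CascadeStage(f"Stage {i + 2}", "relay diffusion", side, side * upscale))
        side *= upscale
    return stages


if __name__ == "__main__":
    for s in build_cascade():
        src = "text prompt" if s.in_side is None else f"{s.in_side}x{s.in_side}"
        print(f"{s.name}: {s.process}, {src} -> {s.out_side}x{s.out_side}")
```

With the default arguments, the plan matches the resolutions listed in the article: 512, then 1024, then 2048 pixels per side.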

According to the team, CogView3 outperforms SDXL, the current state-of-the-art open-source text-to-image diffusion model, by 77.0% in human evaluation, while requiring only about 1/10 of SDXL's inference time.
The CogView3-Plus model, in turn, builds on CogView3 (ECCV'24) by introducing the latest DiT framework to further improve overall performance. It reportedly uses a Zero-SNR diffusion noise schedule and introduces a joint text-image attention mechanism. Compared with the commonly used MMDiT structure, this effectively reduces training and inference costs while maintaining the model's basic capabilities. CogView3-Plus uses a VAE with a latent dimension of 16.
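
To make "Zero-SNR" concrete, here is a minimal sketch of one common way to force a noise schedule to reach zero signal-to-noise ratio at its final step: shift and rescale the square roots of the cumulative alphas so the last one is exactly zero. The article does not say which formulation CogView3-Plus uses, so treat this as an assumption; the linear beta schedule, the 1000-step count, and all function names are illustrative.

```python
import math
from typing import List


def linear_betas(n: int, start: float = 1e-4, end: float = 2e-2) -> List[float]:
    """An illustrative linear beta schedule (assumed, not taken from CogView3-Plus)."""
    return [start + (end - start) * i / (n - 1) for i in range(n)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar_t = product of (1 - beta_s) for s <= t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def rescale_zero_terminal_snr(betas: List[float]) -> List[float]:
    """Return betas whose final alpha_bar is exactly 0, i.e. SNR = 0 at the last step."""
    sqrt_ab = [math.sqrt(a) for a in cumulative_alphas(betas)]
    first, last = sqrt_ab[0], sqrt_ab[-1]
    # Shift so the last value becomes 0, then rescale so the first value is preserved.
    scale = first / (first - last)
    sqrt_ab = [(s - last) * scale for s in sqrt_ab]
    ab = [s * s for s in sqrt_ab]
    # Recover per-step alphas from consecutive ratios of the cumulative products.
    alphas = [ab[0]] + [ab[i] / ab[i - 1] for i in range(1, len(ab))]
    return [1.0 - a for a in alphas]


if __name__ == "__main__":
    betas = linear_betas(1000)
    fixed = rescale_zero_terminal_snr(betas)
    print("terminal alpha_bar before:", cumulative_alphas(betas)[-1])
    print("terminal alpha_bar after: ", cumulative_alphas(fixed)[-1])
```

Before rescaling, the final step still carries a small amount of signal; afterwards the final step is pure noise, which is the property the term "Zero-SNR" refers to.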
The relevant links are below:
Open-source code repository:
- https://github.com/THUDM/CogView3
CogView3-Plus open-source model repositories:
- https://huggingface.co/THUDM/CogView3-Plus-3B
- https://modelscope.cn/models/ZhipuAI/CogView3-Plus-3B