{"id":18776,"date":"2024-08-28T09:43:09","date_gmt":"2024-08-28T01:43:09","guid":{"rendered":"https:\/\/www.1ai.net\/?p=18776"},"modified":"2024-08-28T09:43:09","modified_gmt":"2024-08-28T01:43:09","slug":"%e6%99%ba%e8%b0%b1ai-%e5%bc%80%e6%ba%90-cogvideox-5b-%e8%a7%86%e9%a2%91%e7%94%9f%e6%88%90%e6%a8%a1%e5%9e%8b%ef%bc%8crtx-3060-%e6%98%be%e5%8d%a1%e5%8f%af%e8%bf%90%e8%a1%8c","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/18776.html","title":{"rendered":"Zhipu AI open-sources CogVideoX-5B video generation model, which can run on an RTX 3060 graphics card"},"content":{"rendered":"<p>News on August 28: <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e6%99%ba%e8%b0%b1ai\" title=\"[View articles tagged with [Zhipu AI]]\" target=\"_blank\" >Zhipu AI<\/a> has <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%bc%80%e6%ba%90\" title=\"[View articles tagged with [open source]]\" target=\"_blank\" >open-sourced<\/a> the CogVideoX-5B <a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%a7%86%e9%a2%91%e7%94%9f%e6%88%90%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [video generation model]]\" target=\"_blank\" >video generation model<\/a>. Compared with the previously open-sourced CogVideoX-2B, the company says its video generation quality is higher and its visual effects are better.<\/p>\n<p>According to the official announcement,<strong> the model&#039;s inference performance has been greatly optimized and the inference threshold greatly lowered<\/strong>: CogVideoX-2B can run on early graphics cards such as the GTX 1080 Ti, and the CogVideoX-5B model can run on desktop &quot;dessert cards&quot; such as the RTX 3060.<\/p>\n<p>CogVideoX is a large-scale DiT (diffusion transformer) model for text-to-video tasks. 
It mainly uses the following techniques:<\/p>\n<ul>\n<li>3D causal VAE: achieves efficient video reconstruction by compressing video data into a latent space and decoding along the temporal dimension.<\/li>\n<li>Expert Transformer: combines text embeddings and video embeddings, uses 3D-RoPE as positional encoding, adopts an expert adaptive LayerNorm to normalize the data of the two modalities, and uses a full 3D attention mechanism for joint spatiotemporal modeling.<\/li>\n<\/ul>\n<p>The detailed parameters of CogVideoX-5B and CogVideoX-2B are as follows:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-18777\" title=\"893dd86aj00siwnet0046d000ms00swm\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/08\/893dd86aj00siwnet0046d000ms00swm.jpg\" alt=\"893dd86aj00siwnet0046d000ms00swm\" width=\"820\" height=\"1040\" \/><\/p>\n<p>Related links:<\/p>\n<ul>\n<li>Code repository: https:\/\/github.com\/THUDM\/CogVideo<\/li>\n<li>Model download: https:\/\/huggingface.co\/THUDM\/CogVideoX-5b<\/li>\n<li>Paper link: https:\/\/arxiv.org\/pdf\/2408.06072<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>News on August 28: Zhipu AI has open-sourced the CogVideoX-5B video generation model. Compared with the previously open-sourced CogVideoX-2B, the company says its video generation quality is higher and its visual effects are better. Officially, the model's inference performance has been greatly optimized and the inference threshold greatly lowered, allowing CogVideoX-2B to run on early graphics cards such as the GTX 1080 Ti, and CogVideoX-5B on desktop \"dessert cards\" such as the RTX 3060. 
CogVideoX is a large-scale DiT (diffusion transformer) model for text-to-video tasks that utilizes the following techniques: 3D cau<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[148,146],"tags":[219,379,460],"collection":[],"class_list":["post-18776","post","type-post","status-publish","format-standard","hentry","category-headline","category-news","tag-219","tag-ai","tag-460"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/18776","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=18776"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/18776\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=18776"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=18776"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=18776"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=18776"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}