{"id":28425,"date":"2025-02-10T21:29:39","date_gmt":"2025-02-10T13:29:39","guid":{"rendered":"https:\/\/www.1ai.net\/?p=28425"},"modified":"2025-02-10T21:29:39","modified_gmt":"2025-02-10T13:29:39","slug":"%e8%b1%86%e5%8c%85%e5%bc%80%e6%ba%90%e8%a7%86%e9%a2%91%e7%94%9f%e6%88%90%e6%a8%a1%e5%9e%8b-videoworld%ef%bc%9a%e9%a6%96%e5%88%9b%e5%85%8d%e8%af%ad%e8%a8%80%e6%a8%a1%e5%9e%8b%e4%be%9d%e8%b5%96%e8%ae%a4","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/28425.html","title":{"rendered":"Beanbag Open Source Video Generation Model VideoWorld: First Language-Free Model Dependent Cognitive World"},"content":{"rendered":"<p>February 10th.<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%b1%86%e5%8c%85\" title=\"[View articles tagged with [beanbag]]\" target=\"_blank\" >Bean curd<\/a>VideoWorld\", an experimental video generation model jointly developed by Big Model team, Beijing Jiaotong University and University of Science and Technology of China, is open-sourced today. Unlike mainstream multimodal models such as Sora, DALL-E, and Midjourney, VideoWorld realizes for the first time in the industry that you don't need to rely on a language model to know the world.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-28426\" title=\"3a517082j00srgyrs00bpd000v900fzp\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/02\/3a517082j00srgyrs00bpd000v900fzp.jpg\" alt=\"3a517082j00srgyrs00bpd000v900fzp\" width=\"1125\" height=\"575\" \/><\/p>\n<p>It is stated that most of the existing models rely on language or labeled data to learn knowledge, and rarely involve the learning of purely visual signals. However, language does not capture all knowledge in the real world. For example, complex tasks such as origami and bow tie are difficult to express clearly through language. VideoWorld, on the other hand, removes the language model and realizes unified execution of comprehension and reasoning tasks.<\/p>\n<p>At the same time, it is based on a potentially dynamic model that can be<strong>Efficient compression of video frame-to-frame variation information<\/strong>Significant improvements in the efficiency and effectiveness of knowledge learning. Without relying on any mechanism of enhanced learning search or reward function, VideoWorld has reached the level of professional section 5 9x9 and is able to perform robotic missions in a variety of environments\u3002<\/p>\n<p data-vmark=\"c348\"><span class=\"referenceTitle\">1AI Attach the relevant address below:<\/span><\/p>\n<ul class=\"custom_reference list-paddingleft-1\">\n<li class=\"list-undefined list-reference-paddingleft\">\n<p data-vmark=\"7f42\"><strong>Link to paper:<\/strong><span class=\"link-text-start-with-http\">https:\/\/arxiv.org\/abs\/2501.09781<\/span><\/p>\n<\/li>\n<li class=\"list-undefined list-reference-paddingleft\">\n<p data-vmark=\"febd\"><strong>Code Link:<\/strong><span class=\"link-text-start-with-http\">https:\/\/github.com\/bytedance\/VideoWorld<\/span><\/p>\n<\/li>\n<li class=\"list-undefined list-reference-paddingleft\">\n<p data-vmark=\"5e02\"><strong>Project home page:<\/strong><span class=\"link-text-start-with-http\">https:\/\/maverickren.github.io\/VideoWorld.github.io<\/span><\/p>\n<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>On February 10th, the Bean Pack team joined forces with the Open Source of VideoWorld, an experimental video generation model developed by Beijing University of Transport and China University of Science and Technology. 