February 10th: VideoWorld, an experimental video generation model jointly developed by ByteDance's Doubao Big Model team, Beijing Jiaotong University, and the University of Science and Technology of China, was open-sourced today. Unlike mainstream multimodal models such as Sora, DALL-E, and Midjourney, VideoWorld is the first model in the industry to learn knowledge of the world without relying on a language model.

According to the team, most existing models rely on language or labeled data to learn knowledge, and rarely learn from purely visual signals. However, language cannot capture all knowledge in the real world. For example, complex tasks such as origami or tying a bow tie are difficult to describe clearly in language. VideoWorld, by contrast, removes the language model and performs understanding and reasoning tasks in a unified way.
At the same time, it is built on a latent dynamics model that efficiently compresses the information about changes between video frames, significantly improving the efficiency and effectiveness of knowledge learning. Without relying on any reinforcement learning search mechanism or reward function, VideoWorld has reached a professional 5-dan level in 9x9 Go and can perform robotic tasks in a variety of environments.
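To make the latent dynamics idea concrete, here is a minimal toy sketch of compressing frame-to-frame change into a compact latent code. This is not the paper's actual architecture; the `ToyLatentDynamicsModel` class, layer sizes, and training loss below are all illustrative assumptions.

```python
# Toy sketch of the "latent dynamics" idea: compress the *change* between
# consecutive video frames into a compact latent code instead of modeling
# every raw pixel. Everything here is illustrative, not the paper's design.
import torch
import torch.nn as nn

class ToyLatentDynamicsModel(nn.Module):
    def __init__(self, in_channels: int = 3, latent_dim: int = 16):
        super().__init__()
        # Encoder sees a pair of frames stacked on the channel axis and
        # squeezes their difference information into a small latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels * 2, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, latent_dim),
        )
        # Decoder maps the latent back to a per-pixel residual for 32x32 frames.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 8),
            nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, in_channels, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, frame_t, frame_t1):
        # z summarizes what changed between the two frames.
        z = self.encoder(torch.cat([frame_t, frame_t1], dim=1))
        # Predict a residual change and add it to the current frame, so z
        # only has to encode the dynamics, not the whole image.
        return frame_t + self.decoder(z), z

# Usage: reconstruct the next 32x32 frame through a 16-dim bottleneck.
model = ToyLatentDynamicsModel()
f_t, f_t1 = torch.randn(2, 3, 32, 32), torch.randn(2, 3, 32, 32)
pred, z = model(f_t, f_t1)
loss = nn.functional.mse_loss(pred, f_t1)  # push z to capture the change
```

Because the bottleneck is far smaller than a frame, a model trained this way is forced to represent inter-frame dynamics compactly, which is the intuition behind the efficiency gains described above.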
1AI attaches the relevant links below:
- Paper: https://arxiv.org/abs/2501.09781
- Code: https://github.com/bytedance/VideoWorld
- Project home page: https://maverickren.github.io/VideoWorld.github.io