{"id":14274,"date":"2024-06-28T10:03:16","date_gmt":"2024-06-28T02:03:16","guid":{"rendered":"https:\/\/www.1ai.net\/?p=14274"},"modified":"2024-06-28T10:03:25","modified_gmt":"2024-06-28T02:03:25","slug":"%e8%a7%86%e7%95%8c%e4%b8%80%e7%b2%9fyisu%ef%bc%9a%e4%b8%ad%e5%9b%bd%e9%a6%96%e4%b8%aa%e8%b6%85%e6%97%b6%e9%95%bfsora%e7%ba%a7%e8%a7%86%e9%a2%91%e7%94%9f%e6%88%90%e5%a4%a7%e6%a8%a1%e5%9e%8b","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/14274.html","title":{"rendered":"Yisu: China&#039;s first large model for generating ultra-long Sora-level videos"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-14275\" title=\"1-2406130ZG64a\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2024\/06\/1-2406130ZG64a.jpg\" alt=\"1-2406130ZG64a\" width=\"1200\" height=\"647\" \/><\/p>\n<p><a href=\"https:\/\/www.1ai.net\/en\/tag\/%e8%a7%86%e7%95%8c%e4%b8%80%e7%b2%9f\" title=\"[Sees articles with [the] label]\" target=\"_blank\" >A glimpse into the world<\/a> YiSu is a video generation system developed by Beijing Jijiashijie Technology Co., Ltd. and the Department of Automation of Tsinghua University.<a href=\"https:\/\/www.1ai.net\/en\/tag\/%e5%a4%a7%e6%a8%a1%e5%9e%8b\" title=\"[View articles tagged with [large models]]\" target=\"_blank\" >Large Model<\/a>This model can generate videos longer than 1 minute and has advantages such as large movements and strong expressiveness. In addition, the YiSu model is cheaper and faster, making it suitable for large-scale product applications.<a href=\"https:\/\/www.1ai.net\/en\/tag\/yisu\" title=\"_Other Organiser\" target=\"_blank\" >Yisu<\/a>It is not just a video generation model, but also an important step towards a world model. The world model is crucial for general intelligence in the physical world such as autonomous driving general robots, and plays a key role in data generation, closed-loop simulation, and end-to-end solutions. 
YiSu has also demonstrated the same video generation architecture serving as a world model for autonomous driving and robotics scenarios.<\/p>\n<h2><span id=\"lwptoc2\"><strong>YiSu Functions<\/strong><\/span><\/h2>\n<ol>\n<li>Multimodal fusion capability: The Yisu model is not limited to processing single text or image inputs; it can simultaneously understand and generate video content that combines multiple kinds of information such as text, images, and audio. This multimodal fusion capability broadens the model's applicability in video generation.<\/li>\n<li>Efficient training and inference: Through optimized algorithms and architecture, the Yisu model achieves significant improvements in both training and inference speed, so it can generate video content more quickly and adapt faster to new data and scenarios.<\/li>\n<li>On-device operation: The Yisu model can run directly on terminal devices without relying on cloud support, so users can generate video content on local devices without waiting for cloud processing, improving convenience and flexibility.<\/li>\n<li>High cost-effectiveness: Compared with other video generation solutions, the Yisu model is cheaper and faster, making it well suited to cost-sensitive applications and those that require fast video generation.<\/li>\n<li>Continuous iteration and optimization: The Yisu team is committed to continuously iterating on and optimizing the model, planning to ship one minor version per week and one major version per month. 
In the future, the Yisu model aims to deliver significant improvements in video duration, controllability, inference speed, operating cost, and understanding of the physical world, providing users with better video generation services.<\/li>\n<li>Ultra-long duration: Yisu natively supports 16-second video generation and can be extended beyond 1 minute, breaking the duration limits of traditional video generation models.<\/li>\n<li>High performance: The model supports a large range of motion, is highly expressive, and can capture the laws of the physical world, making the generated videos more realistic, natural, and dynamic.<\/li>\n<\/ol>\n<p><strong>Technical features:<\/strong><\/p>\n<p>Self-developed architecture: Yisu uses large-model video generation technology independently developed by the team, combining the strengths of LLMs and diffusion models to achieve efficient video generation.<\/p>\n<p>Multimodal fusion: The model is optimized for multimodal data and can better understand and generate video content that combines text, images, audio, and other information.<\/p>\n<p>Efficient training and inference: Through optimized algorithms and architecture, Yisu achieves significant improvements in both training and inference speed, improving the efficiency of video generation.<\/p>\n<p>Official website:<a href=\"https:\/\/world-dreamer.github.io\/\">https:\/\/world-dreamer.github.io\/<\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>Vision YiSu is a large video generation model developed by Beijing Excellent Vision Technology Co., Ltd. in conjunction with the Department of Automation at Tsinghua University. The model can generate videos longer than 1 minute, with a very large range of motion and strong expressiveness. The YiSu model is also cheaper and faster, making it suitable for large-scale product applications. 
Vision YiSu is more than just a video generation model; it is an important step toward the world model. World models are crucial for generalized intelligence in the physical world, such as self-driving and general-purpose robots, and play a key role in data generation, closed-loop simulation, and end-to-end solutions. Vision YiSu demonstrates the same video generation architecture applied to world modeling for autonomous driving and robotics scenarios.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[138,140],"tags":[956,981,3263,216,3262],"collection":[],"class_list":{"0":"post-14274","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"hentry","6":"category-product","7":"category-qita","8":"tag-ai","10":"tag-yisu","11":"tag-216","12":"tag-3262"},"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/14274","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=14274"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/14274\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=14274"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=14274"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=14274"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=
14274"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}