Google DeepMind, in collaboration with researchers from several universities, has developed CAT4D, an AI system that turns ordinary videos into dynamic 3D scenes, lowering the barrier to 3D content creation and opening new possibilities across multiple industries. CAT4D uses a multi-view video diffusion model to transform a single-view video into synchronized videos from multiple viewpoints, which are then used to reconstruct a 3D scene that can be viewed from any angle; this replaces the previous need to record a scene with many cameras simultaneously. Because suitable real-world training data is scarce, the team trained the model on a mixture of real footage and computer-generated content. At this stage, the generated 3D scenes are shorter than the source material, but the image quality surpasses that of comparable systems, and the method has broad applications in game development, film production, augmented reality, and other fields, where practitioners can integrate it into their workflows.
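The two-stage pipeline described above (multi-view video generation, then 3D reconstruction) can be sketched roughly as follows. This is a minimal illustrative sketch, not CAT4D's actual code: the functions `multiview_diffusion` and `reconstruct_4d`, the camera-pose names, and all tensor shapes are hypothetical placeholders standing in for the real diffusion model and reconstruction stage.

```python
import numpy as np

def multiview_diffusion(video, target_viewpoints):
    """Hypothetical stand-in for the multi-view video diffusion stage:
    given a monocular video of shape (T, H, W, 3) and K target camera
    poses, produce K synchronized videos of the same scene, shaped
    (K, T, H, W, 3). Here we simply tile the input as a placeholder."""
    return np.stack([video for _ in target_viewpoints], axis=0)

def reconstruct_4d(multiview_videos):
    """Hypothetical stand-in for the reconstruction stage: fit a dynamic
    3D representation to the generated views. Here we just return a
    summary of the input as a placeholder."""
    k, t, h, w, _ = multiview_videos.shape
    return {"num_views": k, "num_frames": t, "resolution": (h, w)}

# Toy input: a 10-frame monocular clip at 64x64 resolution.
video = np.zeros((10, 64, 64, 3), dtype=np.float32)
cameras = [f"pose_{i}" for i in range(4)]  # 4 hypothetical viewpoints
views = multiview_diffusion(video, cameras)
scene = reconstruct_4d(views)
print(scene)  # {'num_views': 4, 'num_frames': 10, 'resolution': (64, 64)}
```

The key design point the sketch captures is that reconstruction never sees the original single-view video directly: it operates only on the multi-view videos the diffusion model generates, which is what removes the multi-camera recording requirement.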
Paper address: https://arxiv.org/pdf/2411.18613
Open source address: https://cat-4d.github.io/
