Step1X-3D, an open-source large 3D model, generates high-fidelity, controllable 3D content

May 14 news: StepFun has officially released and open-sourced Step1X-3D, a large 3D generation model. It is the latest addition to StepFun's Step series of multimodal models, following its image, video, speech, and music models. Step1X-3D has 4.8B parameters in total (1.3B in the geometry module and 3.5B in the texture module). Built on a solid data foundation and an advanced 3D-native architecture, it can generate high-fidelity, controllable 3D content. According to StepFun, Step1X-3D is designed to be not only "good-looking" but also "useful" and "controllable", aiming to provide a powerful and reliable technology engine for 3D content creation.


StepFun has also open-sourced its complete data-cleaning and data-preprocessing strategy, 800K high-quality 3D assets, and the full training code for the 3D VAE, 3D geometry diffusion, and texture diffusion pipeline, to help the 3D generation community grow.

Open-source links and demo addresses:

GitHub: https://github.com/stepfun-ai/Step1X-3D

HuggingFace: https://huggingface.co/stepfun-ai/Step1X-3D

ModelScope: https://www.modelscope.cn/models/stepfun-ai/Step1X-3D

Tech Report: https://arxiv.org/pdf/2505.07747

1AI summarizes the core features and technical highlights from the official release below:

Step1X-3D tackles the key challenges of 3D content generation through innovations in data, generation quality, and controllability.

1. Co-optimization of data and algorithms

Good data is the foundation of a good model. After rigorously screening and processing more than 5 million raw samples, the Step1X-3D team built a library of 2 million high-quality, standardized training samples, easing the industry-wide bottleneck of scarce data and uneven data quality.

In addition, techniques such as an enhanced mesh-to-SDF conversion ensure, at the source, that the model learns accurately and generates efficiently. These improvements raise the success rate of watertight geometry conversion by 20% and give Step1X-3D strong generalization and detail-capturing ability.
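"Watertight" has a simple combinatorial test: in a closed, manifold triangle mesh, every edge is shared by exactly two faces. The sketch below illustrates only that check, using a hypothetical `is_watertight` helper in plain Python; it is not StepFun's mesh-to-SDF conversion, whose details are in the tech report.

```python
from collections import Counter


def is_watertight(faces):
    """Hypothetical helper: True if every undirected edge is shared by exactly two triangles.

    faces: list of (i, j, k) vertex-index triples.
    """
    edge_count = Counter()
    for i, j, k in faces:
        for a, b in ((i, j), (j, k), (k, i)):
            # Store edges in sorted order so (a, b) and (b, a) count as the same edge.
            edge_count[(min(a, b), max(a, b))] += 1
    return all(count == 2 for count in edge_count.values())


# A tetrahedron is closed: each of its 6 edges borders exactly 2 faces.
tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
# Dropping one face opens a hole, leaving 3 boundary edges used only once.
open_tetra = tetra[:3]

print(is_watertight(tetra))       # True
print(is_watertight(open_tetra))  # False
```

Meshes that fail this test have no well-defined inside and outside, which is why a watertight conversion step matters before computing signed distances.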

2. 3D-native generation: clear structure, vivid details

Step1X-3D adopts an advanced 3D-native two-stage architecture that decouples the geometry and texture representations. This ensures that the output is not just a visual "skin" but also a reliable "skeleton" that downstream applications can use, avoiding geometric distortion and guaranteeing accuracy, realism, and consistency.

  • More precise geometry

At the heart of geometry generation is a hybrid VAE-DiT architecture deeply optimized for 3D features. It generates a TSDF (truncated signed distance function) as the internal representation, ensuring that the resulting 3D model is structurally complete and free of broken surfaces, while techniques such as sharp edge sampling capture and reproduce the object's rich geometric detail.
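To make the TSDF idea concrete, the sketch below samples the signed distance to an analytic sphere on a regular grid, clips it to a truncation band, and normalizes it to [-1, 1]. The grid resolution, truncation width, and "negative inside" sign convention are illustrative assumptions (some systems use the opposite sign) and are not Step1X-3D's actual settings.

```python
import numpy as np


def sphere_tsdf(resolution=32, radius=0.5, truncation=0.1):
    """Truncated signed distance to a sphere centred at the origin,
    sampled on a regular grid over [-1, 1]^3.

    Assumed convention: negative inside, positive outside. Values are clipped
    to [-truncation, truncation] and then divided by truncation.
    """
    axis = np.linspace(-1.0, 1.0, resolution)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    sdf = np.sqrt(x**2 + y**2 + z**2) - radius  # exact SDF of a sphere
    return np.clip(sdf, -truncation, truncation) / truncation


tsdf = sphere_tsdf()
print(tsdf.shape)              # (32, 32, 32)
print(tsdf.min(), tsdf.max())  # -1.0 1.0
```

Only voxels near the surface keep informative values; everything farther away saturates at plus or minus 1. The surface itself is the zero level set, which can be extracted as a mesh with a method such as marching cubes.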

  • More vivid texture details

Texture generation is built on a deeply customized and optimized version of the powerful SD-XL model. Precise geometric conditioning (using normal and position information) and multi-view synchronization in latent space let it work efficiently with the geometry module. As a result, the generated textures are rich in color and vivid, highly consistent across views, and fit tightly to complex 3D surfaces, avoiding common distortions and seam artifacts.

3. Controllability and ease of use

Step1X-3D also significantly improves the controllability and ease of use of 3D content generation. Critically, its overall VAE-Diffusion architecture closely mirrors mainstream 2D generation models such as Stable Diffusion, so proven 2D control techniques such as lightweight LoRA fine-tuning can be applied directly.
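LoRA keeps a pretrained weight matrix frozen and learns only a low-rank update, which is why it is cheap enough to use for per-attribute control. The NumPy sketch below shows the generic technique on a single linear layer; the layer size, rank, and scaling factor are hypothetical and do not reflect how Step1X-3D configures its adapters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer size and LoRA hyperparameters, for illustration only.
d_out, d_in, rank, alpha = 64, 64, 4, 8.0

# Frozen pretrained weight (random stand-in values).
W = rng.standard_normal((d_out, d_in))

# Trainable low-rank factors: A is small random, B starts at zero,
# so the adapted layer initially behaves exactly like the original one.
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))


def lora_forward(x, W, A, B, alpha, rank):
    """y = W x + (alpha / rank) * B (A x), without materialising B @ A."""
    return W @ x + (alpha / rank) * (B @ (A @ x))


x = rng.standard_normal(d_in)
# With B = 0 the LoRA branch contributes nothing yet.
assert np.allclose(lora_forward(x, W, A, B, alpha, rank), W @ x)

full_params = W.size
lora_params = A.size + B.size
print(full_params, lora_params)  # 4096 512
```

Only `A` and `B` are trained, here 512 values instead of 4,096. A separate small adapter can therefore be trained and swapped in for each attribute being controlled.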

As a result, users can intuitively and precisely adjust attributes of the generated 3D assets, such as symmetry and surface detail (e.g., sharpness or smoothness), so that the results match their intent more closely.

  • Performance Evaluation

To objectively assess Step1X-3D's real-world performance, the team ran rigorous quantitative and qualitative evaluations on a self-built benchmark of 110 diverse test cases and compared Step1X-3D comprehensively with several mainstream models.

The results show that Step1X-3D performs well on several key dimensions in automated evaluation.

Compared with mainstream 3D models, Step1X-3D achieves the highest score of all compared models on CLIP-Score, a core metric of the semantic consistency between the generated content and the input. This gives the open-source community a highly competitive 3D generation solution.
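At its core, CLIP-Score is a clipped, scaled cosine similarity between CLIP embeddings. For an image-to-3D model it is commonly computed between the input and renderings of the generated asset, but the release does not state its exact protocol, so the view-averaging below is an assumption. The sketch takes precomputed embeddings (toy vectors here, not real CLIP features) and uses a scale of 100, a common convention; the original CLIPScore paper uses 2.5.

```python
import numpy as np


def clip_score(emb_a, emb_b, scale=100.0):
    """Scaled, non-negative cosine similarity between two embedding vectors."""
    a = np.asarray(emb_a, dtype=float)
    b = np.asarray(emb_b, dtype=float)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return scale * max(cos, 0.0)  # negative similarities are clipped to 0


def mean_view_score(input_emb, view_embs, scale=100.0):
    """Assumed protocol: average score of several rendered views against one input."""
    return float(np.mean([clip_score(input_emb, v, scale) for v in view_embs]))


# Toy 3-D vectors standing in for real CLIP features (hypothetical values).
query = [1.0, 0.0, 0.0]
views = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(mean_view_score(query, views))  # 50.0
```

In a real evaluation, the embeddings would come from a CLIP image (or text) encoder, and scores would be averaged over all test cases before comparing models.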

Online Demo: https://huggingface.co/spaces/stepfun-ai/Step1X-3D
