October 17 news: in September this year, Huawei's open-source openPangu 718B, built on a training philosophy of focusing on reasoning rather than piling up data, ranked third among open-source models on the SuperCLUE leaderboard and became a focus of the industry.
Huawei officially announced yesterday that openPangu-Ultra-MoE-718B-V1.1 is now open-sourced on the GitCode platform, with model weights and technical details fully disclosed.
- Hardware requirements: Atlas 800T A2 (64GB, ≥ 32 cards), supporting bare-metal or Docker deployment.
- Features: toggle slow-thinking mode via the /no_think tag; supports multi-turn tool calling (see the sketch below).
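To illustrate the /no_think toggle, here is a minimal sketch assuming the model is served behind an OpenAI-compatible chat endpoint; the URL, port, and exact tag placement below are assumptions for illustration, not official values.

```python
# Minimal sketch: toggling openPangu's thinking mode via the /no_think tag.
# The endpoint URL and tag placement are assumptions, not official values.
import requests

API_URL = "http://localhost:8000/v1/chat/completions"  # hypothetical local deployment

def chat(prompt: str, slow_thinking: bool = True) -> str:
    # Per the release notes, the /no_think tag switches the model out of
    # slow-thinking mode; without the tag, slow thinking is used.
    content = prompt if slow_thinking else prompt + " /no_think"
    resp = requests.post(API_URL, json={
        "model": "openPangu-Ultra-MoE-718B-V1.1",
        "messages": [{"role": "user", "content": content}],
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Fast answer for a simple lookup, slow thinking for a hard reasoning task.
print(chat("What is the capital of France?", slow_thinking=False))
print(chat("Prove that the square root of 2 is irrational.", slow_thinking=True))
```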

According to the official introduction, openPangu-Ultra-MoE-718B-V1.1 is a large-scale mixture-of-experts (MoE) language model trained on NPUs, with 718B total parameters and 39B activated parameters. The model combines "fast thinking" and "slow thinking" capabilities within a single architecture to achieve more efficient and intelligent reasoning and decision-making.
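To make the "39B activated out of 718B total" distinction concrete, here is a toy top-k expert-routing sketch in PyTorch; the dimensions, expert count, and top-k value are illustrative assumptions, not openPangu's actual configuration.

```python
# Toy sketch of mixture-of-experts routing: each token is processed by only
# a few experts, so activated parameters are a small fraction of the total.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=64, top_k=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)                         # score every expert
        weights, idx = logits.topk(self.top_k, dim=-1)  # keep only top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * \
                             self.experts[int(e)](x[mask])
        return out

moe = TopKMoE()
y = moe(torch.randn(8, 512))  # each token touches 4 of 64 experts (~6% of FFN params)
```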
On the latest September SuperCLUE leaderboard, openPangu-718B ranked a solid third among open-source models, performing strongly across six core dimensions including mathematical reasoning, scientific reasoning, and code generation. Of particular note, its hallucination-control score reached 81.28, surpassing even some closed-source giants and highlighting its technical advantage in output reliability.
Compared with the previous version, openPangu-Ultra-MoE-718B-V1.0, V1.1 significantly strengthens agent tool-calling capability, further reduces the hallucination rate, and improves the model's overall performance and stability.
Architecturally, openPangu adopts the industry-proven Multi-head Latent Attention (MLA), Multi-Token Prediction (MTP), and a high-sparsity mixture-of-experts structure, and introduces innovative designs on top of these to achieve better performance and training efficiency:
- Depth-Scaled Sandwich-Norm and TinyInit: significantly improve training stability and speed by refining the layer-normalization structure and parameter initialization.
- EP-Group load-balancing strategy: optimizes the load-balancing loss function, effectively improving the balance of expert routing and enhancing expert specialization and synergy (a sketch of both designs follows this list).
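Below is an illustrative sketch of both ideas: a sandwich-norm sublayer whose output-norm gain and weights are initialized with a depth-dependent scale, plus a standard auxiliary load-balancing loss. The exact scaling rules of Depth-Scaled Sandwich-Norm / TinyInit and the EP-Group grouping are in Huawei's technical materials; the 1/sqrt(2L) factor and the Switch-Transformer-style loss here are assumptions for illustration.

```python
# Illustrative sandwich-norm block with depth-scaled init, plus a generic
# auxiliary load-balancing loss. Scaling rules are assumptions, not Huawei's.
import math
import torch
import torch.nn as nn

class SandwichBlock(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_layers: int):
        super().__init__()
        self.pre_norm = nn.LayerNorm(d_model)   # norm before the sublayer
        self.post_norm = nn.LayerNorm(d_model)  # extra norm after it ("sandwich")
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))
        depth_scale = 1.0 / math.sqrt(2 * num_layers)  # illustrative depth scale
        # Small output-norm gain keeps residual updates modest in deep stacks.
        nn.init.constant_(self.post_norm.weight, depth_scale)
        # TinyInit-style small initialization of the sublayer weights.
        for m in self.ffn:
            if isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, std=depth_scale / math.sqrt(d_model))
                nn.init.zeros_(m.bias)

    def forward(self, x):
        # The residual branch is normalized on both sides of the sublayer.
        return x + self.post_norm(self.ffn(self.pre_norm(x)))

def load_balance_loss(router_logits: torch.Tensor, top1_idx: torch.Tensor):
    # Switch-Transformer-style auxiliary loss; openPangu's EP-Group variant
    # applies the balancing constraint within expert-parallel groups, which
    # is omitted here for brevity.
    n_experts = router_logits.size(-1)
    probs = router_logits.softmax(dim=-1)
    frac = torch.zeros(n_experts).scatter_add_(
        0, top1_idx, torch.ones_like(top1_idx, dtype=torch.float))
    frac = frac / top1_idx.numel()          # fraction of tokens per expert
    return n_experts * (frac * probs.mean(dim=0)).sum()

block = SandwichBlock(d_model=512, d_ff=2048, num_layers=64)
print(block(torch.randn(4, 16, 512)).shape)        # torch.Size([4, 16, 512])
logits = torch.randn(128, 64)
print(load_balance_loss(logits, logits.argmax(-1)))  # ~1.0 when balanced
```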
The open-source V1.1 release brings significant improvements in several key dimensions:
- Overall capability optimization: on demanding evaluations such as MMLU-Pro and GPQA, the dual-mode (fast/slow thinking) model comprehensively outperforms V1.0.
- Significantly reduced hallucination rate: through a "critique internalization" mechanism, the hallucination rate drops from 10.11% in V1.0 to 3.85% (fast-thinking mode).
- Enhanced tool-calling capability: the upgraded ToolACE framework delivers clear improvements on multi-tool collaboration tasks such as Tau-Bench.
- First Int8 quantized version: device memory usage is cut roughly in half, throughput rises about 20%, and accuracy loss stays below 1% (see the sketch below).
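As a rough picture of how an Int8 release halves memory relative to 16-bit weights at small accuracy cost, here is a minimal per-channel symmetric int8 quantization sketch; this shows the general technique only, and openPangu's released Int8 recipe may well differ.

```python
# Minimal per-channel symmetric int8 weight quantization sketch.
# Illustrates the general technique; not openPangu's actual recipe.
import numpy as np

def quantize_int8(w: np.ndarray):
    # One scale per output channel (row): scale = max|w| / 127.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1024, 1024).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).mean() / np.abs(w).mean()
# int8 is 1 byte/weight vs 2 for fp16/bf16 (memory halved) or 4 for fp32.
print(f"int8 bytes: {q.nbytes}  fp32 bytes: {w.nbytes}  mean rel err: {err:.4f}")
```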
1AI attaches the official addresses below:
- Model address: https://ai.gitcode.com/ascend-tribe/openPangu-Ultra-MoE-718B-V1.1
- Int8 quantized version: https://gitcode.com/ascend-tribe/openPangu-Ultra-MoE-718B-V1.1-Int8