Huawei to release breakthrough in AI inference soon: reduces dependence on HBM, boosts performance of large domestic models, sources say

Aug. 10, 2025 - According to the Daily Kotaku, Huawei will release breakthrough technical achievements in AI inference on August 12 at the 2025 Financial AI Inference Application Landing and Development Forum. Reportedly, the achievement may reduce China's dependence on HBM (high-bandwidth memory) technology for AI inference and improve the inference performance of large domestic AI models, filling in a key part of China's AI inference ecosystem.

1AI notes that Huawei's breakthroughs in AI inference have precedents. In March 2025, Peking University and Huawei released a full-stack open-source DeepSeek inference solution. It is built on Peking University's self-developed SCOW computing platform and Hesi scheduling system, and integrates open-source community components such as DeepSeek, openEuler, MindSpore, and vLLM/Ray to achieve efficient inference on Huawei Ascend.

In terms of performance, Huawei Ascend has achieved several breakthroughs. For example, with DeepSeek V3/R1 deployed on the CloudMatrix 384 supernode, single-card decode throughput exceeded 1,920 tokens/s under a 50 ms latency constraint, and single-card throughput on the Atlas 800I A2 inference server reached 808 tokens/s under a 100 ms latency constraint.

Cooperation between iFlytek and Huawei has also produced notable results: the two companies were the first to achieve large-scale cross-node expert-parallel cluster inference for MoE models on domestic computing power, improving inference throughput by 3.2 times and reducing end-to-end latency by 50%.
