Moore Threads Supports DeepSeek Open Source Week "Family Bucket"

March 2 news: DeepSeek has officially concluded its Open Source Week. Moore Threads Intelligent Technology (Beijing) Co., Ltd. announced in a post yesterday evening that, in a short period of time, it has achieved full support for DeepSeek's open source projects, covering FlashMLA, DeepEP, DeepGEMM, DualPipe, and the Fire-Flyer File System (3FS).


1AI has compiled Moore Threads' support for the DeepSeek Open Source Week "family bucket" below:

FlashMLA

  • FlashMLA is an open-source repository of efficient MLA (Multi-Head Latent Attention) inference kernels designed to accelerate the computation of the MLA mechanism. It is particularly suited to the DeepSeek family of models (e.g., DeepSeek-V2, V3, and R1).
  • Building on its new MUSA Compute Capability 3.1 architecture, which provides native FP8 compute, Moore Threads upgraded its high-performance linear algebra template library MUTLASS to quickly support FlashMLA. Based on MUTLASS 0.2.0, Moore Threads released the open-source repository MT-FlashMLA, which enables rapid deployment of DeepSeek's FlashMLA.
  • MT-FlashMLA Open Source Address:
  • https://github.com/MooreThreads/MT-flashMLA
  • MUTLASS FlashAttention3 Address:
  • https://github.com/MooreThreads/mutlass/tree/main/experimental/mp31_flash_attention_fwd
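The core idea behind the MLA mechanism that FlashMLA accelerates can be sketched in a few lines. This is an illustrative numpy sketch of latent KV compression, not the FlashMLA API; all shapes, weight names, and dimensions here are made up for the example.

```python
import numpy as np

# Sketch of Multi-Head Latent Attention (MLA): instead of caching full
# per-head K/V tensors, the model caches a small shared latent vector
# per token and re-projects it to K/V at attention time, shrinking the
# KV cache. All names/shapes are illustrative, not the FlashMLA API.

rng = np.random.default_rng(0)
d_model, n_heads, d_head, d_latent, seq = 64, 4, 16, 8, 10

x = rng.standard_normal((seq, d_model))

# Down-projection to the latent KV cache (this is what gets stored).
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
kv_latent = x @ W_down                      # (seq, d_latent) -- the cache

# Up-projections recover per-head K and V from the latent cache.
W_up_k = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)
W_up_v = rng.standard_normal((d_latent, n_heads * d_head)) / np.sqrt(d_latent)
k = (kv_latent @ W_up_k).reshape(seq, n_heads, d_head)
v = (kv_latent @ W_up_v).reshape(seq, n_heads, d_head)

# Standard MHA would cache K and V per head; MLA stores only kv_latent.
mha_cache_floats = 2 * seq * n_heads * d_head   # K + V
mla_cache_floats = seq * d_latent
print(f"MHA cache: {mha_cache_floats} floats, MLA cache: {mla_cache_floats} floats")
```

The cache reduction is what makes a dedicated inference kernel worthwhile: the up-projection adds compute per step, which kernels like FlashMLA are built to hide.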

DeepEP

  • DeepEP is an open-source EP (expert parallelism) communication library for MoE (mixture-of-experts) model training and inference. It is mainly intended for large-model training, especially cluster training that requires expert parallelism, and significantly improves training efficiency by optimizing the utilization of communication channels. Based on its MUSA Compute Capability 3.1 full-featured GPUs, Moore Threads has adapted DeepEP for the first time, supporting the following features:
  • Efficient, optimized all-to-all communication with dispatch & combine support
  • Intra-node communication over MTLink + GPU (MUSA Compute Capability 3.1)
  • High-throughput compute kernels for the training and inference prefill phases
  • Low-latency compute kernels for the inference decoding phase
  • Native support for FP8 data dispatch
  • Flexible control of GPU resources for efficient overlap of computation and communication
  • MT-DeepEP Open Source Address:
  • https://github.com/MooreThreads/MT-DeepEP
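The dispatch/combine round-trip that an EP library like DeepEP performs across GPUs can be sketched in-process. This is an illustrative single-process simulation; real DeepEP moves these buffers over MTLink/RDMA, and the function and variable names here are hypothetical.

```python
import numpy as np

# Sketch of MoE expert-parallel dispatch/combine. "dispatch" groups
# tokens by destination expert (the all-to-all send), each expert
# processes its contiguous slice, and "combine" scatters results back
# to the original token order. Simulated in one process for clarity.

rng = np.random.default_rng(1)
n_tokens, d, n_experts = 6, 4, 3
tokens = rng.standard_normal((n_tokens, d))
expert_of = rng.integers(0, n_experts, size=n_tokens)   # top-1 routing

# dispatch: stable sort groups tokens by expert id
order = np.argsort(expert_of, kind="stable")
dispatched = tokens[order]

# each expert processes its slice (toy expert: scale by expert id + 1)
processed = dispatched.copy()
start = 0
for e in range(n_experts):
    count = int((expert_of == e).sum())
    processed[start:start + count] *= (e + 1)
    start += count

# combine: scatter results back to the original token order
combined = np.empty_like(processed)
combined[order] = processed

# every token ends up transformed by its own expert, in original order
expected = tokens * (expert_of + 1)[:, None]
assert np.allclose(combined, expected)
print("dispatch/combine round-trip OK")
```

In a real cluster the sort becomes an all-to-all exchange, which is why channel utilization, the feature the article highlights, dominates MoE training throughput.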

DeepGEMM

  • DeepGEMM is an FP8 GEMM library supporting dense and mixture-of-experts (MoE) matrix multiplication, powering DeepSeek V3 / R1 training and inference. The repository builds on a C++ template library for high-performance general matrix multiplication (GEMM). Using MUTLASS, Moore Threads has implemented optimized FP8 matrix multiplication on its new GPU architecture, supporting the corresponding DeepGEMM functionality.
  • MUTLASS FP8 GEMM Address:
  • https://github.com/MooreThreads/mutlass/tree/main/examples/02_mp31_fp8_gemm_with_collective_builder
  • https://github.com/MooreThreads/mutlass/tree/main/examples/03_mp31_fp8_scaling_gemm
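The scaled low-precision GEMM pattern behind FP8 libraries like DeepGEMM can be sketched as follows. NumPy has no FP8 dtype, so int8 stands in for the 8-bit format here; the quantize-multiply-rescale structure is the point, not the exact numerics, and the helper name is made up.

```python
import numpy as np

# Sketch of a scaled low-precision GEMM: quantize inputs with
# per-tensor scales, multiply in low precision with a wide
# accumulator, then rescale the result. int8 stands in for FP8.

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 16)).astype(np.float32)
B = rng.standard_normal((16, 8)).astype(np.float32)

def quantize(x, qmax=127):
    scale = np.abs(x).max() / qmax           # per-tensor scale factor
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

qa, sa = quantize(A)
qb, sb = quantize(B)

# low-precision multiply, wide (int32) accumulator, then dequantize
# with the product of the two input scales
C = (qa.astype(np.int32) @ qb.astype(np.int32)).astype(np.float32) * (sa * sb)

rel_err = np.abs(C - A @ B).max() / np.abs(A @ B).max()
print(f"max relative error vs fp32 GEMM: {rel_err:.3f}")
```

Real FP8 kernels additionally use finer-grained (per-block) scaling to control this quantization error, which is what makes FP8 viable for training.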

DualPipe

  • DualPipe is a bidirectional pipeline-parallel algorithm proposed in DeepSeek-V3, which significantly reduces "pipeline bubbles" (device idle time) by fully overlapping computation and communication across the forward and backward phases. Compared with traditional pipeline parallelism, DualPipe adopts a bidirectional data-flow design in which data is fed in from both ends of the pipeline, markedly improving resource utilization and training efficiency.
  • Moore Threads supports the complete DualPipe algorithm through its open-source deep learning framework Torch-MUSA and the MUSA software stack. MT-DualPipe has been fully integrated into the MT-Megatron framework and the MT-TransformerEngine framework (soon to be open source), enabling a complete reproduction of the DeepSeek V3 training workflow.
  • MT-DualPipe Open Source Address:
  • https://github.com/MooreThreads/MT-DualPipe
  • Torch-MUSA open source address:
  • https://github.com/MooreThreads/Torch_MUSA
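A back-of-envelope sketch shows why a bidirectional schedule shrinks pipeline bubbles. The formula below is the standard bubble estimate for one-directional pipelines; the assumption that feeding from both ends roughly halves the warm-up depth is an illustrative simplification, not the actual DualPipe schedule, which also overlaps compute with communication.

```python
# Toy bubble model: in a one-directional pipeline with p stages and m
# microbatches, each device idles ~(p - 1) slots during warm-up and
# cool-down, giving a bubble fraction of (p - 1) / (m + p - 1).
# Feeding microbatches from both ends (illustrative assumption)
# roughly halves that warm-up depth.

def bubble_fraction(p, m, bidirectional=False):
    warmup = (p - 1) / 2 if bidirectional else (p - 1)
    return warmup / (m + warmup)

p, m = 8, 32  # 8 pipeline stages, 32 microbatches
uni = bubble_fraction(p, m)
bi = bubble_fraction(p, m, bidirectional=True)
print(f"one-directional bubble: {uni:.1%}, bidirectional: {bi:.1%}")
```

Under this toy model the bubble fraction drops from about 17.9% to about 9.9% for the chosen p and m; the real gain depends on the concrete schedule and overlap.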

3FS

  • The Fire-Flyer File System (3FS) is a parallel file system that exploits the full bandwidth of modern SSDs and RDMA networks, serving as a key underpinning for the demanding storage workloads of DeepSeek V3 and R1 training and inference.
  • Moore Threads completed the adaptation of the high-performance distributed file system 3FS within a single day and efficiently developed a storage plugin, achieving seamless integration with its Quam cluster and providing a full-stack storage acceleration solution for AI training, AI inference, scientific computing, and other scenarios.
  • 3FS CSI Driver Address:
  • https://github.com/MooreThreads/csi-driver-3fs