{"id":29577,"date":"2025-02-25T11:33:39","date_gmt":"2025-02-25T03:33:39","guid":{"rendered":"https:\/\/www.1ai.net\/?p=29577"},"modified":"2025-02-25T11:33:39","modified_gmt":"2025-02-25T03:33:39","slug":"deepseek-%e5%8f%91%e5%b8%83%e5%bc%80%e6%ba%90%e9%a1%b9%e7%9b%ae-flashmla","status":"publish","type":"post","link":"https:\/\/www.1ai.net\/en\/29577.html","title":{"rendered":"DeepSeek Releases Open Source Project FlashMLA"},"content":{"rendered":"<p>On February 24, <a href=\"https:\/\/www.1ai.net\/en\/tag\/flashmla\" title=\"[View articles tagged with [FlashMLA]]\" target=\"_blank\" >FlashMLA<\/a>, the first project of <a href=\"https:\/\/www.1ai.net\/en\/tag\/deepseek\" title=\"[View articles tagged with [DeepSeek]]\" target=\"_blank\" >DeepSeek<\/a> Open Source Week, was officially released.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-29578\" title=\"5c533f19j00ss7z7b0073d000tg00uwp\" src=\"https:\/\/www.1ai.net\/wp-content\/uploads\/2025\/02\/5c533f19j00ss7z7b0073d000tg00uwp.jpg\" alt=\"5c533f19j00ss7z7b0073d000tg00uwp\" width=\"1060\" height=\"1112\" \/><\/p>\n<p>According to the official description, FlashMLA is inspired by FlashAttention 2&amp;3 and the CUTLASS project. Specifically, FlashMLA is an efficient MLA (Multi-Head Latent Attention) decoding kernel optimized for Hopper GPUs, with support for variable-length sequence processing, and is already in production use.<\/p>\n<p>Optimized for the multi-head latent attention mechanism, FlashMLA accelerates the LLM decoding process to improve model responsiveness and throughput, which is especially important for real-time generative tasks such as chatbots and text generation. 
In short, FlashMLA makes LLM inference faster and more efficient on the H800, which matters most for high-performance AI tasks.<\/p>\n<p>The currently released version of FlashMLA supports BF16 precision and a paged KV cache with a block size of 64, achieving up to 3,000 GB\/s of memory bandwidth and 580 TFLOPS of compute performance on the H800.<\/p>\n<p>FlashMLA is now available on GitHub; within 6 hours of its launch, the repository had gained more than 5,000 stars and 188 forks.<\/p>\n<p>In addition, an investor focusing on AI hardware research told Sina Technology that the FlashMLA released by DeepSeek is a major boon for domestic GPUs (graphics cards).<\/p>\n<p>The investor argued that while domestic GPUs have so far been held back by weak performance, the optimization ideas and methodology provided by FlashMLA can now be applied to improve them substantially; even though the architectures differ, the inference performance of domestic graphics cards should naturally improve as a result.<\/p>","protected":false},"excerpt":{"rendered":"<p>On February 24, FlashMLA, the first project of DeepSeek Open Source Week, was officially released. According to official accounts, FlashMLA was inspired by FlashAttention 2&amp;3 and the CUTLASS project. Specifically, FlashMLA is an efficient MLA decoding kernel optimized for Hopper GPUs that supports variable-length sequence processing, and is now in production use. 
Optimized for the multi-head latent attention mechanism, FlashMLA speeds up the LLM decoding process, increasing the model's response speed and throughput for real-time generative tasks.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[146],"tags":[3606,5798,2284],"collection":[],"class_list":["post-29577","post","type-post","status-publish","format-standard","hentry","category-news","tag-deepseek","tag-flashmla","tag-2284"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/29577","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/comments?post=29577"}],"version-history":[{"count":0,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/posts\/29577\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/media?parent=29577"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/categories?post=29577"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/tags?post=29577"},{"taxonomy":"collection","embeddable":true,"href":"https:\/\/www.1ai.net\/en\/wp-json\/wp\/v2\/collection?post=29577"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}