DeepSeek Unveils Another Cost-Cutting Move: NSA Speeds Up Inference and Lowers Costs Without Sacrificing Performance

February 18: DeepSeek today officially announced NSA (Native Sparse Attention), a hardware-aligned, natively trainable sparse attention mechanism for ultra-fast long-context training and inference.

NSA's core components include:

  • Dynamic hierarchical sparse strategy
  • Coarse-grained token compression
  • Fine-grained token selection
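The two-level idea behind these components can be illustrated with a toy, single-query sketch: compress blocks of tokens into summaries, use those summaries to pick the most relevant blocks, then attend at full resolution only to tokens inside the picked blocks. This is not DeepSeek's implementation. Mean pooling stands in for the paper's compression step, block importance is a plain dot product with the compressed keys, and the coarse and fine outputs are averaged with fixed weights instead of being combined by learned weighting. The names `sparse_attention`, `block_size`, and `top_k` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    d = len(q)
    weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]


def mean_pool(vectors):
    """Average a list of equal-length vectors (stand-in for learned compression)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sparse_attention(q, keys, values, block_size, top_k):
    """Hypothetical two-level sparse attention for a single query.

    Returns (output_vector, indices_of_selected_blocks).
    """
    # Split the sequence into contiguous blocks of token indices.
    blocks = [range(i, min(i + block_size, len(keys)))
              for i in range(0, len(keys), block_size)]

    # Coarse-grained compression: one summary key/value per block.
    comp_keys = [mean_pool([keys[t] for t in b]) for b in blocks]
    comp_vals = [mean_pool([values[t] for t in b]) for b in blocks]
    coarse_out = attend(q, comp_keys, comp_vals)

    # Score blocks with the compressed keys and keep the top_k highest.
    scores = [dot(q, k) for k in comp_keys]
    chosen = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)[:top_k]
    chosen.sort()  # restore sequence order

    # Fine-grained selection: exact attention over tokens in chosen blocks only.
    idx = [t for i in chosen for t in blocks[i]]
    fine_out = attend(q, [keys[t] for t in idx], [values[t] for t in idx])

    # Fixed 50/50 mix of the two branches (assumption, not learned).
    out = [0.5 * c + 0.5 * f for c, f in zip(coarse_out, fine_out)]
    return out, chosen


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[0, 1], [0, 1], [1, 0], [1, 0], [-1, 0], [-1, 0], [0.5, 0.5], [0.5, 0.5]]
    values = [list(map(float, k)) for k in keys]
    out, chosen = sparse_attention(q, keys, values, block_size=2, top_k=2)
    print(chosen, out)
```

With eight tokens split into four blocks of two, the query attends exactly to only four tokens (the two blocks whose summaries align best with it) while still seeing a compressed view of the whole sequence. That saving is what grows with context length.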

According to DeepSeek, the mechanism is optimized for modern hardware, accelerating inference and reducing pre-training costs without sacrificing performance. Its performance matches or exceeds that of full-attention models on general benchmarks, long-context tasks, and instruction-based reasoning.

Paper link:

https://arxiv.org/abs/2502.11089
