February 18th. DeepSeek today officially announced NSA (Native Sparse Attention), a hardware-aligned and natively trainable sparse attention mechanism for ultra-fast long-context training and inference.

The core components of NSA include:
- Dynamic Hierarchical Sparse Strategy
- Coarse-grained token compression
- Fine-grained token selection
DeepSeek says the mechanism is optimized for modern hardware designs, accelerating inference and reducing pre-training costs without sacrificing performance. On general benchmarks, long-context tasks, and instruction-based reasoning, NSA performs comparably to or better than full-attention models.
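The two-stage hierarchy above can be illustrated with a toy sketch: compress blocks of keys into coarse summary tokens, score those summaries against the query to pick the most relevant blocks, then attend only over the individual tokens of the selected blocks. This is a minimal NumPy illustration of the general idea, not DeepSeek's implementation; all function and parameter names here are hypothetical.

```python
import numpy as np

def sparse_attention(q, K, V, block=4, top_blocks=2):
    """Toy two-stage sparse attention (hypothetical sketch, not NSA's code):
    coarse-grained block compression, then fine-grained token selection."""
    n, d = K.shape
    nb = n // block
    # Coarse stage: compress each block of keys into one summary token (mean pool).
    summaries = K[: nb * block].reshape(nb, block, d).mean(axis=1)  # (nb, d)
    # Score the summaries against the query and keep the top-scoring blocks.
    block_scores = summaries @ q                                    # (nb,)
    keep = np.argsort(block_scores)[-top_blocks:]
    # Fine stage: gather the individual tokens of the selected blocks.
    idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in keep])
    Ks, Vs = K[idx], V[idx]
    # Standard softmax attention, but only over the selected tokens.
    scores = Ks @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ Vs

# Usage: attend over 16 keys while computing scores for only 8 of them.
rng = np.random.default_rng(0)
q = rng.standard_normal(8)
K = rng.standard_normal((16, 8))
V = rng.standard_normal((16, 8))
out = sparse_attention(q, K, V)   # same shape as one value vector: (8,)
```

Because the fine-grained softmax runs over `top_blocks * block` tokens instead of all `n`, the per-query cost shrinks roughly in proportion to the fraction of blocks kept, which is where the long-context speedup comes from.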
Attached paper link:
https://arxiv.org/abs/2502.11089