Liang Wenfeng's new paper surfaces: DeepSeek V4 may introduce a new memory structure

News, January 13 — this morning DeepSeek open-sourced a brand-new architecture module, "Engram", and released the accompanying technical paper; Liang Wenfeng once again appears among the authors.


It is understood that the Engram module, by introducing a scalable, retrievable memory structure, gives large models an entirely new scaling dimension, distinct from the traditional Transformer and MoE designs.

In the paper, DeepSeek notes that current mainstream large models are structurally inefficient at two kinds of tasks: one is lookup-style memorization that relies on fixed knowledge, and the other is complex reasoning and compositional computation.

A traditional Transformer (whether Dense or MoE) has to rebuild these static patterns through multiple layers of attention and MLPs, so a large share of compute is spent on "repeatedly reconstructing patterns that are already known".

Engram's core mechanism is an O(1) lookup memory built on modern hashed N-gram embeddings. The module slices the input token sequence into N-grams and maps each one, through multiple hash functions, into an extended static memory table, achieving constant-time retrieval.
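
To make the mechanism concrete, here is a minimal Python/PyTorch sketch of a hashed N-gram lookup of this kind. The table size, embedding width, N-gram length, number of hash functions, and the hash itself are illustrative assumptions, not DeepSeek's released implementation.

```python
import torch

TABLE_SIZE = 1 << 20   # rows in the static memory table (assumed)
EMBED_DIM = 256        # width of each memory entry (assumed)
NUM_HASHES = 2         # several hash functions to dilute collisions (assumed)
NGRAM = 3              # length of the token N-grams (assumed)

# Static memory table: one sub-table per hash function.
memory_table = torch.randn(NUM_HASHES, TABLE_SIZE, EMBED_DIM)

def hash_ngram(ngram, seed):
    # Simple multiplicative hash over the token ids; O(1) per N-gram and
    # independent of both the model size and the table size.
    h = seed
    for tok in ngram:
        h = (h * 1000003 + tok) & 0xFFFFFFFF
    return h % TABLE_SIZE

def engram_lookup(token_ids):
    # For each position, take the trailing N-gram, fetch one row per hash
    # function, and sum the rows into a single memory vector.
    out = torch.zeros(len(token_ids), EMBED_DIM)
    for t, _ in enumerate(token_ids):
        ngram = tuple(token_ids[max(0, t - NGRAM + 1): t + 1])
        for k in range(NUM_HASHES):
            out[t] += memory_table[k, hash_ngram(ngram, seed=k + 1)]
    return out

# Retrieval cost per token stays constant no matter how large the table grows.
print(engram_lookup([11, 42, 7, 99, 3]).shape)  # torch.Size([5, 256])
```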

The paper stresses that such lookups are independent of model size: even if the memory table is scaled to billions of parameters, retrieval cost remains stable.

Where MoE offers conditional computation, Engram offers "conditional memory": the module decides from the current context whether to activate the retrieved entries and fuses them into the backbone network through a gating mechanism.
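
Below is a rough sketch of such context-dependent gated fusion; the form of the gate and the residual fusion rule are assumptions for illustration and may differ from the paper's formulation.

```python
import torch
import torch.nn as nn

class GatedMemoryFusion(nn.Module):
    def __init__(self, hidden_dim, memory_dim):
        super().__init__()
        self.proj = nn.Linear(memory_dim, hidden_dim)  # map memory entries into model space
        self.gate = nn.Linear(hidden_dim, 1)           # context decides how much memory to admit

    def forward(self, hidden, memory):
        # hidden: (batch, seq, hidden_dim) backbone activations
        # memory: (batch, seq, memory_dim) retrieved Engram entries
        g = torch.sigmoid(self.gate(hidden))           # per-position gate in [0, 1]
        return hidden + g * self.proj(memory)          # residual fusion, suppressed when memory is irrelevant

fusion = GatedMemoryFusion(hidden_dim=512, memory_dim=256)
h = torch.randn(2, 16, 512)
m = torch.randn(2, 16, 256)
print(fusion(h, m).shape)  # torch.Size([2, 16, 512])
```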

According to the paper, Engram is usually placed in the early layers of the model, where it takes over the job of "pattern reconstruction", freeing the computational depth of later layers for complex reasoning.
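
To illustrate that placement, the toy stack below attaches a memory module only to the first few blocks, reusing the GatedMemoryFusion sketch above; the layer counts and block structure are assumed for illustration, not taken from the paper.

```python
import torch.nn as nn

def build_stack(num_layers=12, memory_layers=2, hidden_dim=512, memory_dim=256):
    # Only the earliest `memory_layers` blocks carry an Engram-style memory;
    # later blocks keep their full depth for reasoning.
    blocks = nn.ModuleList()
    for i in range(num_layers):
        block = nn.ModuleDict({
            "layer": nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=8, batch_first=True)
        })
        if i < memory_layers:
            block["memory"] = GatedMemoryFusion(hidden_dim, memory_dim)  # from the sketch above
        blocks.append(block)
    return blocks
```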

In an experiment at the 27B parameter scale, DeepSeek reallocated part of the MoE expert parameters to the Engram memory table; with the same parameter count and equal compute, the model improved significantly on knowledge, reasoning, code, and math tasks.

Technical discussion on the X platform generally concludes that the Engram mechanism effectively reduces the need to rebuild static patterns in a model's early layers, making the model effectively "deeper" where reasoning happens.

Some developers point out that this structure frees large-scale static memory from the GPU memory limit: because the lookups are deterministic, the memory table can be prefetched from host memory, keeping inference-stage costs down.
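
The sketch below illustrates that offloading idea: because the rows to fetch are fully determined by hashing the input N-grams, they can be gathered from a CPU-resident table and shipped to the accelerator before the Engram layer runs. The shapes and staging approach here are assumptions, not a description of DeepSeek's serving stack.

```python
import torch

TABLE_ROWS, EMBED_DIM = 1 << 22, 256
# The memory table lives in (optionally pinned) host RAM rather than GPU memory.
cpu_table = torch.randn(TABLE_ROWS, EMBED_DIM, pin_memory=torch.cuda.is_available())

def prefetch_rows(row_ids, device):
    # The row indices are known from the token ids alone (via hashing), so the
    # gather can be issued before the forward pass reaches the Engram layer.
    rows = cpu_table[row_ids]                   # gather on the host
    return rows.to(device, non_blocking=True)   # hand the rows to the accelerator

device = "cuda" if torch.cuda.is_available() else "cpu"
row_ids = torch.randint(0, TABLE_ROWS, (16,))
print(prefetch_rows(row_ids, device).shape)  # torch.Size([16, 256])
```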

Many observers speculate that Engram is likely to be the core technical foundation of DeepSeek's next-generation model, V4.
