News on the morning of January 13: DeepSeek has open-sourced a completely new architecture module, "Engram", and simultaneously released a technical paper, with Liang Wenfeng once again appearing among the authors.

It is understood that the Engram module, by introducing a scalable lookup-based memory structure, gives large models a new sparsity dimension distinct from the traditional Transformer and MoE designs.
In the paper, DeepSeek notes that current mainstream large models are structurally inefficient at two kinds of tasks: lookup-style memorization that relies on fixed knowledge, and complex reasoning with compositional computation.
A traditional Transformer (whether dense or MoE) has to reconstruct these static patterns through many layers of attention and MLP, so a significant share of compute is spent on "repeatedly rebuilding known patterns".
Engram's core mechanism is an O(1) lookup memory built on modern hashed N-gram embeddings. The module slices the input token sequence into N-grams and maps them through multiple hash functions into a large static memory table, achieving constant-time retrieval.
The paper stresses that these lookups are decoupled from model size: retrieval cost stays stable even when the memory table is scaled to billions of parameters.
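For readers who want a concrete picture, below is a minimal PyTorch sketch of a hashed N-gram lookup memory of the kind the paper describes. It is not DeepSeek's implementation; the class name, the random-multiplier hash functions and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NGramMemory(nn.Module):
    """O(1) lookup memory: each position's trailing N-gram is hashed into a
    large static embedding table; cost per token does not grow with table size."""
    def __init__(self, table_size: int, dim: int, n: int = 3, num_hashes: int = 2):
        super().__init__()
        self.table = nn.Embedding(table_size, dim)  # the static memory table
        self.table_size = table_size
        self.n = n
        # Fixed random odd multipliers acting as cheap hash functions (an assumption).
        self.register_buffer(
            "hash_mults", torch.randint(1, 2**31 - 1, (num_hashes, n)) * 2 + 1
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len); left-pad so every position has a full N-gram
        padded = F.pad(token_ids, (self.n - 1, 0))
        grams = padded.unfold(1, self.n, 1)                      # (B, T, n)
        # Multi-hash each N-gram into indices of the static table: (B, T, num_hashes)
        idx = (grams.unsqueeze(2) * self.hash_mults).sum(-1) % self.table_size
        # Retrieve and merge the hashed slots; constant work per token.
        return self.table(idx).sum(dim=2)                        # (B, T, dim)

# Example: mem = NGramMemory(table_size=1_000_000, dim=256)
#          vecs = mem(torch.randint(0, 32_000, (2, 16)))         # -> (2, 16, 256)
```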
Where MoE offers conditional computation, Engram offers "conditional memory": the module decides, based on the current context, whether to activate the retrieved results, and fuses them into the backbone network through a gating mechanism.
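How such a gate might look is sketched below, again as an assumption rather than the paper's exact design: a scalar gate computed from the backbone's hidden state decides, per position, how much of the retrieved memory is mixed back in.

```python
import torch
import torch.nn as nn

class GatedMemoryFusion(nn.Module):
    """Context-dependent gate that fuses retrieved memory into the backbone."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, 1)    # hidden state -> gate logit
        self.mem_proj = nn.Linear(dim, dim)   # project retrieved memory into model space

    def forward(self, hidden: torch.Tensor, retrieved: torch.Tensor) -> torch.Tensor:
        # hidden, retrieved: (batch, seq_len, dim)
        gate = torch.sigmoid(self.gate_proj(hidden))   # (B, T, 1), in [0, 1]
        # The context decides whether the looked-up content is used at all.
        return hidden + gate * self.mem_proj(retrieved)
```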
The paper shows that Engram is typically placed in the early layers of the model, where it takes over the job of "pattern reconstruction", freeing the depth of the later layers for complex reasoning.
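Reusing the two sketches above, that placement could look roughly like the toy stack below; the layer counts and the choice to gate the memory only into the first few blocks are assumptions for illustration, not the paper's configuration.

```python
import torch.nn as nn

class ToyBackbone(nn.Module):
    """Toy stack showing early-layer memory injection (builds on the sketches above)."""
    def __init__(self, vocab: int, dim: int, depth: int = 12, memory_layers: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.memory = NGramMemory(table_size=1_000_000, dim=dim)
        self.fuse = GatedMemoryFusion(dim)
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
            for _ in range(depth)
        )
        self.memory_layers = memory_layers

    def forward(self, token_ids):
        h = self.embed(token_ids)
        for i, block in enumerate(self.blocks):
            if i < self.memory_layers:
                # Early layers: hand static pattern lookup to the memory module...
                h = self.fuse(h, self.memory(token_ids))
            h = block(h)  # ...so later layers spend their depth on reasoning.
        return h
```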
In an experiment at the 27B-parameter scale, DeepSeek reallocated part of the MoE expert parameters to the Engram memory table; with parameter count and compute held equal, the model improved markedly on knowledge, reasoning, code and mathematics tasks.
Technical discussion on the X platform concluded that the Engram mechanism effectively removes the need for early layers to rebuild static patterns, which in effect makes the model "deeper" where the reasoning happens.
Some developers point out that this structure frees large-scale static memory from the limits of GPU memory: because the addressing is deterministic, the table can be prefetched from host memory, keeping costs low at the inference stage.
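A rough sketch of that offloading idea follows, under the assumption that the row indices are known ahead of time because they depend only on the token ids; the table layout and the helper below are illustrative, not DeepSeek's code, and the device move assumes a CUDA GPU.

```python
import torch

table_size, dim = 1_000_000, 128
# The static memory table lives in pinned host RAM instead of GPU HBM.
host_table = torch.randn(table_size, dim).pin_memory()

def prefetch_rows(indices: torch.Tensor, device: str = "cuda") -> torch.Tensor:
    """Gather only the rows needed for the upcoming tokens and copy them to the GPU."""
    # Deterministic addressing: the indices are a pure function of the token ids,
    # so they can be computed one step ahead of the forward pass.
    rows = host_table.index_select(0, indices.cpu()).pin_memory()
    # Non-blocking host-to-device copy lets the transfer overlap with GPU compute.
    return rows.to(device, non_blocking=True)

# Example: prefetch_rows(torch.tensor([12, 7, 99_321]))  # -> (3, 128) tensor on the GPU
```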
Many observers speculate that Engram is likely to become a core technical foundation of DeepSeek's next-generation model, V4.