DeepSeek, in collaboration with academic researchers, has unveiled Engram, a novel conditional memory system designed for large language models.
Engram decouples memory storage from computation, allowing models to store and retrieve knowledge efficiently without overloading GPU memory. This approach significantly reduces reliance on expensive high-bandwidth memory (HBM) while improving reasoning depth and inference efficiency.
By caching knowledge instead of full context, Engram addresses one of the biggest bottlenecks in scaling AI models. The innovation is expected to influence the architecture of next-generation models, including DeepSeek V4, and could reshape how large models balance performance, cost, and scalability.
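What such a decoupled memory might look like in practice is easiest to see in code. The sketch below is a minimal illustration, not DeepSeek's published design: it assumes a hashed n-gram key-value table (all class and method names here are hypothetical) held in ordinary host RAM, so only the handful of rows a batch actually touches is ever copied into the accelerator's high-bandwidth memory.

```python
# Illustrative sketch only: the names and the hashing scheme are
# assumptions, not Engram's actual implementation.
import torch

class ConditionalMemory:
    """A hashed n-gram -> embedding table kept in host RAM.

    Only the rows a given batch needs are copied to the accelerator,
    so the (potentially huge) table never occupies GPU memory."""

    def __init__(self, num_slots: int, dim: int, device: str | None = None):
        self.table = torch.randn(num_slots, dim)  # stays on CPU; learned in practice
        self.num_slots = num_slots
        self.device = device or ("cuda" if torch.cuda.is_available() else "cpu")

    def _slots(self, token_ids: torch.Tensor, n: int = 2) -> torch.Tensor:
        # Hash each consecutive n-gram of token ids to a table slot.
        windows = token_ids.unfold(-1, n, 1)                 # (batch, seq-n+1, n)
        weights = torch.tensor([31 ** i for i in range(n)])  # toy polynomial hash
        return (windows * weights).sum(-1) % self.num_slots

    def lookup(self, token_ids: torch.Tensor) -> torch.Tensor:
        slots = self._slots(token_ids)
        unique, inverse = torch.unique(slots, return_inverse=True)
        # Only the unique rows cross to the device, never the whole table:
        gathered = self.table[unique].to(self.device)
        return gathered[inverse]                             # (batch, seq-n+1, dim)

mem = ConditionalMemory(num_slots=100_000, dim=256)
ids = torch.randint(0, 32_000, (4, 128))  # a batch of token id sequences
vecs = mem.lookup(ids)                    # (4, 127, 256), fetched on demand
```

The point of the sketch is the division of labor: the lookup table scales with how much knowledge is stored, while GPU memory usage scales only with the batch being processed, which is the decoupling the announcement describes.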
