DiffusionGemma enables faster text generation with diffusion models

Google’s DiffusionGemma introduces a diffusion-based approach to text generation, producing multiple tokens simultaneously instead of one at a time. This delivers significantly faster output while maintaining strong performance.

Google’s DiffusionGemma is an open text generation model that uses diffusion techniques rather than traditional autoregressive generation. Instead of creating text one token at a time, the model generates and refines entire blocks of text in parallel.

This approach enables substantially faster performance, with reported speeds exceeding 1,000 tokens per second on high-end hardware. DiffusionGemma builds on research that applies diffusion methods, commonly used in image generation, to language tasks.

The model aims to provide developers with lower latency, efficient local deployment, and a new path for building responsive AI applications while maintaining strong text and coding capabilities.

Google