Google accelerates Gemini Nano on Pixel with frozen multi-token prediction

Google Research has introduced frozen multi-token prediction for Gemini Nano, boosting on-device AI performance on Pixel devices with faster inference, lower memory usage, and improved energy efficiency.

Google Research has unveiled an inference technique called frozen multi-token prediction (MTP) for Gemini Nano v3 models running on Pixel devices. The core model stays frozen: rather than retraining it, Google adds a lightweight MTP head that predicts multiple tokens in parallel while reusing the model's existing key-value cache.

Because the MTP head reads the existing cache instead of keeping its own copy, this zero-copy architecture reduces memory usage by up to 130 MB and delivers more than 50% faster inference on Pixel 9 and Pixel 10 devices without changing model outputs.
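
To make the "faster, but same outputs" claim concrete, here is a minimal toy sketch in Python. It assumes a draft-and-verify scheme: the extra head proposes several tokens, and the frozen model keeps only those it would have produced itself. The article does not say this is how Google keeps outputs unchanged, so treat the mechanism as an assumption. The names `base_next`, `draft_head`, and `decode_with_drafts` are hypothetical stand-ins, not part of any Gemini Nano API.

```python
from typing import List, Tuple

Token = int
VOCAB = 50


def base_next(context: List[Token]) -> Token:
    """Hypothetical stand-in for the frozen base model (greedy next token)."""
    return (sum(context) * 31 + len(context)) % VOCAB


def draft_head(context: List[Token], k: int) -> List[Token]:
    """Hypothetical stand-in for a lightweight MTP head: proposes k tokens.

    It is deliberately imperfect: whenever the context length is a
    multiple of 3, the guess is wrong.
    """
    ctx = list(context)
    guesses = []
    for _ in range(k):
        tok = base_next(ctx)
        if len(ctx) % 3 == 0:
            tok = (tok + 1) % VOCAB  # inject a wrong guess
        guesses.append(tok)
        ctx.append(tok)
    return guesses


def decode_greedy(prompt: List[Token], n: int) -> List[Token]:
    """Baseline: one base-model step per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(base_next(out))
    return out


def decode_with_drafts(prompt: List[Token], n: int, k: int) -> Tuple[List[Token], int]:
    """Assumed draft-and-verify loop; returns (tokens, verification passes).

    A real system would score all k proposals in one batched pass over the
    shared key-value cache. This toy calls base_next once per proposal and
    counts each round as a single pass.
    """
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < n:
        proposals = draft_head(out, k)
        passes += 1
        for tok in proposals:
            expected = base_next(out)
            out.append(expected)  # always keep the base model's own token
            if tok != expected or len(out) - len(prompt) >= n:
                break  # first mismatch (or length limit) ends this round
    return out, passes


prompt = [1, 2, 3]
tokens, passes = decode_with_drafts(prompt, n=20, k=4)
print(tokens == decode_greedy(prompt, 20), passes)
```

Since every kept token comes from the base model, the result matches plain greedy decoding exactly. Any speedup depends on how often the head's guesses are accepted, which lets one round yield several tokens.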

Google says the approach improves responsiveness and energy efficiency for on-device AI features such as notification summaries and text proofreading.

Google