Models
June 27, 2026

Google accelerates Gemini Nano on Pixel with frozen multi-token prediction

Google Research has introduced frozen multi-token prediction for Gemini Nano, boosting on-device AI performance on Pixel devices with faster inference, lower memory usage, and improved energy efficiency.

Google Research has unveiled a new inference technique called frozen multi-token prediction (MTP) for Gemini Nano v3 models running on Pixel devices. Instead of retraining the core model, Google adds a lightweight MTP head that predicts multiple tokens in parallel while reusing the model's existing key-value cache.

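Because the head shares the existing cache rather than duplicating it, Google describes the design as zero-copy. The company reports that it reduces memory usage by up to 130 MB and delivers more than 50% faster inference on Pixel 9 and Pixel 10 devices without changing model outputs.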
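The announcement does not say how outputs stay unchanged. One common way to get that property is draft-and-verify decoding: a small head proposes several tokens at once, and the frozen base model confirms or corrects them. The sketch below is a toy illustration under that assumption, not Google's implementation. `base_next`, `draft_head`, and both decode functions are hypothetical stand-ins, and the shared key-value cache is not modeled.

```python
from typing import List, Tuple

Token = int
VOCAB = 50  # toy vocabulary size (assumption)


def base_next(context: List[Token]) -> Token:
    """Hypothetical stand-in for the frozen base model: one deterministic next token."""
    return (sum(context) * 31 + len(context)) % VOCAB


def draft_head(context: List[Token], k: int) -> List[Token]:
    """Hypothetical stand-in for a lightweight MTP head that guesses k tokens at once.

    It is deliberately wrong whenever the context length is a multiple of 3,
    so the verification step below has something to correct.
    """
    ctx = list(context)
    guesses = []
    for _ in range(k):
        tok = base_next(ctx)
        if len(ctx) % 3 == 0:
            tok = (tok + 1) % VOCAB  # injected mistake
        guesses.append(tok)
        ctx.append(tok)
    return guesses


def decode_greedy(prompt: List[Token], n: int) -> List[Token]:
    """Baseline: one base-model call per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(base_next(out))
    return out[len(prompt):]


def decode_with_drafts(prompt: List[Token], n: int, k: int = 4) -> Tuple[List[Token], int]:
    """Draft k tokens, then keep only what the base model agrees with.

    Returns the generated tokens and the number of draft/verify rounds.
    In a real system each round would be one batched base-model pass;
    here the verification is simulated token by token.
    """
    out = list(prompt)
    target = len(prompt) + n
    rounds = 0
    while len(out) < target:
        rounds += 1
        for guess in draft_head(out, k):
            expected = base_next(out)
            out.append(expected)  # always emit the base model's own token
            if guess != expected or len(out) >= target:
                break  # first mismatch ends the round; the correction is kept
    return out[len(prompt):], rounds


if __name__ == "__main__":
    prompt = [1, 2, 3]
    tokens, rounds = decode_with_drafts(prompt, 20)
    print("same output as greedy:", tokens == decode_greedy(prompt, 20))
    print("rounds used:", rounds, "for", len(tokens), "tokens")
```

Because every emitted token comes from the base model, the result matches plain greedy decoding. Any speedup comes from needing fewer rounds than tokens when the drafts are often right.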

Google says the approach improves responsiveness and energy efficiency for on-device AI features such as notification summaries and text proofreading.

#Google
