Google Research has unveiled a new inference technique called frozen multi-token prediction (MTP) for Gemini Nano v3 models running on Pixel devices. Instead of retraining the core model, Google adds a lightweight MTP head that predicts multiple tokens in parallel while reusing the model's existing key-value cache.
This zero-copy architecture reduces memory usage by up to 130 MB and delivers more than 50% faster inference on Pixel 9 and Pixel 10 devices without changing model outputs.
Google says the approach improves responsiveness and energy efficiency for on-device AI features such as notification summaries and text proofreading.
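
To make the idea concrete, here is a minimal toy sketch of multi-token drafting, written under stated assumptions rather than as Google's implementation. It assumes the "unchanged outputs" property comes from a draft-and-verify loop, which is a common way to get it but is not confirmed by the announcement. `base_next_token`, `mtp_draft`, and the injected error pattern are all hypothetical stand-ins, and the sketch does not model the shared key-value cache at all.

```python
from typing import List, Tuple

Token = int


def base_next_token(context: List[Token]) -> Token:
    """Hypothetical stand-in for the frozen base model: a deterministic rule."""
    return (sum(context) * 31 + len(context)) % 50


def mtp_draft(context: List[Token], k: int) -> List[Token]:
    """Hypothetical stand-in for an MTP head: proposes k tokens in one call.

    It usually agrees with the base model but is deliberately wrong at some
    positions, so the verification path below is exercised.
    """
    draft: List[Token] = []
    ctx = list(context)
    for _ in range(k):
        tok = base_next_token(ctx)
        if len(ctx) % 5 == 0:  # inject an occasional wrong guess
            tok = (tok + 1) % 50
        draft.append(tok)
        ctx.append(tok)
    return draft


def generate_baseline(prompt: List[Token], n: int) -> List[Token]:
    """Ordinary decoding: one base-model step per generated token."""
    out = list(prompt)
    while len(out) - len(prompt) < n:
        out.append(base_next_token(out))
    return out[len(prompt):]


def generate_with_mtp(prompt: List[Token], n: int, k: int = 4) -> Tuple[List[Token], int]:
    """Draft k tokens per step, then keep only what the base model agrees with.

    Returns the generated tokens and the number of draft/verify steps taken.
    In a real system the verification would be one batched forward pass;
    here it is a plain loop (assumption made for readability).
    """
    out = list(prompt)
    steps = 0
    while len(out) - len(prompt) < n:
        steps += 1
        for drafted in mtp_draft(out, k):
            expected = base_next_token(out)
            out.append(expected)  # always emit the base model's own token
            if expected != drafted or len(out) - len(prompt) >= n:
                break  # first mismatch (or length limit) ends this step
    return out[len(prompt):], steps


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = generate_baseline(prompt, 20)
    fast, steps = generate_with_mtp(prompt, 20, k=4)
    print("identical output:", fast == baseline)
    print("steps taken:", steps, "for", len(fast), "tokens")
```

Because every emitted token is the one the base model would have chosen, the drafted run matches ordinary decoding exactly; the saving in this toy comes only from needing fewer draft/verify steps than tokens. Real speedups depend on how often the drafts are accepted and on hardware details the sketch ignores.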




