Google adds Computer Use to the Gemini API

Google has introduced Computer Use in the Gemini API, enabling developers to build AI agents that can interact with browser, mobile, and desktop interfaces through clicks, typing, and other UI actions.

Computer Use is now available in public preview for the Gemini API. The feature lets Gemini 3.5 Flash interpret screenshots and perform actions such as clicking, typing, scrolling, and navigating across browser, mobile, and desktop environments.

The release also introduces configurable safety policies, prompt injection detection, and action intents that explain the model’s reasoning. Developers implement the execution loop themselves, while Gemini generates the next UI action from the current screen state.
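The division of labor in that loop can be sketched in a few lines of Python. This is a toy illustration only, not the Gemini API. `Action`, `FakeScreen`, `scripted_model`, `is_allowed`, and `run_agent` are all hypothetical names invented here. A real integration would replace `scripted_model` with a call to the model and `FakeScreen` with an actual browser or device driver.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One proposed UI action (hypothetical shape, not the API's schema)."""
    kind: str          # "type", "click", or "done"
    target: str = ""   # element the action applies to
    text: str = ""     # text to enter for "type" actions
    intent: str = ""   # short explanation of why the action was chosen


class FakeScreen:
    """Toy stand-in for a browser or device under the developer's control."""

    def __init__(self) -> None:
        self.fields: dict = {}
        self.submitted = False

    def screenshot(self) -> str:
        # A real client would capture an image; a string is enough here.
        return f"fields={self.fields} submitted={self.submitted}"

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def scripted_model(goal: str, screenshot: str) -> Action:
    """Placeholder for the model call: picks the next action from the screen."""
    if "'query'" not in screenshot:
        return Action("type", "query", goal, intent="fill in the search box")
    if "submitted=False" in screenshot:
        return Action("click", "submit", intent="submit the form")
    return Action("done", intent="goal reached")


def run_agent(
    goal: str,
    screen: FakeScreen,
    next_action: Callable[[str, str], Action],
    is_allowed: Callable[[Action], bool],
    max_steps: int = 10,
) -> List[Action]:
    """Developer-side loop: observe, ask for an action, check it, execute it."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = next_action(goal, screen.screenshot())
        if action.kind == "done" or not is_allowed(action):
            break  # stop when finished or when the local policy rejects a step
        screen.apply(action)
        history.append(action)
    return history


if __name__ == "__main__":
    screen = FakeScreen()
    for step in run_agent("gemini", screen, scripted_model, lambda a: True):
        print(step.kind, step.target, "-", step.intent)
```

The `is_allowed` callback and the `max_steps` cap are assumptions about how a developer might add guardrails on the client side; they are not a description of the safety policies Google ships with the feature.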

Google says the capability is designed for browser automation, UI testing, research, and other agentic workflows.

Google