Google has launched Computer Use in public preview for the Gemini API, allowing developers to create AI agents that interact directly with graphical user interfaces. The feature enables Gemini 3.5 Flash to interpret screenshots and perform actions such as clicking, typing, scrolling, and navigating across browser, mobile, and desktop environments.
The release also introduces configurable safety policies, prompt injection detection, and action intents that explain the model’s reasoning. Developers implement the execution loop themselves: their code supplies the current screen state and carries out each step, while Gemini generates the next UI action.
Google says the capability is designed for browser automation, UI testing, research, and other agentic workflows.
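
To make the division of labor concrete, here is a minimal sketch of such an execution loop in Python. It does not call the Gemini API: `scripted_model` is a hypothetical stand-in for the model, `FakeScreen` is a toy replacement for a real browser or device, and the `Action` fields (including `intent`) and the `is_allowed` check are illustrative assumptions, not the actual request or response format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    """Hypothetical action shape; the real API's schema may differ."""
    kind: str          # "type", "click", "scroll", or "done"
    target: str = ""   # UI element the action applies to
    text: str = ""     # text to enter for "type" actions
    intent: str = ""   # short explanation of why the action was chosen


@dataclass
class FakeScreen:
    """Toy stand-in for a real browser, phone, or desktop session."""
    fields: Dict[str, str] = field(default_factory=dict)
    submitted: bool = False

    def screenshot(self) -> str:
        # A real loop would capture pixels; a string is enough for the sketch.
        return f"fields={sorted(self.fields.items())} submitted={self.submitted}"

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def scripted_model(goal: str, screenshot: str) -> Action:
    """Hypothetical replacement for the model call: picks the next action."""
    if "query" not in screenshot:
        return Action("type", "query", "gemini", "Fill in the search box")
    if "submitted=False" in screenshot:
        return Action("click", "submit", intent="Submit the search")
    return Action("done", intent="Goal reached")


def run_agent(
    goal: str,
    screen: FakeScreen,
    propose_action: Callable[[str, str], Action],
    is_allowed: Callable[[Action], bool],
    max_steps: int = 10,
) -> List[Action]:
    """Developer-owned loop: observe, ask for an action, check it, execute it."""
    history: List[Action] = []
    for _ in range(max_steps):
        action = propose_action(goal, screen.screenshot())
        if action.kind == "done":
            break
        if not is_allowed(action):  # illustrative safety gate before execution
            raise PermissionError(f"Blocked action: {action.kind} {action.target}")
        screen.apply(action)
        history.append(action)
    return history


screen = FakeScreen()
history = run_agent(
    "search for gemini",
    screen,
    scripted_model,
    is_allowed=lambda a: a.kind in {"type", "click", "scroll"},
)
for step in history:
    print(step.kind, step.target, "-", step.intent)
```

The `max_steps` cap and the check before `screen.apply` are design choices of this sketch: they keep a misbehaving agent from looping forever or executing an action the developer has not approved.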




