Google introduced Gemini 2.5 Computer Use on October 7, 2025, a new specialized model that gives AI the ability to directly control a web browser. The technology operates in a loop: the model "sees" a screenshot of the screen, analyzes the user's task, and generates an action (a click, text input, or scroll). After the action is executed, a new screenshot is taken, and the loop repeats until the task is complete. This allows Gemini to perform complex, multi-step tasks from a single command, such as finding flight information on one site, booking a hotel on another, and compiling all the data into a spreadsheet. As The Verge notes, this capability transforms the AI from an information assistant into a full-fledged "agent-doer." Unlike some competitors, the current version controls only the browser, not the entire operating system. The technology is available to developers in preview through Google AI Studio and Vertex AI.
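The screenshot-action loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Google's API: `Action`, `FakeBrowser`, `run_agent`, and `scripted_model` are hypothetical names invented for this example, and the "model" here is a scripted stand-in that returns canned actions rather than a real Gemini call.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """One step chosen by the model (hypothetical structure)."""
    kind: str          # "click", "type", "scroll", or "done"
    payload: str = ""  # e.g. the element to click or the text to enter


class FakeBrowser:
    """Stand-in for a real browser: records actions instead of performing them."""

    def __init__(self):
        self.log = []

    def screenshot(self) -> str:
        # A real implementation would return image bytes; a label is enough here.
        return f"screen-after-{len(self.log)}-actions"

    def execute(self, action: Action) -> None:
        self.log.append(action)


def run_agent(task, browser, propose_action, max_steps=20):
    """Repeat: take screenshot -> ask model for an action -> execute it.

    Returns the number of actions executed before the model reported "done".
    """
    for step in range(max_steps):
        shot = browser.screenshot()
        action = propose_action(task, shot)
        if action.kind == "done":
            return step
        browser.execute(action)
    raise RuntimeError("step limit reached before the task completed")


def scripted_model(task, screenshot):
    """Assumption: a canned policy standing in for the real model."""
    if screenshot == "screen-after-0-actions":
        return Action("click", "search box")
    if screenshot == "screen-after-1-actions":
        return Action("type", task)
    return Action("done")


browser = FakeBrowser()
steps = run_agent("flights to Lisbon", browser, scripted_model)
print(steps, [a.kind for a in browser.log])
```

The `max_steps` cap is a design choice of this sketch: a loop that depends on the model declaring the task finished needs some bound so that it cannot run indefinitely.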
Google Launches Gemini 2.5 Computer Use for AI Browser Control
