Google has unveiled a new capability for Gemini 2.5 called Computer Use. The model can interact with a web browser directly, just like a human user: clicking buttons, filling out forms, scrolling through pages, and dragging and dropping items are all within its reach.
The goal is to allow AI to complete tasks on websites that don’t provide APIs, by acting directly on the interface. While similar approaches have already been explored by OpenAI and Anthropic, Google aims to make its version smoother, more reliable, and better integrated with its own tools.
How Gemini 2.5 Computer Use Works
Unlike a traditional API where everything is structured, here the AI must handle an environment designed for humans. It receives a screenshot of the page, analyzes what it sees, and then decides what action to perform—clicking, typing text, scrolling, or dragging an element.
After each action, a new screenshot is generated, and the process repeats until the task is complete. The system runs in loops, with the AI maintaining a history of its past actions to preserve context.
In other words, it doesn’t “guess” what to do—it observes, reasons, and acts much like a real person navigating a web interface.
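To make that loop concrete, here is a minimal sketch of an observe-think-act agent of this kind, driving the browser with Playwright. The `ask_model` function is a hypothetical placeholder for the Gemini call (the real API returns structured function calls, not this simplified dict), and the small action set shown is illustrative:

```python
# Minimal sketch of the observe-think-act loop. Playwright handles the
# browser; ask_model is a hypothetical stand-in for the Gemini call.
from playwright.sync_api import sync_playwright

def ask_model(screenshot: bytes, goal: str, history: list) -> dict:
    """Hypothetical: send the screenshot, goal, and action history to the
    model; receive one action, e.g. {"name": "click", "x": 320, "y": 90}."""
    raise NotImplementedError

def run_agent(goal: str, start_url: str, max_steps: int = 20) -> None:
    with sync_playwright() as p:
        page = p.chromium.launch().new_page()
        page.goto(start_url)
        history: list[dict] = []  # past actions preserve the model's context
        for _ in range(max_steps):
            action = ask_model(page.screenshot(), goal, history)
            if action["name"] == "done":
                break
            if action["name"] == "click":
                page.mouse.click(action["x"], action["y"])
            elif action["name"] == "type":
                page.keyboard.type(action["text"])
            elif action["name"] == "scroll":
                page.mouse.wheel(0, action["dy"])
            history.append(action)  # a fresh screenshot starts the next turn
```

Each iteration follows the cycle described above: screenshot in, one action out, repeat until the model signals it is done.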
Real-World Use Cases
Google has shared several demos to showcase how it works. Some are available on YouTube, while others can be tested through Browserbase, a platform specializing in AI agent testing. Examples include (a usage sketch follows the list):
- Filling out and submitting a web form
- Sorting virtual sticky notes on a collaborative board
- Browsing Hacker News to spot trending discussions
- Even playing the puzzle game 2048 by taking control of the interface
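Using the hypothetical `run_agent` loop sketched earlier, a task like the form-filling demo reduces to a natural-language goal plus a starting URL (both values below are illustrative):

```python
# Illustrative only: the goal is plain natural language; the agent works out
# which clicks and keystrokes achieve it.
run_agent(
    goal="Fill in the contact form with name 'Ada Lovelace' and email "
         "'ada@example.com', then submit it.",
    start_url="https://example.com/contact",  # hypothetical demo page
)
```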
Internally, Google is already using the model to automate interface testing in projects such as AI Mode, Project Mariner, and the Firebase Testing Agent. In these cases, the AI simulates user behavior step by step to verify that a form works or that an interface responds properly.
In the future, this kind of technology could help with booking hotels, completing government paperwork, or navigating SaaS dashboards—without you lifting a finger.
Browser-Only for Now
Currently, the AI is limited to the browser environment. It cannot open local applications or interact with the operating system (clicking the Start menu, for example). Google intentionally chose this boundary to avoid unpredictable bugs and misuse.
Is this a real limitation? Not really—most tools and services already run on the web: online platforms, SaaS apps, dashboards, and forms. And this is exactly where Gemini 2.5 Computer Use shines.
At this stage, the model supports about a dozen basic actions (clicking, typing, scrolling, dragging, etc.), which is enough to handle most scenarios tested so far.
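To give a feel for that action space, here is a sketch of how client code might translate the model's proposed actions into Playwright calls. The action names approximate the documented set (e.g. `click_at`, `type_text_at`, `drag_and_drop`); check the current API reference for the exact names and arguments:

```python
# Sketch: dispatching model-proposed actions to Playwright. The action
# names approximate the documented set and may not match it exactly.
def execute_action(page, name: str, args: dict) -> None:
    if name == "click_at":
        page.mouse.click(args["x"], args["y"])
    elif name == "type_text_at":
        page.mouse.click(args["x"], args["y"])  # focus the field first
        page.keyboard.type(args["text"])
    elif name == "scroll_document":
        delta = 500 if args.get("direction", "down") == "down" else -500
        page.mouse.wheel(0, delta)
    elif name == "drag_and_drop":
        page.mouse.move(args["x"], args["y"])
        page.mouse.down()
        page.mouse.move(args["dest_x"], args["dest_y"])  # arg names assumed
        page.mouse.up()
    elif name == "navigate":
        page.goto(args["url"])
    else:
        raise ValueError(f"Unknown action: {name}")
```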
Built-In Safeguards
Google has added several protections to prevent misuse. Before performing certain actions, the AI may ask for confirmation. Behind the scenes, an oversight system validates each step to avoid risky behavior, such as placing unintended orders or interacting with sensitive elements.
Developers can also customize restrictions, blocking specific actions or tightening permissions. In short, the model doesn’t act behind your back—which is reassuring.
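As a sketch of what that customization can look like: at launch, the google-genai SDK exposed an exclusion list on the Computer Use tool config. The field name `excluded_predefined_functions` comes from the preview-era docs and may have changed since:

```python
# Sketch based on the preview-era google-genai SDK; the field name
# excluded_predefined_functions may have changed since launch.
from google.genai import types

restricted_tool = types.Tool(computer_use=types.ComputerUse(
    environment=types.Environment.ENVIRONMENT_BROWSER,
    # Actions the agent is never allowed to propose:
    excluded_predefined_functions=["drag_and_drop", "key_combination"],
))
```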
Where and How to Use It
Gemini 2.5 Computer Use is already available in preview through the Gemini API, accessible on Google AI Studio and Vertex AI. Developers can test the model, build their own agents, and even tailor the available actions.
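For a first experiment, a single turn against the preview model could look roughly like this. The model ID `gemini-2.5-computer-use-preview-10-2025` is the name used at launch; check Google AI Studio for the current one:

```python
# Sketch: one turn against the preview model. The model ID and tool config
# reflect the launch announcement and may have been updated since.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-computer-use-preview-10-2025",  # launch-era name
    contents="Open news.ycombinator.com and find the top story.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER,
        ))],
    ),
)
print(response.candidates[0].content.parts)  # expect a function_call part
```

From there, the loop described earlier takes over: execute the returned action in the browser, send back a fresh screenshot, and repeat until the task is done.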