AI News

News · 9:34 PM · obrisen

Google’s Gemini 2.5 Navigates Web Autonomously

Google has unveiled a new version of its Gemini large language model that can autonomously navigate the web through a browser and interact with various websites. This model can perform tasks such as searching for information or making purchases without human supervision.

The Gemini 2.5 Computer Use model combines visual understanding and reasoning to analyze user requests and execute tasks within the browser. It can perform actions such as clicking, typing, scrolling, manipulating dropdown menus, and filling out forms.

The model is built on the Gemini 2.5 Pro LLM, and this is the first time the complete model has been made available. Each request starts a 'loop' in which the model works through a series of steps, drawing on the user's input and screenshots of the browser, until the task is complete.
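To make the 'loop' concrete, here is a minimal Python sketch of the general pattern: take a screenshot, ask the model for the next action, apply it, and repeat until the model signals it is done. It is an illustration only and does not use Google's API. FakeBrowser, fake_model, Action, and run_loop are hypothetical names invented for this example, and the "screenshot" is a text summary rather than an image.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "click", "type", or "done" (illustrative subset)
    target: str = ""   # hypothetical element name
    text: str = ""     # text to type, if any


@dataclass
class FakeBrowser:
    """Stand-in for a real browser; it records state instead of rendering pages."""
    fields: dict = field(default_factory=dict)
    submitted: bool = False

    def screenshot(self) -> str:
        # A real loop would capture pixels; a text summary plays that role here.
        return f"fields={sorted(self.fields.items())} submitted={self.submitted}"

    def apply(self, action: Action) -> None:
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def fake_model(request: str, screenshot: str) -> Action:
    """Hypothetical policy standing in for the model's next-step decision."""
    if "submitted=True" in screenshot:
        return Action("done")
    if "'name'" not in screenshot:
        return Action("type", target="name", text=request)
    return Action("click", target="submit")


def run_loop(request: str, browser: FakeBrowser, max_steps: int = 10) -> list:
    """Repeat screenshot -> decide -> act until the model reports completion."""
    history = []
    for _ in range(max_steps):
        action = fake_model(request, browser.screenshot())
        history.append(action.kind)
        if action.kind == "done":
            break
        browser.apply(action)
    return history


if __name__ == "__main__":
    print(run_loop("Rex", FakeBrowser()))
```

The max_steps cap is a design choice of this sketch, not something described in the announcement: it keeps the loop from running forever if the stand-in model never returns "done".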

Google has released demonstration videos of the tool in action, showcasing tasks like retrieving pet details and scheduling appointments. The model currently works only inside a web browser, unlike the more comprehensive tools from OpenAI and Anthropic.

Despite this limitation, DeepMind researchers claim that Gemini 2.5 Computer Use excels at browser-based tasks and outperforms competitors on multiple benchmarks. It is available to developers through Google AI Studio and Vertex AI, with pricing similar to that of the Gemini 2.5 Pro model but without a free tier.

The introduction of this model highlights Google's ongoing efforts to enhance AI capabilities in web interactions, although it faces competition from other AI developers.