Google DeepMind’s new AI models can search the web to help robots complete tasks

Google DeepMind says its upgraded AI models enable robots to complete more complex tasks — and even tap into the web for help. During a press briefing, Google DeepMind’s head of robotics, Carolina Parada, told reporters that the company’s new AI models work in tandem to allow robots to “think multiple steps ahead” before taking action in the physical world.
The system is powered by the newly launched Gemini Robotics 1.5 alongside the embodied reasoning model, Gemini Robotics-ER 1.5, which are updates to AI models that Google DeepMind introduced in March. Robots can now go beyond single tasks, such as folding a piece of paper or unzipping a bag. They can separate laundry by dark and light colors, pack a suitcase based on the current weather in London, and help someone sort trash, compost, and recyclables based on a web search tailored to a location’s specific requirements.
“The models up to now were able to do really well at doing one instruction at a time in a way that is very general,” Parada said. “With this update, we’re now moving from one instruction to actually genuine understanding and problem-solving for physical tasks.”
To do this, robots can use the upgraded Gemini Robotics-ER 1.5 model to form an understanding of their surroundings, and use digital tools like Google Search to find more information. Gemini Robotics-ER 1.5 then translates those findings into natural language instructions for Gemini Robotics 1.5, allowing the robot to use the model’s vision and language understanding to carry out each step.
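
The division of labor described above can be pictured with a toy sketch. This is not Google DeepMind's code or the Gemini API. Every name here (`fake_search`, `plan`, `execute`), the city "Exampleville", and its sorting rules are made up for illustration. The assumption is simply that one component looks up information and writes plain-language steps, and a second component carries them out one at a time.

```python
from typing import Callable, Dict, List


def fake_search(query: str) -> Dict[str, str]:
    """Hypothetical stand-in for a web search tool; the rules are invented."""
    rules = {
        "exampleville": {
            "banana peel": "compost",
            "soda can": "recycling",
            "chip bag": "trash",
        },
    }
    for city, mapping in rules.items():
        if city in query.lower():
            return mapping
    return {}


def plan(items: List[str], location: str,
         search: Callable[[str], Dict[str, str]]) -> List[str]:
    """Reasoning role: look up local rules, then write one instruction per item."""
    rules = search(f"waste sorting rules {location}")
    steps = []
    for item in items:
        # Assumption for this sketch: items with no known rule go to trash.
        bin_name = rules.get(item, "trash")
        steps.append(f"Put the {item} in the {bin_name} bin.")
    return steps


def execute(steps: List[str], act: Callable[[str], bool]) -> List[str]:
    """Action role: attempt each instruction in order, stopping at the first failure."""
    done = []
    for step in steps:
        if not act(step):
            break
        done.append(step)
    return done


if __name__ == "__main__":
    steps = plan(["banana peel", "soda can"], "Exampleville", fake_search)
    # The lambda stands in for a robot that succeeds at every step.
    for line in execute(steps, act=lambda instruction: True):
        print(line)
```

In this sketch the planner never moves anything and the executor never searches, which mirrors the split the article describes between the reasoning model and the model that acts on its instructions.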

Additionally, Google DeepMind announced that Gemini Robotics 1.5 can help robots “learn” from each other, even if they have different configurations. Google DeepMind found that tasks presented to the ALOHA2 robot, which consists of two mechanical arms, “just work” on the bi-arm Franka robot, as well as Apptronik’s humanoid robot Apollo. “This enables two things for us: one is to control very different robots — including a humanoid — with a single model,” Google DeepMind software engineer Kanishka Rao said during the briefing. “And secondly, skills that are learned on one robot can now be transferred to another robot.”
As part of the update, Google DeepMind is rolling out Gemini Robotics-ER 1.5 to developers through the Gemini API in Google AI Studio, while only select partners can access Gemini Robotics 1.5.

