Google says Gemini AI makes its robots smarter


Google is training its robots with Gemini AI so they can better navigate and complete tasks. In a new research paper, the DeepMind robotics team explains how Gemini 1.5 Pro’s long context window – which dictates how much information an AI model can process – makes it easier for users to interact with its RT-2 robots using natural-language instructions.

The system works by filming a video tour of a designated area, such as a home or office, then using Gemini 1.5 Pro to have the robot “watch” the footage and learn about the environment. The robot can then carry out commands based on what it has observed, responding with verbal and/or visual outputs – such as guiding users to a power outlet after being shown a phone and asked “where can I charge it?” DeepMind says its Gemini-powered robot had a 90% success rate on more than 50 user instructions given in an operating area of more than 9,000 square feet.
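To make the setup concrete, here is a minimal sketch of the underlying pattern the paper describes – a long video tour plus a natural-language question fed into Gemini 1.5 Pro’s long context window – written against the publicly available google-generativeai Python SDK. This is not DeepMind’s robot stack; the file name, the prompt wording, and the way the answer is used are illustrative assumptions.

```python
# Minimal sketch (not DeepMind's actual pipeline): feed a recorded video tour plus a
# natural-language question into Gemini 1.5 Pro's long context window, using the
# public google-generativeai Python SDK. File name, prompt, and output handling are
# illustrative assumptions.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumes a Gemini API key

# 1. Upload the walkthrough video of the home or office.
tour = genai.upload_file("office_tour.mp4")  # hypothetical file name
while tour.state.name == "PROCESSING":       # wait until the video has been ingested
    time.sleep(5)
    tour = genai.get_file(tour.name)

# 2. Ask a question grounded in that tour, the way a user would ask the robot.
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content([
    tour,
    "You are assisting a mobile robot in the space shown in this video tour. "
    "A user holds up a phone and asks: 'Where can I charge it?' "
    "Name the landmark from the tour the robot should guide them to.",
])

# 3. A real robot would map this answer onto a location in its navigation map;
#    here we simply print it.
print(response.text)
```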

The researchers also found “preliminary evidence” that Gemini 1.5 Pro allowed its droids to plan how to execute instructions beyond simple navigation. For example, when a user with a lot of Coke cans on their desk asks the droid if their favorite drink is available, the team said Gemini “knows that the robot should go to the refrigerator, check for Cokes, and then return to the user to communicate the result.” DeepMind said it plans to study these findings in more detail.
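That kind of multi-step planning can be illustrated with the same long-context model. The short sketch below, again using the public SDK rather than anything DeepMind has published, simply asks for the intermediate steps behind the Coke-can example; the prompt wording and step format are assumptions.

```python
# Minimal sketch (not DeepMind's method): prompt a long-context Gemini model to plan
# beyond simple navigation, mirroring the article's Coke-can example.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumes a Gemini API key
model = genai.GenerativeModel("gemini-1.5-pro")

response = model.generate_content(
    "A user whose desk is covered in empty Coke cans asks a home robot whether "
    "their favorite drink is still available. List, as numbered steps, what the "
    "robot should do before answering."
)
print(response.text)  # e.g. go to the refrigerator, check for Cokes, report back
```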

The video demonstrations provided by Google are impressive, though the obvious cuts after the droid acknowledges each query hide the fact that it takes between 10 and 30 seconds to process these instructions, according to the research report. It may be a while before we share our homes with more advanced environment-mapping robots, but at least they might be able to find our misplaced keys or wallets.
