In short
- The system uses Google’s Gemini model to reason about goals, explain its plans, and act in unfamiliar games.
- SIMA 2 learns new skills through self-directed gameplay and adapts to worlds Genie 3 generated only moments earlier.
- DeepMind plans a limited research preview for developers and academics.
Google DeepMind introduced SIMA 2 on Thursday – a new AI agent that the company says behaves like a “companion” in virtual worlds. With the launch of SIMA 2, DeepMind aims to move beyond simple on-screen actions toward AI that can plan, explain itself, and learn through experience.
“This is a significant step in the direction of artificial general intelligence (AGI), with important implications for the future of robotics and AI embodiment in general,” the company said on its website.
The first version of SIMA (Scalable Instructable Multiworld Agent), released in March 2024, learned hundreds of basic skills by watching the screen and using virtual keyboard and mouse controls. The new version, Google said, takes things a step further, with the AI reasoning for itself.
“SIMA 2 is our most capable AI agent for 3D virtual worlds,” Google DeepMind wrote on X. “Powered by Gemini, it goes beyond following basic instructions to think, understand and take action in interactive environments – which means you can talk to it through text, voice, or even images.”
Using the Gemini AI model, Google said, SIMA 2 can interpret high-level goals, talk through the steps it intends to take, and collaborate in games with a level of reasoning the original system could not reach.
DeepMind reported stronger generalization across virtual environments, with SIMA 2 completing longer, more complex tasks and following instructions that included logical prompts, sketches drawn on the screen, and emojis.
“As a result of this capability, the performance of SIMA 2 is significantly closer to that of a human player across a wide range of tasks,” Google wrote, noting that SIMA 2 had a task-completion rate of 65%, compared with 31% for SIMA 1.
The system also interpreted instructions and acted in completely new 3D worlds generated by Genie 3, another DeepMind project, published last year, that creates interactive environments from a single image or text prompt. SIMA 2 oriented itself, understood its goals, and took meaningful action in worlds it had never encountered until moments before.
“SIMA 2 is now much better at executing detailed instructions, even in never-before-seen worlds,” Google wrote. “It can take concepts learned like ‘mining’ in one game and apply them to ‘harvesting’ in another – connecting the dots between similar actions.”
After learning from human demonstrations, the researchers said, the agent switched to self-directed play, using trial and error and Gemini-generated feedback to create new experience data – including a training cycle in which SIMA 2 generated tasks, attempted them, and then fed the resulting trajectory data into the next version of the model.
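The cycle described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not DeepMind's actual training code (which is unpublished): the `ToyAgent` class, the `judge` scorer standing in for Gemini-generated feedback, and all function names are hypothetical.

```python
import random

class ToyAgent:
    """Stand-in for the game-playing agent; purely illustrative."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.skill = 0.2  # crude proxy for competence

    def propose_task(self):
        # The agent invents its own goal (self-directed play).
        return f"collect-{self.rng.randint(1, 100)}"

    def attempt(self, task):
        # A "trajectory" is reduced here to a single success score.
        return min(1.0, self.skill + self.rng.random() * 0.5)

    def update(self, dataset):
        # Pretend each kept trajectory nudges competence upward,
        # mimicking retraining on self-generated data.
        self.skill = min(1.0, 0.2 + 0.05 * len(dataset))

def judge(task, trajectory):
    # Stand-in for model-generated feedback: score the attempt.
    return trajectory

def self_improvement_cycle(agent, score_fn, rounds=20, keep_threshold=0.5):
    """Generate tasks, attempt them, keep well-scored trajectories,
    and let the agent train on the accumulated data each round."""
    dataset = []
    for _ in range(rounds):
        task = agent.propose_task()
        trajectory = agent.attempt(task)
        reward = score_fn(task, trajectory)
        if reward >= keep_threshold:  # keep only successful runs
            dataset.append((task, trajectory, reward))
        agent.update(dataset)  # "next version" learns from kept data
    return dataset

agent = ToyAgent()
data = self_improvement_cycle(agent, judge)
print(f"kept {len(data)} trajectories, skill now {agent.skill:.2f}")
```

The key design point is the feedback filter: only trajectories the judge scores highly are fed back into training, so the loop amplifies successes rather than reinforcing failures.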
While Google hailed SIMA 2 as a step forward for artificial intelligence, the research also identified gaps that still need to be addressed, including struggles with very long, multi-step tasks, a limited memory window, and the visual-interpretation challenges common to 3D AI systems.
Even so, DeepMind said the platform served as a testbed for skills that could eventually migrate into robotics and navigation.
“Our SIMA 2 research offers a strong path toward applications in robotics and another step toward real-world AGI,” the company said.