These all-purpose robots signal a new era

SWETA AKUNDI

May 2025
from Shaastra :: vol 04 issue 04 :: May 2025

Agility Robotics is known for its humanoid robot, Digit, deployed in warehouses and factories for tasks like loading, stacking and recycling.

Researchers are on the path towards general-purpose bots that can complete tasks in unfamiliar environments with minimal training.

A mini humanoid, nicknamed 'Blue', shuffled up to NVIDIA CEO Jensen Huang while he was giving his keynote address at the GPU Technology Conference in March 2025. In collaboration with DeepMind and Disney Research, NVIDIA released Project GR00T, a next-generation foundation model to train humanoids. Short for Generalist Robot 00 Technology, the model can understand natural language and emulate movements by observing human actions, taking artificial intelligence (AI) outside screens and into the physical realm.

On stage, Blue was reminiscent of Disney's WALL-E, with its glittery yellow 'eyes' and a dog-like, eager-to-please demeanour. But beneath the charming exterior lay a more important message: a nod to a future where more robots could work human jobs, fuelled by the graphics processing units (GPUs) and physics engines inside them. Although robotic foundational models have not yet been commercialised on a large scale, the first generation of such models is already being shipped for testing to companies like 1X Technologies, Agility Robotics, Apptronik and Boston Dynamics.

Robotic foundational models herald a new era of generalist robots that can train on vast sets of human action data. These models aim to replicate the success that large language models (LLMs) have seen. The idea is that if you collect enough data, you can train models to reason not only about visual and language input but also about the robots' own physical actions. In essence, they try to understand how to interact with their environment. Robotic foundational models are a step towards general intelligence, but one grounded in physical interaction.

Such models emerge from partnerships between robotics manufacturers, who build the hardware, and researchers, who build the AI platforms for it. AI companies are launching spinoffs and separate verticals; they start with their vision and language models and then fine-tune them with robot-specific data.

Covariant, a robotics intelligence company that spun off from OpenAI in 2017, is now building RFM-1, a robotic foundational model to help warehouse robots manipulate diverse objects with greater autonomy. More recently, in March, Google DeepMind announced Gemini Robotics, a vision-language-action model, built on Gemini 2.0, that allows robots to reason about a scene before taking an action. Its big moment arrived when a bi-arm robot running Gemini Robotics slam-dunked a toy basketball through a hoop without ever having seen anything related to basketball, or that specific toy.