Robot Utility Models: the coolest thing you never heard about (yet)

Robot Utility Models, or RUMs for short, are a new area of research and development in AI training for robotics. RUM was created by Lerrel Pinto, an assistant professor of computer science, and a team at New York University.

This open-source research project aims to generalize robot training so that a task does not require thousands of training examples and the resulting behavior still succeeds in zero-shot situations or unseen environments.

The recent explosion of humanoid development projects, along with efforts to bring humanoid and other form-factor robots into the home, is exacerbating the need for model training protocols that won’t require decades of training time.

Lerrel and his team have started small by enabling simple tasks such as opening a door or drawer. In these simple use cases, the researchers trained on data that is diverse but still high-quality.

For example, this means training on 25 examples in each of 40 different environments rather than 200 examples in each of five different environments.
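Multiplying out the two splits shows that they use the same number of examples, so the comparison is about where the data comes from rather than how much of it there is. A quick sketch of that arithmetic follows; the function and variable names are illustrative and do not come from the RUM project.

```python
def total_examples(examples_per_environment: int, environments: int) -> int:
    """Total training examples for a given split (illustrative helper)."""
    return examples_per_environment * environments


# Diverse split: few examples in many environments.
diverse_total = total_examples(examples_per_environment=25, environments=40)

# Concentrated split: many examples in few environments.
concentrated_total = total_examples(examples_per_environment=200, environments=5)

print(diverse_total, concentrated_total)
```

With the budget held equal, the only thing that changes between the two splits is the number of distinct environments the model sees.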

The NYU team reported 90% accuracy, with an average of 1.31 tries per success in a zero-shot situation.

How do Robot Utility Models work?

“The Stick” is an inexpensive gripper used for training RUM data. It uses an iPhone and a standard grabber from Amazon. | Credit: Robot Utility Models team

For task training, the team invented “The Stick,” a handheld data-collection tool built from off-the-shelf parts. A smartphone mounted on the tool captures data of the scene while a task, such as opening a door, is completed.

The Stick uses a foldable suction reacher/grabber tool from Amazon and integrates an iPhone 12 Pro or later model with a 3D-printed phone holder. The entire gripper unit can be built for under $30.

The software uses the iPhone Pro’s camera and lidar sensor to capture multimodal data for the Robot Utility Models.

The Stick gripper has the same end effector used by Hello Robot’s Stretch robot, which is one of the robots that Lerrel’s NYU lab employs to perform task tests and validate the model’s accuracy.

The imitation learning concept simplifies data collection and differs from some of the early work in diffusion model training because of the data-diversity goal.

Projects such as Stanford’s Mobile ALOHA demonstrated that a model could be trained with sufficient accuracy after just a few training iterations. However, Mobile ALOHA training doesn’t appear to be as generalizable as a RUM, although the ALOHA method learns faster.

NYU team touts advantages of a RUM

Based on the initial research, Robot Utility Models are better than Mobile ALOHA at spatial generalization, object generalization, and scene generalization. According to Prof. Pinto, the RUM approach requires more training data across diverse environments.

Earlier this year, The Robot Report spoke with Stanford University Ph.D. student Cheng Chi about his research and recent publications about using AI models for robotics applications.

In addition, using this strategy, basic tasks can be linked into action chains. So the robot might be able to open a drawer, pick up a spoon, and then stir a liquid in a glass on the table, having been taught each of the tasks individually.

The team is also using ChatGPT to evaluate the scene after the robot has attempted a task and determine if the robot accomplished the task. The robot uses its onboard camera to acquire an image of the scene, which it sends to ChatGPT. This closes the loop on the overall task, said the researchers.
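The attempt-then-check loop described here can be sketched in a few lines. This is an illustration under stated assumptions, not the team's code: `run_with_verification`, `attempt_task`, `verify`, and `max_tries` are hypothetical names, the retry-on-failure behavior is assumed, and the verifier below is a stub standing in for the image check that the researchers send to ChatGPT.

```python
from typing import Callable, Tuple


def run_with_verification(
    attempt_task: Callable[[], bytes],
    verify: Callable[[bytes], bool],
    max_tries: int = 3,
) -> Tuple[bool, int]:
    """Attempt a task, check an image of the result, and retry on failure.

    attempt_task: runs the robot policy once and returns a camera image
                  of the scene (hypothetical interface).
    verify:       returns True if the image shows the task was accomplished
                  (stand-in for the vision-language model check).
    Returns (succeeded, tries_used).
    """
    for tries_used in range(1, max_tries + 1):
        image = attempt_task()
        if verify(image):
            return True, tries_used
    return False, max_tries


# Stubbed demo: the first attempt fails, the second succeeds.
_outcomes = iter([b"drawer closed", b"drawer open"])


def fake_attempt() -> bytes:
    return next(_outcomes)


def fake_verify(image: bytes) -> bool:
    return image == b"drawer open"


succeeded, tries_used = run_with_verification(fake_attempt, fake_verify)
print(succeeded, tries_used)
```

Passing the verifier in as a function keeps the loop independent of any particular model or API, so a real image check could replace the stub without changing the retry logic.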

Researchers rely on Hello Robot Stretch

Stretch 3 is portable, lightweight, and designed from the ground up to work around people. | Credit: Hello Robot

The Robot Utility Models project is just one example of how Hello Robot is advancing AI research and development to accelerate robotics adoption in novel and complex use cases. Hello Robot is one of the few mobile manipulation companies ready to begin testing in the home.

Unlike the more complex humanoid robots that have many more degrees of freedom, the Stretch 3 robot is an unintimidating and stable robotic platform to deploy into the home, said the company.

Hello Robot said it has already sold a number of Stretch 3 robots to end users with disabilities who are looking for an autonomous way to regain agency along with support for daily tasks and household chores.

In addition to its commercial design for Stretch, Hello Robot said it has encouraged the R&D community and supported open-source development on the platform.

Charlie Kemp, a co-founder and chief technology officer of Hello Robot, also founded the Healthcare Robotics Lab at Georgia Tech, where he was an associate professor. He said that he understands how the research community functions and that he has a deep network of peers and colleagues in institutions around the world.
