Adam Zewe, Author at The Robot Report

MIT develops multimodal technique to train robots

Adam Zewe — Tue, 29 Oct 2024 15:35:57 +0000

Researchers filmed multiple instances of a robot arm feeding a dog. The videos were included in datasets to train the robot. | Credit: MIT

Training a general-purpose robot remains a major challenge. Typically, engineers collect data that are specific to a certain robot and task, which they use to train the robot in a controlled environment. However, gathering these data is costly and time-consuming, and the robot will likely struggle to adapt to environments or tasks it hasn’t seen before.

To train better general-purpose robots, MIT researchers developed a versatile technique that combines a huge amount of heterogeneous data from many sources into one system that can teach any robot a wide range of tasks.

Their method involves aligning data from varied domains, like simulations and real robots, and multiple modalities, including vision sensors and robotic arm position encoders, into a shared “language” that a generative AI model can process.

By combining such an enormous amount of data, this approach can be used to train a robot to perform a variety of tasks without the need to start training it from scratch each time.

This method could be faster and less expensive than traditional techniques because it requires far fewer task-specific data. In addition, it outperformed training from scratch by more than 20% in simulation and real-world experiments.

“In robotics, people often claim that we don’t have enough training data. But in my view, another big problem is that the data come from so many different domains, modalities, and robot hardware. Our work shows how you’d be able to train a robot with all of them put together,” said Lirui Wang, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this technique.

Wang’s co-authors include fellow EECS graduate student Jialiang Zhao; Xinlei Chen, a research scientist at Meta; and senior author Kaiming He, an associate professor in EECS and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

This figure shows how the new technique aligns data from varied domains, like simulation and real robots, and multiple modalities, including vision sensors and robotic arm position encoders, into a shared “language” that a generative AI model can process. | Credit: MIT

Inspired by LLMs

A robotic “policy” takes in sensor observations, like camera images or proprioceptive measurements that track the speed and position of a robotic arm, and then tells a robot how and where to move.

Policies are typically trained using imitation learning, meaning a human demonstrates actions or teleoperates a robot to generate data, which are fed into an AI model that learns the policy. Because this method uses a small amount of task-specific data, robots often fail when their environment or task changes.
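As a rough illustration of imitation learning, the sketch below fits a one-dimensional linear policy to demonstration pairs by least squares. The linear form, the made-up demonstration data, and the names `fit_linear_policy` and `demos` are assumptions chosen for brevity, not the researchers' setup.

```python
from typing import Callable, List, Tuple


def fit_linear_policy(demos: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Fit action = w * observation + b to demonstrations by least squares."""
    n = len(demos)
    mean_obs = sum(obs for obs, _ in demos) / n
    mean_act = sum(act for _, act in demos) / n
    var = sum((obs - mean_obs) ** 2 for obs, _ in demos)
    cov = sum((obs - mean_obs) * (act - mean_act) for obs, act in demos)
    w = cov / var  # assumes the observations are not all identical
    b = mean_act - w * mean_obs
    return lambda obs: w * obs + b


# Hypothetical demonstrations: (distance to target, commanded velocity).
demos = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (4.0, 2.0)]
policy = fit_linear_policy(demos)
print(policy(3.0))  # interpolates between the demonstrated observations
```

A policy fitted this way is only as good as the handful of demonstrations it saw, which is the limitation the article describes.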

To develop a better approach, Wang and his collaborators drew inspiration from large language models like GPT-4.

These models are pretrained using an enormous amount of diverse language data and then fine-tuned by feeding them a small amount of task-specific data. Pretraining on so much data helps the models adapt to perform well on a variety of tasks.

“In the language domain, the data are all just sentences. In robotics, given all the heterogeneity in the data, if you want to pretrain in a similar manner, we need a different architecture,” he said.

Robotic data take many forms, from camera images to language instructions to depth maps. At the same time, each robot is mechanically unique, with a different number and orientation of arms, grippers, and sensors. Plus, the environments where data are collected vary widely.


The MIT researchers developed a new architecture called Heterogeneous Pretrained Transformers (HPT) that unifies data from these varied modalities and domains.

They put a machine-learning model known as a transformer into the middle of their architecture, which processes vision and proprioception inputs. A transformer is the same type of model that forms the backbone of large language models.

The researchers align data from vision and proprioception into the same type of input, called a token, which the transformer can process. Each input is represented with the same fixed number of tokens.
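One common way to turn inputs of different sizes into the same fixed number of tokens is attention pooling, sketched below in plain Python. This is an assumption about the mechanism rather than a description of HPT's code: the sizes, the random "query" vectors (which would be learned in a real model), and the names `attention_pool`, `vision`, and `proprio` are all hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def attention_pool(features: List[Vector], queries: List[Vector]) -> List[Vector]:
    """Summarize any number of feature vectors into len(queries) tokens."""
    dim = len(queries[0])
    tokens = []
    for q in queries:
        # Scaled dot-product score between this query and every feature.
        scores = [sum(qi * fi for qi, fi in zip(q, f)) / math.sqrt(dim)
                  for f in features]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Each token is a softmax-weighted average of the features.
        tokens.append([
            sum(w * f[d] for w, f in zip(weights, features)) / total
            for d in range(dim)
        ])
    return tokens


rng = random.Random(0)
DIM, NUM_TOKENS = 8, 4  # hypothetical sizes

# Stand-ins for learned queries; random here for illustration only.
queries = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TOKENS)]

# Hypothetical inputs, assumed already projected to DIM features each.
vision = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(49)]   # image patches
proprio = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]   # joint readings

vision_tokens = attention_pool(vision, queries)
proprio_tokens = attention_pool(proprio, queries)
print(len(vision_tokens), len(proprio_tokens))  # both equal NUM_TOKENS
```

Whatever the input length, each modality comes out as the same number of tokens of the same width, so a single transformer can consume them side by side.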

Then the transformer maps all inputs into one shared space, growing into a huge, pretrained model as it processes and learns from more data. The larger the transformer becomes, the better it will perform.

A user only needs to feed HPT a small amount of data on their robot’s design, setup, and the task they want it to perform. Then HPT transfers the knowledge the transformer gained during pretraining to learn the new task.

Enabling dexterous motions

One of the biggest challenges of developing HPT was building the massive dataset to pretrain the transformer, which included 52 datasets with more than 200,000 robot trajectories in four categories, including human demo videos and simulation.

The researchers also needed to develop an efficient way to turn raw proprioception signals from an array of sensors into data the transformer could handle.

“Proprioception is key to enable a lot of dexterous motions. Because the number of tokens is in our architecture always the same, we place the same importance on proprioception and vision,” Wang explained.

When they tested HPT, it improved robot performance by more than 20% on simulation and real-world tasks, compared with training from scratch each time. Even when the task was very different from the pretraining data, HPT still improved performance.

“This paper provides a novel approach to training a single policy across multiple robot embodiments. This enables training across diverse datasets, enabling robot learning methods to significantly scale up the size of datasets that they can train on. It also allows the model to quickly adapt to new robot embodiments, which is important as new robot designs are continuously being produced,” said David Held, associate professor at the Carnegie Mellon University Robotics Institute, who was not involved with this work.

In the future, the researchers want to study how data diversity could boost the performance of HPT. They also want to enhance HPT so it can process unlabeled data, as GPT-4 and other large language models do.

“Our dream is to have a universal robot brain that you could download and use for your robot without any training at all. While we are just in the early stages, we are going to keep pushing hard and hope scaling leads to a breakthrough in robotic policies, like it did with large language models,” Wang said.

Editor’s Note: This article was republished from MIT News.

The post MIT develops multimodal technique to train robots appeared first on The Robot Report.

Can blockchain secure communications for robot fleets?

Adam Zewe — Tue, 05 Oct 2021 19:41:02 +0000

A team of robots searching for and then retrieving lost objects. Blockchain could enable secure, tamper-proof communication among the robots as they complete their task, according to new research from MIT. Credit: MIT/Polytechnic Institute

Imagine a team of autonomous drones equipped with advanced sensing equipment, searching for smoke as they fly high above the Sierra Nevada mountains. Once they spot a wildfire, these leader robots relay directions to a swarm of firefighting drones that speed to the site of the blaze.

But what would happen if one or more leader robots was hacked by a malicious agent and began sending incorrect directions? As follower robots are led farther from the fire, how would they know they had been duped?

The use of blockchain technology as a communication tool for a team of robots could provide security and safeguard against deception, according to a study by researchers at MIT and Polytechnic University of Madrid. The research may also have applications in cities where multi-robot systems of self-driving cars are delivering goods and moving people across town.

A blockchain offers a tamper-proof record of all transactions — in this case, the messages issued by robot team leaders — so follower robots can eventually identify inconsistencies in the information trail.

Leaders use tokens to signal movements and add transactions to the chain, and forfeit their tokens when they are caught in a lie, so this transaction-based communications system limits the number of lies a hacked robot could spread, according to Eduardo Castelló, a Marie Curie Fellow in the MIT Media Lab and lead author of the study.

“The world of blockchain beyond the discourse about cryptocurrency has many things under the hood that can create new ways of understanding security protocols,” Castelló says.

Blockchain not just for Bitcoin

While a blockchain is typically used as a secure ledger for cryptocurrencies, in its essence it is a list of data structures, known as blocks, that are connected in a chain. Each block contains information it is meant to store, the “hash” of the information in the block, and the “hash” of the previous block in the chain. Hashing is the process of converting a string of text into a fixed-length series of numbers and letters that is effectively unique to that text.

In this simulation-based study, the information stored in each block is a set of directions from a leader robot to followers. If a malicious robot attempts to alter the content of a block, it will change the block hash, so the altered block will no longer be connected to the chain. The altered directions could be easily ignored by follower robots.
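The hash-linking idea can be sketched in a few lines of Python. This is a minimal illustration rather than the system used in the study: SHA-256, the block layout, and the names `Block`, `build_chain`, and `first_invalid` are assumptions, and each block's hash here covers its data together with the previous hash.

```python
import hashlib
from dataclasses import dataclass
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Block:
    data: str        # e.g. a leader's direction
    prev_hash: str   # hash of the previous block
    block_hash: str  # hash of prev_hash plus this block's data


def build_chain(messages: List[str]) -> List[Block]:
    chain: List[Block] = []
    prev = GENESIS
    for msg in messages:
        block = Block(msg, prev, sha256_hex(prev + msg))
        chain.append(block)
        prev = block.block_hash
    return chain


def first_invalid(chain: List[Block]) -> int:
    """Return the index of the first inconsistent block, or -1 if none."""
    prev = GENESIS
    for i, block in enumerate(chain):
        expected = sha256_hex(block.prev_hash + block.data)
        if block.prev_hash != prev or block.block_hash != expected:
            return i
        prev = block.block_hash
    return -1


chain = build_chain(["move north", "move north", "move east"])
chain[1].data = "move west"   # a malicious edit to an existing block
print(first_invalid(chain))   # 1: the stored hash no longer matches the data
```

If the attacker also recomputes the hash of the altered block, the check fails one block later instead, because the next block still stores the old hash as its previous-block link.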

The blockchain also provides a permanent record of all transactions. Since all followers can eventually see all the directions issued by leader robots, they can see if they have been misled.

For instance, if five leaders send messages telling followers to move north, and one leader sends a message telling followers to move west, the followers could ignore that inconsistent direction. Even if a follower robot did move west by mistake, the misled robot would eventually realize the error when it compares its moves to the transactions stored in the blockchain.
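The five-against-one example can be written as a simple majority vote. The sketch below assumes a strict-majority rule and uses hypothetical leader names; the study's actual decision rule may differ.

```python
from collections import Counter
from typing import Dict, List, Optional


def majority_direction(messages: Dict[str, str]) -> Optional[str]:
    """Return the direction sent by a strict majority of leaders, or None."""
    if not messages:
        return None
    direction, votes = Counter(messages.values()).most_common(1)[0]
    return direction if votes > len(messages) / 2 else None


def dissenting_leaders(messages: Dict[str, str]) -> List[str]:
    """List the leaders whose message disagrees with the majority."""
    winner = majority_direction(messages)
    if winner is None:
        return []
    return sorted(name for name, d in messages.items() if d != winner)


# One time step: five leaders say north, one says west.
step = {"L1": "north", "L2": "north", "L3": "north",
        "L4": "north", "L5": "north", "L6": "west"}
print(majority_direction(step))   # north
print(dissenting_leaders(step))   # ['L6']
```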

Transaction-based communication

In the system the researchers designed, each leader receives a fixed number of tokens that are used to add transactions to the chain — one token is needed to add a transaction. If followers determine the information in a block is false, by checking what the majority of leader robots signaled at that particular step, the leader loses the token. Once a robot is out of tokens it can no longer send messages.

“We envisioned a system in which lying costs money. When the malicious robots run out of tokens, they can no longer spread lies. So, you can limit or constrain the lies that the system can expose the robots to,” Castelló says.
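A toy ledger shows how a finite token budget caps the number of lies. The sketch assumes that a leader stakes one token per message and gets it back only if followers judge the message truthful; that refund rule, the class `TokenLedger`, and the leader names are assumptions for illustration, not details confirmed by the article.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TokenLedger:
    tokens: Dict[str, int]  # remaining tokens per leader
    log: List[Tuple[str, str]] = field(default_factory=list)  # (leader, direction)

    def send(self, leader: str, direction: str) -> bool:
        """Stake one token to record a message; refuse if the leader has none."""
        if self.tokens.get(leader, 0) < 1:
            return False
        self.tokens[leader] -= 1
        self.log.append((leader, direction))
        return True

    def settle(self, leader: str, was_truthful: bool) -> None:
        """Refund the staked token for a truthful message; a lie forfeits it."""
        if was_truthful:
            self.tokens[leader] += 1


ledger = TokenLedger({"honest": 2, "hacked": 2})
lies_recorded = 0
for _ in range(5):
    if ledger.send("honest", "north"):
        ledger.settle("honest", was_truthful=True)
    if ledger.send("hacked", "west"):
        ledger.settle("hacked", was_truthful=False)
        lies_recorded += 1

print(lies_recorded)   # 2: the hacked leader ran out of tokens
print(ledger.tokens)   # {'honest': 2, 'hacked': 0}
```

Over five rounds the honest leader keeps sending, while the hacked leader is silenced after spending its two tokens.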

The researchers tested their system by simulating several follow-the-leader situations where the number of malicious robots was known or unknown. Using a blockchain, leaders sent directions to follower robots that moved across a Cartesian plane, while malicious leaders broadcast incorrect directions or attempted to block the path of follower robots.

The researchers found that, even when follower robots were initially misled by malicious leaders, the transaction-based system enabled all followers to eventually reach their destination. And because each leader has an equal, finite number of tokens, the researchers developed algorithms to determine the maximum number of lies a malicious robot can tell.

“Since we know how lies can impact the system, and the maximum harm that a malicious robot can cause in the system, we can calculate the maximum bound of how misled the swarm could be. So, we could say, if you have robots with a certain amount of battery life, it doesn’t really matter who hacks the system, the robots will have enough battery to reach their goal,” Castelló says.

In addition to allowing a system designer to estimate the battery life the robots need to complete their task, the algorithms also enable the user to determine the amount of memory required to store the blockchain, the number of robots that will be needed, and the length of the path they can travel, even if a certain percentage of leader robots are hacked and become malicious.

“You can design your system with these tradeoffs in mind and make more informed decisions about what you want to do with the system you are going to deploy,” he says.
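The kind of back-of-the-envelope check this enables can be sketched as follows. The cost model is an assumption made purely for illustration and is not the bound derived in the paper: every lie costs exactly one unrefunded token, and each lie wastes one step plus one step to undo it. All function names and numbers are hypothetical.

```python
def max_lies(malicious_leaders: int, tokens_per_leader: int) -> int:
    """Upper bound on lies if each lie costs one token and none are refunded."""
    return malicious_leaders * tokens_per_leader


def worst_case_steps(direct_steps: int, lies: int, steps_lost_per_lie: int = 2) -> int:
    """Assumed cost model: each lie wastes one step plus one step to undo it."""
    return direct_steps + steps_lost_per_lie * lies


def battery_suffices(battery_steps: int, direct_steps: int,
                     malicious_leaders: int, tokens_per_leader: int) -> bool:
    lies = max_lies(malicious_leaders, tokens_per_leader)
    return battery_steps >= worst_case_steps(direct_steps, lies)


# Two hacked leaders with three tokens each, and a 20-step direct path:
# worst case is 20 + 2 * 6 = 32 steps under the assumed cost model.
print(battery_suffices(40, 20, 2, 3))  # True
print(battery_suffices(30, 20, 2, 3))  # False
```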

In the future, Castelló hopes to build off this work to create new security systems for robots using transaction-based interactions. He sees it as a way to build trust between humans and groups of robots.

“When you turn these robot systems into public robot infrastructure, you expose them to malicious actors and failures. These techniques are useful to be able to validate, audit, and understand that the system is not going to go rogue. Even if certain members of the system are hacked, it is not going to make the infrastructure collapse,” he says.

The paper was co-authored by Ernesto Jiménez and José Luis López-Presa of the Universidad Politécnica de Madrid. This research was funded by the European Union’s Horizon 2020 Research and Innovation Program, the Regional Government of Madrid, and the MIT International Science and Technology Initiatives Global Seed Fund.

Editor’s Note: This article was republished from MIT News.

The post Can blockchain secure communications for robot fleets? appeared first on The Robot Report.