Imagine buying a robot to perform household tasks. This robot was built and trained in a factory on a certain set of tasks and has never seen the items in your home. When you ask it to pick up a mug from your kitchen table, it might not recognize your mug (perhaps because the mug is painted with an unusual image of, say, MIT’s mascot, Tim the Beaver). So the robot fails.
“Right now, the way we train these robots, when they fail, we don’t really know why. So you would just throw up your hands and say, ‘OK, I guess we have to start over.’ A critical component that is missing from this system is enabling the robot to show why it is failing so the user can give it feedback,” says Andi Peng, a graduate student in electrical engineering and computer science (EECS) at MIT.
Peng and her colleagues at MIT, New York University, and the University of California at Berkeley have created a framework that enables a person to quickly teach a robot what they want it to do, with minimal effort.
When the robot fails, the system uses an algorithm to generate counterfactual explanations that describe what would need to change for the robot to succeed. For instance, maybe the robot would have been able to pick up the mug if the mug were a certain color. It shows these counterfactuals to the human and asks for feedback on why the robot failed. The system then uses this feedback and the counterfactual explanations to generate new data it uses to fine-tune the robot.
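To make the counterfactual step concrete, here is a minimal sketch of one way such explanations could be generated, assuming a simulator hook (`rollout_succeeds`) and a small set of editable visual attributes. The attribute names and the dictionary-based scene format are illustrative assumptions, not the authors’ code:

```python
# Hypothetical sketch: try single-attribute edits to a failing scene and
# keep the ones that flip the outcome from failure to success.

ATTRIBUTES = {
    "color": ["white", "red", "blue", "brown"],
    "size": ["small", "large"],
}

def generate_counterfactuals(observation, rollout_succeeds):
    """Return (attribute, value) edits that would make the robot succeed."""
    counterfactuals = []
    for attr, values in ATTRIBUTES.items():
        for value in values:
            if observation.get(attr) == value:
                continue  # skip the attribute's current value
            edited = {**observation, attr: value}
            if rollout_succeeds(edited):  # re-run the policy in simulation
                counterfactuals.append((attr, value))
    return counterfactuals
```

Each edit that flips the outcome ("the robot would have succeeded if the mug were red") is a candidate explanation to show the user.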
Fine-tuning involves adjusting a machine learning model that has already been trained to perform one task so it can perform a second, similar task.
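In code, fine-tuning is simply continued training from pretrained weights, usually with a small learning rate. A generic PyTorch sketch (not the authors’ implementation) might look like this:

```python
from torch import nn, optim

def fine_tune(policy: nn.Module, demos, epochs=5, lr=1e-4):
    """Continue training a pretrained policy on new (obs, action) tensor pairs."""
    # A small learning rate adapts the policy without overwriting what it knows.
    optimizer = optim.Adam(policy.parameters(), lr=lr)
    policy.train()
    for _ in range(epochs):
        for obs, action in demos:
            optimizer.zero_grad()
            loss = nn.functional.mse_loss(policy(obs), action)
            loss.backward()
            optimizer.step()
    return policy
```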
The researchers tested this technique in simulations and found that it can teach the robot more effectively than other methods. Robots trained with this framework performed better, and the training process took less human time.
This framework can help robots learn faster in new environments without requiring technical knowledge from the user. In the long run, this could be a step toward enabling general-purpose robots to efficiently perform daily tasks for older adults or people with disabilities in a variety of settings.
Peng, the lead author, is joined by co-authors Aviv Netanyahu, an EECS graduate student; Mark Ho, an assistant professor at the Stevens Institute of Technology; Tianmin Shu, an MIT postdoc; Andreea Bobu, a graduate student at UC Berkeley; and senior authors Julie Shah, an MIT professor of aeronautics and astronautics and the director of the Interactive Robotics Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Pulkit Agrawal, a professor in CSAIL. The research will be presented at the International Conference on Machine Learning.
On-the-job training
Robots often fail due to distribution shift: the robot is presented with objects and spaces it did not see during training, and it doesn’t understand what to do in this new environment.
One way to train a robot to perform a specific task is imitation learning: the user demonstrates the correct behavior to teach the robot what to do. If a user tries to teach a robot to pick up a mug but demonstrates with a white mug, the robot could learn that all mugs are white. It may then fail to pick up a red, blue, or Tim-the-Beaver-brown mug.
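One common form of imitation learning is behavior cloning: fitting a policy network to the demonstrated observation-action pairs with supervised learning. A bare-bones sketch, with generic dimensions and a placeholder data format, might look like:

```python
import torch
from torch import nn

class Policy(nn.Module):
    """Maps an observation vector to an action vector."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, act_dim)
        )

    def forward(self, obs):
        return self.net(obs)

def behavior_cloning(demos, obs_dim, act_dim, steps=1000):
    """Fit a policy to (observation, action) tensor pairs from demonstrations."""
    policy = Policy(obs_dim, act_dim)
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(steps):
        for obs, act in demos:
            opt.zero_grad()
            loss = nn.functional.mse_loss(policy(obs), act)
            loss.backward()
            opt.step()
    return policy
```

If every training demonstration shows a white mug, nothing stops such a network from latching onto color as a predictive feature, which is exactly the failure mode described above.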
Teaching a robot to recognize that a mug is a mug, regardless of its color, could take thousands of demonstrations.
“I don’t want to have to demonstrate with 30,000 mugs. I want to demonstrate with just one mug. But then I need to teach the robot to recognize that it can pick up a mug of any color,” Peng says.
To achieve this, the researchers’ system determines what specific object the user cares about (a mug) and which elements aren’t important for the task (perhaps the color of the mug doesn’t matter). It uses this information to generate new, synthetic data by changing these “unimportant” visual concepts. This process is known as data augmentation.
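Under the same illustrative assumptions as the sketches above, a data-augmentation step could be as simple as copying each demonstration while randomizing the concepts the user marked as unimportant; again, the attribute names and demo format are placeholders:

```python
import random

# Hypothetical pools of values for task-irrelevant attributes.
IRRELEVANT_VALUES = {"color": ["white", "red", "blue", "brown"]}

def augment(demos, irrelevant=("color",), copies=10):
    """Create synthetic demos that differ only in unimportant attributes."""
    augmented = []
    for obs, action in demos:
        for _ in range(copies):
            new_obs = dict(obs)
            for attr in irrelevant:
                new_obs[attr] = random.choice(IRRELEVANT_VALUES[attr])
            augmented.append((new_obs, action))  # same action, new appearance
    return augmented
```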
The framework has three steps. First, it shows the robot the task that caused it to fail. Then it collects a demonstration of the desired actions from the user and generates counterfactuals by searching over the space of features, showing what would need to change for the robot to succeed.
The system shows these counterfactuals to the user and asks for feedback to determine which visual concepts do not impact the desired action. Then it uses this human feedback to generate many new augmented demonstrations.
In this way, the user could demonstrate picking up one mug, but the system would produce demonstrations showing the desired action with thousands of different mugs by altering the color. It uses these data to fine-tune the robot.
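Reusing the helper sketches above, the whole loop might be composed roughly like this; every function name here is a placeholder, and `ask_user` and `encode` (which would map attribute dictionaries to network inputs) are hypothetical stand-ins:

```python
def adapt_policy(policy, failing_obs, user_demos, ask_user,
                 rollout_succeeds, encode):
    # Steps 1-2: find counterfactual edits that would make the robot succeed.
    cfs = generate_counterfactuals(failing_obs, rollout_succeeds)
    # Step 3: the user marks which changed concepts don't matter for the task.
    irrelevant = ask_user(cfs)  # e.g., returns ("color",)
    # Synthesize many demos that vary only those unimportant concepts...
    new_demos = augment(user_demos, irrelevant=irrelevant, copies=1000)
    # ...then fine-tune the pretrained policy on the augmented, encoded data.
    return fine_tune(policy, encode(new_demos))
```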
Generating counterfactual explanations and soliciting feedback from users are critical to the technique’s success, Peng says.
From human reasoning to robot reasoning
Because their work seeks to put the human in the training loop, the researchers tested their technique with human users. They first conducted a study in which they asked people whether counterfactual explanations helped them identify elements that could be changed without affecting the task.
“It was so clear right off the bat. Humans are so good at this type of counterfactual reasoning. And this counterfactual step is what allows human reasoning to be translated into robot reasoning in a way that makes sense,” she says.
They then applied their framework to three simulations where robots were tasked with: navigating to a goal object, picking up a key and unlocking a door, and picking up a desired object then placing it on a tabletop. In each case, their method enabled the robot to learn faster than other techniques, while requiring fewer demonstrations from users.
Moving forward, the researchers hope to test this framework on real robots. They also want to focus on reducing the time it takes for the system to generate new data using generative machine learning models.
“We want robots to do what humans do, and we want them to do it in a semantically meaningful way. Humans tend to operate in this abstract space, where they don’t think about every single property in an image. At the end of the day, this is really about enabling a robot to learn a good, human-like representation at an abstract level,” Peng says.
This research is supported, in part, by a National Science Foundation Graduate Research Fellowship, Open Philanthropy, an Apple AI/ML Fellowship, Hyundai Motor Corporation, the MIT-IBM Watson AI Lab, and the National Science Foundation Institute for Artificial Intelligence and Fundamental Interactions.