Your brand-new household robot is delivered to your home, and you ask it to make you a cup of coffee. Although it knows some basic skills from previous practice in mock kitchens, there is a vast number of actions it could take: turning on the faucet, flushing the toilet, emptying the flour container, and so on. But only a small number of those actions would actually be useful. How can a robot figure out which steps make sense in a new situation?
It can use PIGINet, a new system that aims to efficiently enhance the problem-solving capabilities of household robots. Researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) are using machine learning to cut down on the typical iterative process of task planning that considers all possible actions. PIGINet eliminates task plans that cannot satisfy collision-free constraints, reducing planning time by 50 to 80 percent after training on only 300 to 500 problems.
Typically, robots attempt various task plans and iteratively refine their moves until they find a workable solution, which can be inefficient and time-consuming, especially when there are movable and articulated obstacles. Say, for example, that after cooking you want to put all the sauces in the cabinet. That problem may take two to eight steps depending on what the world looks like at the moment. Does the robot need to open multiple cabinet doors, or is there an obstacle inside the cabinet that needs to be moved out of the way? You don't want your robot to be annoyingly slow, and it would be even worse if it burned dinner while it was thinking.
Typically, household robots are programmed to perform tasks from predefined recipes, which are not always suitable for diverse or changing environments. So how does PIGINet avoid those predefined rules? PIGINet is a neural network that takes in "plans, images, goal, and initial facts" and then predicts the probability that a task plan can be refined into feasible motion plans. In simpler terms, it employs a transformer encoder, a versatile and state-of-the-art model designed to operate on data sequences. The input sequence, in this case, is information about which task plan is being considered, images of the environment, and symbolic encodings of the initial state and the desired goal. The encoder combines the task plans, image, and text to generate a prediction about the feasibility of the selected task plan.
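As a rough illustration of this idea, not the authors' actual architecture, a feasibility scorer of this shape might look like the following minimal sketch, which assumes the plan, image, goal, and initial-fact inputs have already been converted into fixed-size embeddings (all dimensions and names here are illustrative):

```python
import torch
import torch.nn as nn

class FeasibilityScorer(nn.Module):
    """Toy sketch of a PIGINet-style scorer: a transformer encoder runs
    over a sequence of [plan, image, goal, initial-facts] embeddings,
    and a small head predicts the probability that the task plan can be
    refined into a feasible motion plan. Dimensions are assumptions."""

    def __init__(self, d_model: int = 256, nhead: int = 4, num_layers: int = 3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 1)  # one logit: refinable or not

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, d_model), one embedding per input element
        encoded = self.encoder(tokens)
        pooled = encoded.mean(dim=1)  # pool over the input sequence
        return torch.sigmoid(self.head(pooled)).squeeze(-1)

# Example: score a batch of 2 candidate plans, each a 10-element sequence.
scorer = FeasibilityScorer()
probs = scorer(torch.randn(2, 10, 256))  # tensor of 2 probabilities
```

Trained with a standard binary classification loss on feasible versus infeasible plans, a scorer like this can be queried in a single forward pass, which is what makes filtering candidate plans so much cheaper than testing each one with a full motion planner.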
Focusing on kitchens, the team created hundreds of simulated environments, each with a different layout and specific tasks that required objects to be rearranged among counters, refrigerators, cabinets, sinks, and cooking pots. By measuring the time spent solving the problems, they compared PIGINet against previous approaches. One correct task plan might include opening the left refrigerator door, removing the pot lid, moving the cabbage from the pot to the refrigerator, moving a potato to the refrigerator, picking up the bottle from the sink, placing the bottle in the sink, picking up the tomato, or placing the tomato. PIGINet significantly reduced planning time: by 80 percent in simpler scenarios and by 20 to 50 percent in more complex scenarios with longer plan sequences and less training data.
"Systems like PIGINet, which use the power of data-driven methods to handle familiar cases efficiently, but can still fall back on 'first-principles' planning methods to verify learning-based suggestions and solve novel problems, offer the best of both worlds, providing reliable and efficient general-purpose solutions to a wide variety of problems," says MIT Professor and CSAIL Principal Investigator Leslie Pack Kaelbling.
PIGINet's use of multimodal embeddings in the input sequence allowed for better representation and understanding of complex geometric relationships. Using image data helped the model grasp spatial arrangements and object configurations without knowing the objects' 3D meshes for precise collision checking, enabling fast decision-making in different environments.
One of the main challenges in developing PIGINet was the scarcity of good training data, since all the feasible and infeasible plans must be generated by traditional planners, which is slow to begin with. However, by using pre-trained vision-language models and data augmentation tricks, the team was able to overcome this challenge, demonstrating impressive plan-time reductions not only on problems involving seen objects, but also zero-shot generalization to previously unseen objects.
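The article does not say which pre-trained vision-language model the team used, but as a hedged sketch of the general technique, an off-the-shelf model such as CLIP can turn a scene image into an embedding suitable for the input sequence of a scorer like the one above (the model name and file path below are assumptions for illustration):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical choice of vision-language backbone; the paper's actual
# model is not specified in this article.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("kitchen_scene.png")  # placeholder scene image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    # A single fixed-size embedding of the scene, shape (1, 512).
    image_embedding = model.get_image_features(**inputs)
```

Reusing a frozen, pre-trained encoder like this is a common way to sidestep small training sets: the image features come for free, so the planner-generated data only needs to teach the feasibility head.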
"Because everyone's home is different, robots should be adaptable problem solvers instead of just recipe followers. Our key idea is to let a general-purpose task planner generate candidate task plans and use a deep learning model to select the promising ones. The result is a more efficient, adaptable, and practical household robot that can nimbly navigate even complex and dynamic environments. Moreover, the practical applications of PIGINet are not limited to households," says Zhutian Yang, PhD student at MIT CSAIL and lead author of the paper. "Our future goal is to further refine PIGINet to suggest alternative task plans after identifying infeasible actions, which will further speed up the generation of executable task plans without the need for large datasets to train a general-purpose planner from scratch. We believe this could revolutionize the way robots are developed and then deployed in everyone's homes."
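Conceptually, that key idea is a generate-and-rank loop. Here is a minimal sketch of how such a loop could fit together, where `generate_candidate_plans` (a symbolic task planner), `score_feasibility` (a learned model such as the sketch earlier), and `refine_motion` (a geometric motion planner) are all hypothetical placeholder names:

```python
def plan_with_learned_filter(state, goal, threshold=0.5):
    """Generate-and-rank sketch: a task planner proposes candidate
    plans, a learned model scores them, and expensive motion
    refinement runs only on the promising ones."""
    scored = [(score_feasibility(plan, state, goal), plan)
              for plan in generate_candidate_plans(state, goal)]
    for score, plan in sorted(scored, key=lambda t: t[0], reverse=True):
        if score < threshold:
            break  # skip plans the model deems unlikely to be refinable
        motion = refine_motion(plan, state)  # slow collision checking
        if motion is not None:
            return plan, motion  # first plan with a feasible motion plan
    return None  # nothing promising refined; caller can widen the search
```

The design point is that the learned filter never replaces the planner; it just reorders and prunes the planner's own candidates, so a wrong prediction costs extra search time rather than correctness.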
"This paper addresses a fundamental challenge in implementing a general-purpose robot: how to learn from past experience to speed up decision-making in unstructured environments filled with a large number of articulated and movable obstacles," says Beomjoon Kim PhD '20, assistant professor at the Graduate School of AI at the Korea Advanced Institute of Science and Technology (KAIST). "The main hurdle in such problems is how to determine a high-level task plan such that there exists a low-level motion plan that realizes the high-level plan. Typically, you have to iterate between motion and task planning, which leads to significant computational inefficiency. Zhutian's work combats this by using learning to eliminate infeasible task plans, and is a step in a promising direction."
Yang co-authored the paper with NVIDIA research scientist Caelan Garrett SB '15, MEng '15, PhD '21; MIT Department of Electrical Engineering and Computer Science professors and CSAIL members Tomás Lozano-Pérez and Leslie Pack Kaelbling; and Senior Director of Robotics Research at NVIDIA and University of Washington Professor Dieter Fox. The team was supported by AI Singapore and grants from the National Science Foundation, the Air Force Office of Scientific Research, and the Army Research Office. The project was partially conducted while Yang was an intern at NVIDIA Research. Their research will be presented at the Robotics: Science and Systems conference in July.