Reinforcement learning provides a conceptual framework for autonomous agents to learn from experience, much as a pet can be trained with treats. But practical applications of reinforcement learning are often far from natural: instead of using RL to learn through trial and error by actually attempting the desired task, typical RL applications use a separate (usually simulated) training phase. For example, AlphaGo did not learn to play Go by competing against thousands of humans, but by playing against itself in simulation. Although this kind of simulated training is appealing for games where the rules are perfectly known, applying it to real-world domains such as robotics can require a range of complex approaches, such as the use of simulated data or instrumenting real-world environments in various ways to make training feasible under laboratory conditions. Can we instead devise reinforcement learning systems for robots that allow them to learn directly “on the job”, while performing the task they are asked to do? In this blog post, we discuss ReLMM, a system we developed that learns to clean a room directly with a real robot via continual learning.
We evaluate our method on rooms of varying difficulty. The top-left task has uniform white objects and no obstacles, while the other rooms contain objects of diverse shapes and colors, obstacles that increase the difficulty of navigation and obscure the objects, and patterned rugs that make it harder to see the objects against the floor.
For real-world “on-the-job” training to be practical, the difficulty of gathering more experience cannot be prohibitive. If we can make real-world training easier, by making the data-collection process more autonomous without requiring human monitoring or intervention, we can further take advantage of the simplicity of agents that learn from experience. In this work, we design an “on-the-job” mobile robot training system for cleaning, in which the robot learns to grasp objects across different rooms.
People are not born one day and job-interviewed the next. There are many levels of tasks people learn before they apply for a job, because we start with easier tasks and build on them. In ReLMM, we make use of this idea by having the robot train general, reusable skills, such as grasping, first, encouraging the robot to prioritize these skills before learning later skills, such as navigation. Learning this way has two advantages for robotics. The first advantage is that when an agent focuses on learning a single skill, it is more efficient at collecting data around the local state distribution for that skill.
This is shown in the figure above, where we evaluate the amount of prioritized grasping experience needed to achieve efficient mobile manipulation training. The second advantage of a multi-level learning approach is that we can inspect the models trained for different tasks and ask them questions such as “can you grasp anything right now?”, which is useful for training navigation, as we describe next.
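To make this interaction concrete, the sketch below outlines one way such a staged training loop could be organized: the robot first practices grasping in place, and once grasping is reasonably reliable, grasp outcomes become the reward signal for navigation. This is a minimal, hypothetical sketch; the environment interface, `GraspPolicy`/`NavPolicy` objects, and thresholds are illustrative placeholders, not the actual ReLMM implementation.

```python
# Minimal sketch of staged "grasp first, then navigate" on-the-job training.
# All classes, methods, and thresholds below are illustrative placeholders.

def train_on_the_job(env, grasp_policy, nav_policy,
                     grasp_pretrain_steps=2000, total_steps=50000):
    """Stage 1: prioritize grasping; Stage 2: use grasp success as the nav reward."""
    # Stage 1: the robot stays near objects and practices grasping only.
    for _ in range(grasp_pretrain_steps):
        obs = env.get_wrist_camera_image()
        action = grasp_policy.sample(obs)
        success = env.attempt_grasp(action)            # 1 if an object was picked up
        grasp_policy.update(obs, action, reward=float(success))

    # Stage 2: navigation is trained with grasp outcomes as its reward signal.
    for _ in range(total_steps):
        nav_obs = env.get_navigation_observation()
        nav_action = nav_policy.sample(nav_obs)
        env.drive(nav_action)

        obs = env.get_wrist_camera_image()
        # Ask the grasping model: "can you grasp anything right now?"
        if grasp_policy.predicted_success(obs) > 0.5:
            action = grasp_policy.sample(obs)
            success = env.attempt_grasp(action)
            grasp_policy.update(obs, action, reward=float(success))
            nav_reward = float(success)
        else:
            nav_reward = 0.0                           # drove somewhere with nothing to grasp
        nav_policy.update(nav_obs, nav_action, nav_reward)
```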
Training this multi-level policy was not only more efficient than learning both skills simultaneously, but it also allowed the grasping controller to inform the navigation policy. Having a model that estimates the uncertainty in its grasp success (Ours in the figure above) can be used to improve navigation exploration by skipping areas without graspable objects, in contrast to the No Uncertainty Bonus variant, which does not use this information. The model can also be used to relabel data during training, so that in the unlucky case where the grasping model fails to grasp an object within its reach, the grasping policy can still receive a signal indicating that an object was there but that the policy has not yet learned how to grasp it. Moreover, learning modular models has engineering advantages. Modular training allows skills that are easier to learn to be reused, and makes it possible to build intelligent systems one piece at a time. This is beneficial for many reasons, including safety evaluation and understanding.
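As an illustration, the sketch below shows one way a grasp-success model with uncertainty estimates could provide both an exploration bonus for navigation and a relabeled signal when a grasp fails on a reachable object. The ensemble-based uncertainty estimate, the `predict_success` method, and the specific reward values are assumptions for illustration, not the exact formulation used in the paper.

```python
import numpy as np

def grasp_success_stats(grasp_models, image):
    """Mean and spread of predicted grasp success across an ensemble of models."""
    preds = np.array([m.predict_success(image) for m in grasp_models])
    return preds.mean(), preds.std()

def navigation_reward(grasp_models, image, grasp_succeeded, bonus_scale=0.1):
    """Grasp success is the main reward; model uncertainty adds an exploration bonus
    that draws the robot toward areas where grasp outcomes are still unpredictable."""
    _, uncertainty = grasp_success_stats(grasp_models, image)
    return float(grasp_succeeded) + bonus_scale * uncertainty

def relabel_failed_grasp(grasp_models, image, threshold=0.9):
    """If the model is confident an object was within reach but the grasp failed,
    keep a small positive signal: an object was there, the grasping policy simply
    has not learned to pick it up yet."""
    mean_success, _ = grasp_success_stats(grasp_models, image)
    return 0.5 if mean_success > threshold else 0.0
```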
Many of the robotics tasks we see today can be solved, with varying degrees of success, using manually designed controllers. For our room-cleaning task, we created a hand-crafted controller that locates objects using image clustering and drives toward the nearest detected object at each step. This expertly designed controller works very well on the visually salient balled-up socks and follows reasonable paths around obstacles. However, it cannot learn the optimal path for collecting objects quickly, and it struggles with visually diverse rooms. As shown in video 3 below, the scripted policy gets distracted by the white patterned carpet while trying to grasp more white objects.
[Videos 1–4] We show a comparison between (1) our policy at the beginning of training, (2) our policy at the end of training, and (3) the scripted policy. (4) The robot’s performance improves over time and eventually outperforms the scripted policy at rapidly collecting the objects in the room.
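To give a sense of what a scripted baseline of this kind looks like, here is a simplified stand-in for the hand-crafted controller described above: it segments bright pixels, treats the blob nearest the bottom of the image as the closest object, and steers toward it. The thresholds, action format, and detection rule are illustrative assumptions, not the exact controller used in our experiments, and the brightness-based detection makes the failure mode in video 3 easy to see: a white patterned carpet produces false detections.

```python
import numpy as np

def scripted_controller_step(rgb_image, white_threshold=200):
    """Illustrative hand-designed baseline: detect bright (sock-like) pixels,
    pick the detection closest to the robot, and steer toward it."""
    gray = rgb_image.mean(axis=2)                 # H x W brightness
    ys, xs = np.nonzero(gray > white_threshold)   # candidate object pixels
    if len(xs) == 0:
        return {"turn": 0.3, "forward": 0.0}      # nothing seen: rotate and search

    # Treat the pixel nearest the bottom of the image as the closest object.
    nearest = np.argmax(ys)
    target_x, target_y = xs[nearest], ys[nearest]

    h, w = gray.shape
    turn = (target_x - w / 2) / (w / 2)           # steer to center the object
    forward = 1.0 - target_y / h                  # slow down as the object gets close
    if abs(turn) < 0.1 and forward < 0.1:
        return {"grasp": True}                    # object directly ahead: attempt a grasp
    return {"turn": float(turn), "forward": float(forward)}
```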
Given that we can have experts code this hand-crafted controller, what is the purpose of learning? An important limitation of hand-designed controllers is that they are tuned for a particular task, such as grasping white objects. When objects of different colors and shapes are introduced, the original tuning may no longer be optimal. Rather than requiring further manual engineering, our learning-based method can adapt to different tasks by collecting its own experience.
However, the most important lesson is that even if a hand-designed controller is capable, the learning agent will eventually surpass it given enough training time. This learning process is itself autonomous and takes place while the robot is performing its job, making it comparatively inexpensive. This shows the capability of learning agents, which can also be thought of as working out a general way to perform an “expert manual tuning” process for any kind of task. Learning systems have the ability to create the entire control algorithm for the robot, and are not limited to tuning a few parameters in a script. A key step in this work is enabling these real-world learning systems to autonomously collect the data needed to make the learning methods succeed.
This post is based on the paper “Fully Autonomous Real-World Reinforcement Learning with Applications to Mobile Manipulation”, presented at CoRL 2021. You can find more details in our paper, on our website, and in the video. We provide code to reproduce our experiments. We thank Sergey Levine for his valuable feedback on this blog post.