Someone learning to play tennis may want to hire a teacher to help them learn faster. Because that teacher is (hopefully) a great tennis player, there are times when trying to imitate the teacher exactly won’t help the student learn. Perhaps the teacher leaps high into the air to deftly return a volley. A student who can’t copy that move might instead try a few other moves on their own until they master the skills they need to return volleys.
Computer scientists can also use “teacher” systems to train another machine to perform a task. But just as with human learning, the student machine faces the dilemma of knowing when to follow the teacher and when to explore on its own. To this end, researchers at MIT and the Technion-Israel Institute of Technology developed an algorithm that automatically and independently determines when the student should imitate the teacher (known as imitation learning) and when it should instead learn by trial and error (known as reinforcement learning).
Their dynamic approach allows the student to diverge from copying the teacher when the teacher is either too good or not good enough, and then return to following the teacher at a later point in the training process if doing so would achieve better results and faster learning.
When the researchers tested this approach in simulations, they found that their combination of trial-and-error learning and imitation learning allowed students to learn tasks more effectively than methods that used only one type of learning.
The method could help researchers improve the training process for machines deployed in uncertain real-world situations, such as a robot being trained to navigate a building it has never seen before.
“This combination of learning by trial and error and following a teacher is very powerful. It gives our algorithm the ability to solve very difficult problems that cannot be solved by using either technique individually,” says Idan Shenfeld, a graduate student in electrical engineering and computer science (EECS) and lead author of a paper on the technique.
Shenfeld wrote the paper with co-authors Zhang-Wei Hong, an EECS graduate student; Aviv Tamar, assistant professor of electrical engineering and computer science at the Technion; and senior author Pulkit Agrawal, director of the Improbable AI Lab and an assistant professor in the Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the International Conference on Machine Learning.
Establishing a balance
Many existing methods that attempt to strike a balance between imitation learning and reinforcement learning do so through brute force trial and error. Researchers choose a weighted combination of the two learning methods, run the entire training procedure, and then repeat the process until they find the optimal balance. This is inefficient and often so computationally expensive that it is not even possible.
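As a rough illustration (a minimal sketch, not the authors' code), the fixed-weight blending that such brute-force approaches sweep over might look like the following, where every candidate weight requires a complete training run:

```python
# A minimal sketch of a fixed-weight blend of an imitation-learning loss and a
# reinforcement-learning loss. Prior methods typically pick one weight, run the
# entire training procedure, and repeat with a new weight until the balance works.

def combined_loss(rl_loss: float, imitation_loss: float, alpha: float) -> float:
    """Blend the two objectives with a hand-picked weight alpha in [0, 1]."""
    return alpha * imitation_loss + (1.0 - alpha) * rl_loss

# Brute-force tuning: each candidate weight commits to a full training run,
# which is the computational cost the new algorithm is designed to avoid.
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"alpha={alpha}: would launch a complete training run here")
```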
“We want algorithms that are principled, with as few knobs to tune as possible, and high efficiency – these principles drove our research,” says Agrawal.
To achieve this goal, the team approached the problem differently than prior work. Their solution involves training two students: one with a weighted combination of reinforcement learning and imitation learning, and a second that can only use reinforcement learning to learn the same task.
The main idea is to automatically and dynamically adjust the weighting of the first student’s reinforcement-learning and imitation-learning objectives. This is where the second student comes into play. The researchers’ algorithm constantly compares the two students: if the one using the teacher is doing better, the algorithm puts more weight on imitation learning to train the student, but if the one using only trial and error starts getting better results, it puts more weight on reinforcement learning.
By dynamically determining which method achieves better results, the algorithm is adaptive and can pick the best technique throughout the training process. Thanks to this innovation, it can teach students more effectively than nonadaptive methods, says Shenfeld.
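A rough sketch of this comparison loop is below; the helper names, placeholder returns, and update rule are hypothetical simplifications of the idea described above, not the published algorithm:

```python
# Hypothetical sketch: two students are trained in parallel, and the weight on
# imitation grows when the teacher-guided student is ahead, shrinking when the
# trial-and-error (RL-only) student catches up.

def update_imitation_weight(alpha: float,
                            guided_return: float,
                            rl_only_return: float,
                            step_size: float = 0.05) -> float:
    """Nudge the imitation weight toward whichever student is doing better."""
    if guided_return > rl_only_return:
        return min(1.0, alpha + step_size)   # lean more on the teacher
    return max(0.0, alpha - step_size)       # lean more on trial and error

alpha = 0.5
for step in range(3):
    # In a real system these would be measured returns of the two students;
    # placeholder numbers keep the sketch runnable.
    guided_return, rl_only_return = 10.0 + step, 9.0 + 0.7 * step
    alpha = update_imitation_weight(alpha, guided_return, rl_only_return)
    print(f"step {step}: imitation weight = {alpha:.2f}")
```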
“One of the main challenges in developing this algorithm was that it took us some time to realize that we should not train the two students independently. It became clear that we needed to connect the agents so they could share information, and then find the right way to technically ground this intuition,” says Shenfeld.
Solving difficult problems
To test their approach, the researchers set up a series of simulated teacher-student training experiments, such as navigating through a maze of lava to reach the opposite corner of a grid. In this case, the teacher has a map of the entire grid, while the student can only see a patch in front of it. Their algorithm achieved an almost perfect success rate across all testing environments and was much faster than other methods.
To give their algorithm an even tougher test, they set up a simulation involving a robotic hand with touch sensors but no vision that must reorient a pen to the correct pose. The teacher had access to the actual orientation of the pen, while the student could only use its touch sensors to determine the pen’s orientation.
Their method outperformed others that used only imitation learning or only reinforcement learning.
Reorienting objects in this way is one of many manipulation tasks that future home robots will need to perform, a vision the Improbable AI lab is working toward, Agrawal adds.
Teacher-student learning has been successfully used to train robots to perform complex object manipulation and locomotion in simulation and then transfer the learned skills to the real world. In these methods, the teacher has privileged information available from the simulation that the student will not have once it is deployed in the real world. For example, the teacher will know the detailed map of a building that the student robot is being trained to navigate using only images captured by its camera.
“Current methods of student-teacher learning in robotics do not account for the student’s inability to imitate the teacher and are therefore limited in performance. The new method paves the way to create superior robots,” says Agrawal.
In addition to better robots, the researchers believe their algorithm has the potential to improve performance in a variety of applications where imitation or reinforcement learning is used. For example, large language models such as GPT-4 are very good at a wide range of tasks, so perhaps a large model could be used as a teacher to train a smaller student model to be even better at one particular task. Another exciting direction is to explore the similarities and differences between machines and humans learning from their respective teachers. Such analysis could help improve the learning experience, the researchers say.
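As a loose illustration of that large-teacher, small-student idea, the sketch below shows standard knowledge distillation (a well-known technique, not the method from this paper), where a smaller student is trained to match a larger teacher's output distribution; the tensors here are toy placeholders:

```python
# Illustrative sketch of knowledge distillation: the student is trained to match
# the teacher's softened output distribution via a KL-divergence loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student outputs."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Toy example: a batch of 4 examples over a 10-way output.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student's logits
print(loss.item())
```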
“What’s interesting about this approach is how robust it is to different parameter choices, and the variety of domains in which it shows promising results,” says Abhishek Gupta, an assistant professor at the University of Washington who was not involved with this work. “Although the current set of results is mainly in simulation, I am very excited about the future possibilities of applying this work to problems involving memory and reasoning with different modalities, such as tactile sensing.”
“This work presents an interesting approach to reuse prior computational work in reinforcement learning. In particular, their proposed method can leverage a suboptimal teacher policy as a guide while avoiding the careful hyperparameter scheduling required to balance the objectives of imitating the teacher versus optimizing the task reward,” adds Rishabh Agarwal, a senior research scientist at Google Brain, who was also not involved in this research. “Hopefully, this work will make reincarnating reinforcement learning with learned policies less onerous.”
This research was supported in part by the MIT-IBM Watson AI Lab, Hyundai Motor Company, the DARPA Machine Common Sense Program, and the Office of Naval Research.