Learning to Learn From Simulation: Using simulations to learn faster on robots
Abstract
Learning for control can acquire controllers in novel task scenarios, paving the way toward autonomous robots. However, even with recent advances in the field, automatically designing and learning controllers for complex robots remains challenging. Typical learning approaches can be prohibitively expensive in terms of robot experiments, and policies learned in simulation often fail to transfer to hardware because of modelling inaccuracies. This motivates extracting information from simulation that has a higher chance of transferring to hardware. In this thesis, we explore methods that learn from simulation to improve performance on actual robots.
One way to improve sample-efficiency is through parametric expert-designed controllers. Such controllers are popular in robotics but require re-learning of their parameters for new task scenarios. In this context, Bayesian optimization has emerged as a promising approach for automatically learning controller parameters. However, when optimizing high-dimensional policies on hardware, sample-efficiency can still be an issue. We develop an approach that uses simulation to map the original parameter space into a domain-informed space. During Bayesian optimization, similarity between controllers is then computed in this transformed space, so that distances between controllers on hardware are informed by their behavior in simulation. We propose two ways of building this transformation: hand-designed features based on knowledge of human walking, and features extracted automatically by neural networks. Our hardware experiments on the ATRIAS robot, and simulation experiments on a 7-link biped model, show that these feature transforms capture important aspects of walking and accelerate learning on hardware and in perturbed simulations, compared to traditional Bayesian optimization and other learning methods.
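To make the idea concrete, the sketch below shows a minimal Bayesian optimization loop in which the Gaussian-process kernel measures similarity in a transformed feature space phi(x) rather than in the raw parameter space. This is only an illustration of the mechanism, not the thesis's implementation: the transform phi, the controller dimensionality, and the cost function are placeholders (in the thesis, phi would come from hand-designed walking features or a neural network trained on simulation data).

import numpy as np
from scipy.stats import norm

# Placeholder feature transform: stands in for the domain-informed features
# (hand-designed or learned from simulation) described in the abstract.
def phi(x):
    return np.tanh(x)  # hypothetical transform

def kernel(A, B, lengthscale=0.5, variance=1.0):
    """Squared-exponential kernel evaluated on transformed features phi(x)."""
    FA, FB = phi(A), phi(B)
    d2 = ((FA[:, None, :] - FB[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(X, y, Xq, noise=1e-4):
    """GP posterior mean and std at query points Xq given observations (X, y)."""
    K = kernel(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Kq = kernel(Xq, X)
    mu = Kq @ alpha
    v = np.linalg.solve(L, Kq.T)
    var = kernel(Xq, Xq).diagonal() - (v ** 2).sum(0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # Expected improvement for minimization of the cost.
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(cost, dim, n_init=5, n_iter=20, n_cand=2000, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, size=(n_init, dim))   # initial hardware trials
    y = np.array([cost(x) for x in X])
    for _ in range(n_iter):
        cand = rng.uniform(-1, 1, size=(n_cand, dim))
        mu, sigma = gp_posterior(X, y, cand)
        x_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, cost(x_next))            # one new hardware trial
    return X[np.argmin(y)], y.min()

if __name__ == "__main__":
    # Hypothetical stand-in for the true cost of a walking trial on hardware.
    best_x, best_cost = bayes_opt(lambda x: float(np.sum(x ** 2)), dim=4)
    print(best_x, best_cost)

Only the kernel changes relative to standard Bayesian optimization: distances between controller parameters are computed after the transform, so controllers that behave similarly in simulation are treated as close on hardware.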
Another question arises: what if the simulation differs significantly from hardware? To answer this, we create increasingly approximate simulators and study the effect of growing simulation-hardware mismatch on the performance of Bayesian optimization. We also compare our approach against other approaches from the literature and find it to be more reliable, especially in cases of high mismatch.
An alternative to directly optimizing policies on hardware is to learn robust policies in simulation that can then be deployed on hardware. We study the effect of different policy structures on the robustness of very high-dimensional neural network policies. Our experiments on the ATRIAS robot show that neural network policies with an expert-designed structure transfer from simulation to hardware more reliably than unstructured policies.
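As a hypothetical illustration of what "structure" can mean here (the specific feedback law and variable names below are not the thesis's controller): an unstructured policy maps the robot state directly to joint torques, while a structured policy lets the network output a few interpretable set-points that a fixed expert-designed feedback law converts into commands.

import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Small random MLP; in practice the weights would be trained in simulation."""
    return [(rng.standard_normal((a, b)) * 0.1, np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

STATE_DIM, TORQUE_DIM = 12, 6

# Unstructured policy: network maps state directly to joint torques.
unstructured = mlp([STATE_DIM, 64, 64, TORQUE_DIM])
def unstructured_policy(state):
    return forward(unstructured, state)

# Structured policy: network outputs a few interpretable quantities
# (hypothetical: desired foot placement and leg force), and a fixed
# expert-designed feedback law turns them into torques.
structured = mlp([STATE_DIM, 32, 2])

def expert_feedback_law(state, foot_x_des, leg_force_des):
    # Illustrative PD-style mapping, chosen only for the example.
    kp, kd = 200.0, 20.0
    foot_x, foot_xd = state[0], state[1]
    hip_torque = kp * (foot_x_des - foot_x) - kd * foot_xd
    knee_torque = 0.05 * leg_force_des
    return np.array([hip_torque, knee_torque, 0.0, 0.0, 0.0, 0.0])

def structured_policy(state):
    foot_x_des, leg_force_des = forward(structured, state)
    return expert_feedback_law(state, foot_x_des, leg_force_des)

state = rng.standard_normal(STATE_DIM)
print(unstructured_policy(state))
print(structured_policy(state))

The structured variant constrains the network's output to quantities with physical meaning, which is one way an expert-designed structure can limit how badly simulation-only training can exploit modelling errors.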
BibTeX
@phdthesis{Rai-2018-110236,
  author   = {Akshara Rai},
  title    = {Learning to Learn From Simulation: Using simulations to learn faster on robots},
  year     = {2018},
  month    = {November},
  school   = {Carnegie Mellon University},
  address  = {Pittsburgh, PA},
  number   = {CMU-RI-TR-18-65},
  keywords = {Bipedal locomotion, transfer learning, Bayesian optimization, reinforcement learning},
}