Abstract:
In many practical applications of reinforcement learning (RL), it is expensive to observe state transitions from the environment. In the problem of plasma control for nuclear fusion, the motivating example of this thesis, determining the next state for a given state-action pair requires querying an expensive transition function, which can entail many hours of computer simulation or considerable experimental cost. Such expensive data collection prohibits the application of standard RL algorithms, which typically require a large number of observations to learn. In this thesis, I address the problem of efficiently learning a policy from a relatively modest number of observations, motivated by the application of automated decision making and control to nuclear fusion. The first section presents four approaches developed to evaluate the prospective value of data for learning a good policy and discusses their performance, guarantees, and applications. These approaches address the problem through the lenses of information theory, decision theory, the optimistic value gap, and learning from comparative feedback. I apply this last approach to reinforcement learning from human feedback for the alignment of large language models. The second section presents work that uses prior physical knowledge of the dynamics to learn an accurate model more quickly. Finally, I give an introduction to the problem setting of nuclear fusion, present recent work optimizing the design of plasma current rampdowns at the DIII-D tokamak, and discuss future applications of AI in fusion.
Thesis Committee Members:
Jeff Schneider, Chair
Deepak Pathak
David Held
Stefano Ermon, Stanford University
Mark D. Boyer, Commonwealth Fusion Systems