PhD Thesis Proposal

Shervin Javdani, Carnegie Mellon University
Monday, August 3
12:00 pm to 12:00 am
Learning Policies for Shared Autonomy

Event Location: NSH 3305

Abstract: In shared autonomy, user input and robot autonomy are combined to control a robot to achieve a goal. Most prior work accomplishes this by augmenting user input with some autonomous strategy for that goal. We take a different viewpoint, treating the user as a policy minimizing some cost function. Our aim is to use autonomous assistance to directly minimize the cost the user incurs.

A key challenge is that the robot does not know a priori which goal the user wants to achieve, and must both predict the user’s intended goal and assist in achieving it. Therefore, our objective is to use assistance to minimize the expected cost for the user’s (unknown) goal. We present a general framework for doing so, using a Partially Observable Markov Decision Process (POMDP) with uncertainty over the user’s goal. At each time step, we estimate a distribution over the user’s goal from their history of inputs using maximum entropy inverse optimal control. As solving for the optimal cost-minimizing policy is computationally infeasible, we utilize hindsight optimization to select good robot actions.
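The two steps above, updating a belief over goals from user input and then choosing the robot action with the lowest expected cost under that belief, can be illustrated with a small sketch. This is not the implementation from the proposal: the 2D point robot, the function names, the parameter `beta`, and the exponential likelihood over input directions (a simplified stand-in for maximum entropy inverse optimal control) are all assumptions made for illustration. The action rule follows the hindsight-optimization idea of scoring each action by the cost-to-go it would have if the goal were known, averaged over the current belief.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero


def update_goal_belief(belief, position, user_input, goals, beta=5.0):
    """Bayes update of the goal belief from a single user input.

    Assumed likelihood (illustrative only): p(u | g) is proportional to
    exp(beta * cos(angle between u and the direction to g)), so inputs
    pointing toward a goal make that goal more probable.
    """
    input_norm = np.linalg.norm(user_input)
    if input_norm < EPS:
        return belief  # no input, no new evidence
    u_dir = user_input / input_norm

    to_goals = goals - position                       # shape (G, 2)
    goal_dists = np.linalg.norm(to_goals, axis=1, keepdims=True)
    goal_dirs = to_goals / np.maximum(goal_dists, EPS)

    log_post = np.log(belief + EPS) + beta * (goal_dirs @ u_dir)
    log_post -= log_post.max()                        # numerical stability
    post = np.exp(log_post)
    return post / post.sum()


def hindsight_action(belief, position, goals, candidate_actions):
    """Pick the action minimizing the belief-weighted cost-to-go.

    The cost-to-go for a known goal is assumed to be the remaining
    Euclidean distance after taking the action.
    """
    next_positions = position + candidate_actions     # shape (A, 2)
    diffs = next_positions[:, None, :] - goals[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)             # shape (A, G)
    expected_cost = dists @ belief                    # shape (A,)
    return candidate_actions[np.argmin(expected_cost)]


# Hypothetical scene: two goals on opposite sides of the robot.
goals = np.array([[1.0, 0.0], [-1.0, 0.0]])
position = np.array([0.0, 0.0])
belief = np.array([0.5, 0.5])
actions = 0.1 * np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])

belief = update_goal_belief(belief, position, np.array([1.0, 0.0]), goals)
action = hindsight_action(belief, position, goals, actions)
```

In this toy scene, a user input pointing right shifts nearly all belief to the right-hand goal, and the selected robot action then moves toward it. Under a uniform belief the two goals would pull equally, which is the situation where assisting before the goal is certain becomes a trade-off.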

We implemented this framework with a simple distance-based cost function. In a user study, we compared our method to a standard blending approach. We found that our method enabled users to accomplish tasks more quickly while providing less input. However, when asked to rate each system, users gave mixed assessments, tending to prefer the blending approach.

To alleviate user dissatisfaction, we propose two key alterations to our system. First, we propose to replace the distance-based cost function with one learned from user demonstrations, making autonomous actions more in line with the user’s strategy. Second, we propose a method to learn the cost trade-off between user input and autonomy through interactive learning, directly incorporating user satisfaction feedback into our cost function.

In order to make learning for individual users feasible, we propose the use of active learning methods to minimize the burden on the user. We review some of our prior work in this field, and discuss potential extensions to accomplish our proposed cost function learning.

Committee: J. Andrew Bagnell, Chair

Siddhartha S. Srinivasa

Emma Brunskill

Wolfram Burgard, University of Freiburg