Learning from Physical Human Feedback: An Object-Centric One-Shot Adaptation Method

Master's Thesis, Tech. Report, CMU-RI-TR-22-60, Robotics Institute, Carnegie Mellon University, August, 2022

Abstract

For robots to be effectively deployed in novel environments and tasks, they must be able to understand the feedback expressed by humans during intervention. This feedback can either correct undesirable behavior or indicate additional preferences. Existing methods either require repeated episodes of interaction or assume reward features that are known in advance, which makes them data-inefficient and hard to transfer to new tasks. We relax these assumptions by describing human tasks in terms of object-centric sub-tasks and interpreting physical interventions in relation to specific objects. Our method, Object Preference Adaptation (OPA), is composed of two key stages: 1) pre-training a base policy to produce a wide variety of behaviors, and 2) online-updating only certain weights in the model according to human feedback. The key to our fast yet simple adaptation is that general interaction dynamics between agents and objects are fixed, and only object-specific preferences are updated. Our adaptation occurs online, requires only one human intervention (one-shot), and produces new behaviors never seen during training. Trained on cheap synthetic data instead of expensive human demonstrations, our policy correctly adapts to human perturbations on realistic tasks both in simulation and on a physical 7-DOF robot.
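
The split between fixed interaction dynamics and adaptable object-specific preferences can be illustrated with a toy example. This is only a sketch under strong assumptions and is not the thesis's implementation: the hand-written `interaction` function stands in for the pre-trained, frozen part of the model, each object gets a single scalar preference, and the names `interaction`, `policy`, and `adapt` are hypothetical. Given one human-corrected action, only the preference values are updated by gradient descent.

```python
import math

def interaction(agent, obj):
    """Frozen stand-in for the pre-trained part (an assumption, not the thesis model):
    a vector pointing from the agent to the object, shrinking with distance."""
    dx, dy = obj[0] - agent[0], obj[1] - agent[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return (0.0, 0.0)
    scale = 1.0 / (1.0 + dist)
    return (scale * dx / dist, scale * dy / dist)

def policy(agent, objects, prefs):
    """Action = sum of per-object interaction vectors weighted by preferences.
    A positive preference attracts the agent, a negative one repels it."""
    ax = ay = 0.0
    for obj, w in zip(objects, prefs):
        fx, fy = interaction(agent, obj)
        ax += w * fx
        ay += w * fy
    return (ax, ay)

def adapt(agent, objects, prefs, human_action, lr=0.5, steps=200):
    """One-shot adaptation from a single corrected action: minimize the squared
    error between the policy action and the human action, changing ONLY the
    preference values. The interaction function is never modified.
    lr and steps are toy settings chosen for this small example."""
    prefs = list(prefs)
    feats = [interaction(agent, o) for o in objects]
    for _ in range(steps):
        ax, ay = policy(agent, objects, prefs)
        ex, ey = ax - human_action[0], ay - human_action[1]
        for i, (fx, fy) in enumerate(feats):
            # Gradient of the squared error with respect to preference i.
            prefs[i] -= lr * 2.0 * (ex * fx + ey * fy)
    return prefs

if __name__ == "__main__":
    agent = (0.0, 0.0)
    objects = [(1.0, 0.0), (0.0, 1.0)]
    prefs = [1.0, 1.0]  # initially attracted to both objects
    # The human pushes the agent away from the first object, toward the second.
    new_prefs = adapt(agent, objects, prefs, human_action=(-0.5, 0.5))
    print("adapted preferences:", new_prefs)
    print("adapted action:", policy(agent, objects, new_prefs))
```

Because the adapted preferences are tied to objects rather than to a particular state, the same values apply at agent positions that the single correction never covered, which is the intuition behind adapting from one intervention.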

BibTeX

@mastersthesis{Shek-2022-133219,
author = {Alvin Shek},
title = {Learning from Physical Human Feedback: An Object-Centric One-Shot Adaptation Method},
year = {2022},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-22-60},
keywords = {human robot interaction, learning from feedback, deep learning, machine learning, graph neural networks},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.