Assistive value alignment using in-situ naturalistic human behaviors
Abstract
As collaborative robots are increasingly deployed in personal environments, such as the home, it is critical they take actions to complete tasks consistent with personal preferences. However, determining personal preferences for completing household chores is challenging. Many household chores, such as setting a table or loading a dishwasher, are sequential and open-vocabulary, creating a landscape of almost endless a priori preferences. Taking assistive actions in this domain means that a robot must first determine someone’s personal preference from within this expansive space. To do this, robots rely on people to communicate information about their preferences.
Communication about preferences is often collected ex situ: A person is presented with an abstract situation with several alternative solutions and gives feedback on which solution they think they would prefer if they were acting in situ. This feedback on the preferred solution, combined with similar responses from multiple people in multiple situations, is then used to train a preference model. These data can be burdensome to collect, are gathered ex situ and so are not guaranteed to align with in situ preferences, and fail to capture information about changes to preferences that may arise due to the execution of the collaboration.
In this thesis, we argue that robots can provide personalized in situ assistance using observations of naturalistic human behaviors. In other words, robotic assistance can be viewed as a process of value alignment and can be achieved during task execution using observations of naturally occurring goal-directed behaviors. To support this argument, we make five main contributions.
First, we define assistive robotics as a value alignment problem and identify the main components in defining such a problem: the people involved, the space (or environment) in which the interaction takes place, and the relative timing of the robot and collaborative partners’ actions. Second, we introduce a dataset of naturalistic human-robot collaboration behavior collected in a simple collaborative object rearrangement task. Third, we use this dataset to highlight the importance of continued personalization in assistive scenarios. Fourth, we present a method for extending these ideas to complex surface rearrangement tasks with naturalistic data using large internet-scale pretrained multi-modal foundation models. Finally, we present a method for continually fine-tuning these large foundation models using naturalistic in situ behaviors, demonstrating how we can provide seamless robotic assistance from varying sources of in situ human behavior data.
BibTeX
@phdthesis{Newman-2024-143486,
author = {Benjamin A Newman},
title = {Assistive value alignment using in-situ naturalistic human behaviors},
year = {2024},
month = {September},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-24-66},
keywords = {assistive robotics, value alignment, preference learning, collaborative adaptation},
}