Humans In Their Natural Habitat: Training AI to Understand People

Gunnar Atli Sigurdsson
PhD Thesis, Tech. Report, CMU-RI-TR-20-23, Robotics Institute, Carnegie Mellon University, August, 2020

Abstract

Computer vision has great potential to help our daily lives by searching for lost keys, watering flowers, or reminding us to take a pill. To succeed with such tasks, computer vision methods need to be trained from real and diverse examples of our daily dynamic scenes. First, we need to give computers insight into our world and our daily lives: not just through the charade that we present to the world on social media, but through a genuine look at the most boring, mundane, routine aspects of our lives. But how do we model this data? How do we model information over time? How do we harness the richness and complexity of this data to enable understanding?

To provide a lens through which to look at humans in their mundane lives, we explored techniques for crowdsourcing the creation of this data from hundreds of people in their own homes, and analyzed how humans think about activities along with the best strategies for annotating complex data of this nature. Given this insight into human behaviour, we can start understanding where other vision techniques have trouble, how to improve them, and which avenues are most promising moving forward.

Once we have this kind of data, we can start building algorithms that harness its unique aspects by learning how human activities change over time, and which activities occur with a recognizable temporal structure. We can harness the data to learn how complete human events generally unfold, such as a snowboarding trip, and apply these models to practical problems such as summarizing photo albums. Finally, we combine ideas from our work to demonstrate how these techniques can be used to collect data and model human activities from first- and third-person perspectives at the same time, and to perform unsupervised concept learning from web videos. We hope this kind of realistic bias may provide new insights that aid robots equipped with our computer vision models operating in the real world.

BibTeX

@phdthesis{Sigurdsson-2020-125296,
author = {Gunnar Atli Sigurdsson},
title = {Humans In Their Natural Habitat: Training AI to Understand People},
year = {2020},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-20-23},
}