Leveraging Inexpensive Supervision Signals for Visual Learning
Abstract
The success of deep-learning-based methods for computer vision comes at a cost. Most deep neural network models require a large corpus of annotated data for supervision, and obtaining such data is often time-consuming and expensive. For example, collecting bounding box annotations takes 26-42 seconds per box. This requirement hinders extending these methods to novel domains. In this thesis, we explore techniques for leveraging inexpensive forms of supervision for visual learning. First, we propose an approach that learns a pose-encoding visual representation from videos of human actions without any human supervision. We show that the learned representation improves performance on pose estimation and action recognition tasks compared to randomly initialized models. Next, we propose an approach that uses freely available web data and inexpensive image-level labels to learn object detectors. We show that web data, while highly noisy and biased, can be used effectively to improve object localization in the weakly supervised setting.
BibTeX
@mastersthesis{Prakash-2017-22842,
author = {Senthil Purushwalkam Shiva Prakash},
title = {Leveraging Inexpensive Supervision Signals for Visual Learning},
year = {2017},
month = {May},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-17-13},
keywords = {Unsupervised, Weakly Supervised, Object Detection, Pose Estimation, Action Recognition},
}