Road Detection and Semantic Segmentation without Strong Human Supervision
Abstract
Recently, convolutional neural networks (CNNs) trained with strong human supervision have been shown to achieve state-of-the-art performance on both road detection and semantic segmentation. However, collecting strongly labeled data for these tasks requires detailed per-pixel annotations from humans, which makes data annotation highly costly and time-consuming. In this work, we therefore propose methods to train a CNN for both tasks without strong human supervision. For road detection, we propose a two-step self-supervised method that requires no human image annotation. First, we automatically generate road annotations for training using OpenStreetMap, vehicle pose-estimation sensors, and camera parameters. Next, we train a fully convolutional network (FCN) for road detection using these annotations. We show that we are able to generate reasonably accurate training annotations on the KITTI dataset [14], and we achieve state-of-the-art performance among methods that require no human annotation effort. For semantic segmentation, we use image-level tag annotations, which indicate the presence or absence of various classes in an image, to learn a dense pixel-level prediction model. We propose a novel graph-regularized multiple-instance multi-label (G-MIML) loss to train the FCN: the MIML term encodes the constraints provided by image-level tags, while a superpixel-level graph over the image encodes the inherent label-smoothness assumption. The proposed loss yields state-of-the-art performance for tag-supervised semantic segmentation on the PASCAL VOC 2012 dataset [12].
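The first step of the self-supervised pipeline, projecting map-derived road geometry into the camera image using the vehicle pose and camera parameters, can be sketched roughly as follows. This is a minimal illustration assuming a standard pinhole camera model; the function name, the simple world-to-camera pose convention, and the toy inputs are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def project_road_points(world_pts, R, t, K):
    """Project 3D road points (world frame) into the image plane.

    world_pts: (N, 3) road points, e.g. sampled from OpenStreetMap polylines
    R, t:      world-to-camera rotation (3, 3) and translation (3,),
               obtained from the vehicle pose estimate and camera extrinsics
    K:         (3, 3) camera intrinsics
    Returns (N, 2) pixel coordinates and a mask of points in front of the camera.
    """
    cam_pts = world_pts @ R.T + t      # transform into the camera frame
    in_front = cam_pts[:, 2] > 0       # keep only points with positive depth
    proj = cam_pts @ K.T               # apply the pinhole intrinsics
    px = proj[:, :2] / proj[:, 2:3]    # perspective divide to pixel coordinates
    return px, in_front

# Toy example: a road point 10 m straight ahead of the camera
# projects to the principal point of the image.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
px, front = project_road_points(np.array([[0.0, 0.0, 10.0]]),
                                np.eye(3), np.zeros(3), K)
```

The projected points would then be rasterized (e.g. by filling the polygon between projected road boundaries) to produce the per-pixel training annotations for the FCN.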
BibTeX
@mastersthesis{Laddha-2016-5563,
  author  = {Ankit Laddha},
  title   = {Road Detection and Semantic Segmentation without Strong Human Supervision},
  year    = {2016},
  month   = {July},
  school  = {Carnegie Mellon University},
  address = {Pittsburgh, PA},
  number  = {CMU-RI-TR-16-37},
}