Multi-Scale Convolutional Architecture for Semantic Segmentation

Tech. Report CMU-RI-TR-15-21, Robotics Institute, Carnegie Mellon University, 14 pp., October 2015

Abstract

Advances in 3D sensing technology have made RGB and depth information more readily available than before, which can greatly assist the semantic segmentation of 2D scenes. Many works in the literature perform semantic segmentation of such scenes, but few address environments with a high degree of clutter, such as indoor scenes. In this paper, we explore the use of depth information alongside RGB in a deep convolutional network for indoor scene understanding through semantic labeling. Our work exploits a geocentric encoding of the depth image and uses a multi-scale deep convolutional neural network architecture that captures both high- and low-level features of a scene to generate rich semantic labels. We apply our method to indoor RGB-D images from the NYUD2 dataset and achieve a competitive accuracy of 70.45% in labeling four object classes compared with prior approaches. The results show that our system can generate a pixel map directly from an input image, where each pixel value corresponds to a particular object class.

BibTeX

@techreport{Raj-2015-6037,
author = {Aman Raj and Daniel Maturana and Sebastian Scherer},
title = {Multi-Scale Convolutional Architecture for Semantic Segmentation},
year = {2015},
month = {October},
institution = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-15-21},
}