Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks
Abstract
We present Occlusion-Net, a framework to predict 2D and 3D locations of occluded keypoints for objects, in a largely self-supervised manner. As input, we use an off-the-shelf detector (e.g., MaskRCNN) trained only on visible keypoint annotations; this is the only supervision used in this work. A graph encoder network then explicitly classifies invisible edges, and a graph decoder network corrects the occluded keypoint locations produced by the initial detector. Central to this work is a trifocal tensor loss that provides indirect self-supervision for occluded keypoint locations that are visible in other views of the object. The 2D keypoints are then passed into a 3D graph network that estimates the 3D shape and camera pose using a self-supervised reprojection loss. At test time, Occlusion-Net successfully localizes keypoints in a single view under a diverse set of occlusion settings. We validate our approach on synthetic CAD data as well as a large image set capturing vehicles at many busy city intersections. As an interesting aside, we compare the accuracy of human labels of invisible keypoints against those predicted by the trifocal tensor loss.
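The trifocal tensor loss rests on point transfer across three views: given a keypoint in one view and a line through its match in a second view, the tensor determines where the point must appear in a third view, so a keypoint occluded in one view can be supervised from views where it is visible. A minimal sketch of this point-line-point transfer in NumPy, using the standard tensor construction from canonical camera matrices (the function names are illustrative, not code from the paper):

```python
import numpy as np

def trifocal_tensor(P2, P3):
    """Trifocal tensor for cameras P1 = [I | 0], P2 = [A | a4], P3 = [B | b4],
    via the standard formula T[i, j, k] = A[j, i] * b4[k] - a4[j] * B[k, i]."""
    A, a4 = P2[:, :3], P2[:, 3]
    B, b4 = P3[:, :3], P3[:, 3]
    return np.einsum('ji,k->ijk', A, b4) - np.einsum('j,ki->ijk', a4, B)

def transfer_point(T, x1, l2):
    """Point-line-point transfer: x3^k ~ x1^i * l2_j * T[i, j, k].
    x1: homogeneous keypoint in view 1; l2: any line through the matching
    point in view 2 other than the epipolar line of x1."""
    x3 = np.einsum('i,j,ijk->k', x1, l2, T)
    return x3 / x3[2]
```

In the loss, the distance between the transferred point and the network's prediction of the occluded keypoint supplies the self-supervision signal.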
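The 3D stage is trained by projecting the estimated 3D shape through the estimated camera pose and penalizing the discrepancy with the 2D keypoints. A minimal sketch of such a self-supervised reprojection loss under a pinhole camera model (plain NumPy; the variable names and the squared-error norm are illustrative assumptions, not details from the paper):

```python
import numpy as np

def reprojection_loss(X, R, t, K, kp2d):
    """Mean squared reprojection error.
    X: (N, 3) estimated 3D keypoints; R: (3, 3) rotation; t: (3,) translation;
    K: (3, 3) camera intrinsics; kp2d: (N, 2) 2D keypoint locations."""
    proj = (X @ R.T + t) @ K.T          # project into homogeneous image coords
    uv = proj[:, :2] / proj[:, 2:3]     # perspective divide
    return np.mean(np.sum((uv - kp2d) ** 2, axis=1))
```

Minimizing this loss jointly over shape and pose requires no 3D ground truth, which is what makes the 3D branch self-supervised.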
BibTeX
@conference{Narapureddy-2019-118026,
author = {N. Dinesh Reddy and Minh Vo and Srinivasa G. Narasimhan},
title = {Occlusion-Net: 2D/3D Occluded Keypoint Localization Using Graph Networks},
booktitle = {Proceedings of (CVPR) Computer Vision and Pattern Recognition},
year = {2019},
month = {June},
pages = {7318--7327},
keywords = {occlusion detection, keypoints, car pose},
}