Self-Supervising Occlusions for Vision - Robotics Institute Carnegie Mellon University

PhD Thesis Proposal

Dinesh Reddy Narapureddy, Robotics Institute, Carnegie Mellon University
Monday, February 14
4:30 pm to 6:30 pm
Self-Supervising Occlusions for Vision

Abstract:
Virtually every scene has occlusions. Even a scene with a single object exhibits self-occlusions – a camera can only view one side of an object (left or right, front or back), or part of the object lies outside the field of view. More complex occlusions occur when one or more objects block parts of another object. Understanding and dealing with occlusions is hard due to the large variation in the type, number, and extent of occlusions possible in scenes. Even humans cannot accurately segment an occluded object or predict the contour or shape of its occluded region. Current large human-annotated datasets cannot capture such a wide range of occlusions. In this thesis, we propose developing computer vision algorithms robust to occlusions using self-supervision, and present two methodologies for learning such occlusions from data captured in the wild.

The first methodology predicts occluded regions using multi-view supervision. We introduce a large multi-camera dataset that captures activity in the wild. Using multi-view priors, we develop algorithms for accurate 4D reconstruction of moving objects, and we use these reconstructions in a bootstrapping framework to infer the content of occluded image regions. We show that such supervision helps the network learn better image representations even under large occlusions.
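The core geometric step behind multi-view supervision is reprojection: a region occluded in one camera may be visible in another, so points from a shared reconstruction can be projected into each view to decide which pixels that view can supervise. A minimal sketch of that projection and visibility check, with entirely hypothetical toy data (the function names and values are illustrative assumptions, not the thesis's actual pipeline):

```python
import numpy as np

def project_points(points, K, R, t):
    """Project Nx3 world points into a pinhole camera with intrinsics K
    and extrinsics [R|t]; returns pixel coordinates and depths."""
    cam = points @ R.T + t            # world -> camera coordinates
    uv = cam @ K.T                    # apply intrinsics
    return uv[:, :2] / uv[:, 2:3], cam[:, 2]

# Toy setup: a few "reconstructed" 3D points and one 640x480 reference camera.
points = np.array([[0.0, 0.0, 5.0], [1.0, 0.5, 6.0], [-1.0, -0.5, 4.0]])
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)         # camera at the world origin

pixels, depth = project_points(points, K, R, t)
# Points with positive depth that land inside the image can supervise this
# view; points outside the frustum (or failing a z-buffer test against
# nearer geometry) mark regions whose content other cameras must supply.
in_view = (depth > 0) \
        & (pixels[:, 0] >= 0) & (pixels[:, 0] < 640) \
        & (pixels[:, 1] >= 0) & (pixels[:, 1] < 480)
```

In a full system, the in-view/occluded partition per camera is what lets the reconstruction bootstrap labels for pixels no single camera observes.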

In the second methodology, we explore using longitudinal data, i.e., videos captured over weeks, months, or even years, to supervise the occluded regions of an object. We exploit two observations from such data. First, we use the real data in a novel way to automatically mine a large set of unoccluded objects and then composite them back into the same views to generate occlusion scenarios; this self-supervision is strong enough for an amodal network to learn occlusions in real-world images. Second, traffic is inherently repetitive over long periods, and we exploit such repetitive motion as self-supervision to improve reconstruction over time.
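The mine-and-composite idea can be sketched as a cut-and-paste operation: a mined unoccluded object is pasted onto a view, producing both a synthetic occlusion image and the exact amodal mask the network should recover. The snippet below is a minimal illustration with made-up toy arrays; the function name and data are assumptions for exposition, not the actual training code:

```python
import numpy as np

def composite_occlusion(scene, occluder, occ_mask, top_left):
    """Paste a mined unoccluded object onto a scene image; return the
    composited image and the boolean mask of pixels it now hides."""
    out = scene.copy()
    h, w = occ_mask.shape
    y, x = top_left
    region = out[y:y + h, x:x + w]
    region[occ_mask] = occluder[occ_mask]     # occluder overwrites the scene
    amodal_target = np.zeros(scene.shape[:2], dtype=bool)
    amodal_target[y:y + h, x:x + w] = occ_mask  # ground truth for the hidden pixels
    return out, amodal_target

# Toy example: an 8x8 gray scene and a 3x3 bright "object" with a full mask.
scene = np.full((8, 8, 3), 100, dtype=np.uint8)
occluder = np.full((3, 3, 3), 255, dtype=np.uint8)
mask = np.ones((3, 3), dtype=bool)

img, target = composite_occlusion(scene, occluder, mask, top_left=(2, 2))
# 'img' is a synthetic occlusion scenario; 'target' supervises the amodal network,
# since the original scene content under the mask is known exactly.
```

Because the pasted object came from the same views, the composites match the real camera's appearance statistics, which is what makes the supervision transfer to genuinely occluded objects.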

For the proposed work, we plan to: (1) combine the two paradigms, i.e., multi-view constraints with longitudinal constraints, for supervising occlusions; (2) extend the longitudinal supervision to more cameras captured using our framework and improve the uncertainty estimates of the predictions; and (3) extend the current algorithms to generic objects such as animals and construction vehicles, in addition to vehicles and people.

Thesis Committee Members:
Srinivasa G. Narasimhan, Chair
Deva Ramanan
Kris Kitani
Yaser Sheikh, Meta Reality Labs
Jan-Michael Frahm, UNC Chapel Hill and Meta Reality Labs
