Towards Self-supervised Object Discovery and Tracking

Master's Thesis, Tech. Report, CMU-RI-TR-21-28, Robotics Institute, Carnegie Mellon University, August, 2021


Abstract

Object discovery and multiple object tracking (MOT) are two highly interrelated tasks that are known to be fundamental problems in computer vision, and are crucial for video understanding. Most existing methods rely on supervised training with human annotations, which is laborious and expensive. In this thesis, we propose a self-supervised method for detecting and tracking moving objects in unlabelled RGB-D videos. The method begins with classic handcrafted techniques for segmenting objects using motion cues: we estimate optical flow and camera motion, and conservatively segment regions that appear to be moving independently of the background. Treating these initial segments as pseudo-labels, we learn an ensemble of appearance-based 2D and 3D detectors, under heavy data augmentation. We use this ensemble to detect new instances of the “moving” type, even if they are not moving, and add these as new pseudo-labels. Our method is an expectation-maximization algorithm, where in the expectation step we fire all modules and look for agreement among them, and in the maximization step we re-train the modules to improve this agreement. The constraint of ensemble agreement helps combat contamination of the generated pseudo-labels (during the E step), and data augmentation helps the modules generalize to yet-unlabelled data (during the M step). We compare against existing unsupervised object discovery and tracking methods, using challenging videos from CATER and KITTI, and show strong improvements over the state of the art.
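The abstract describes the E step only as firing all modules and looking for agreement among them. The snippet below is a minimal sketch of one possible agreement rule, not the thesis's implementation. It assumes each module (for example the motion segmenter, a 2D detector and a 3D detector) emits axis-aligned 2D boxes, and that "agreement" means a candidate is kept as a pseudo-label only when enough modules fire an overlapping box (IoU voting). The box format and the names `agreed_pseudo_labels`, `iou_thresh` and `min_votes` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format (x1, y1, x2, y2); the thesis may use masks instead.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def agreed_pseudo_labels(
    module_outputs: List[List[Box]],
    iou_thresh: float = 0.5,
    min_votes: int = 2,
) -> List[Box]:
    """Sketch of an E-step agreement filter (hypothetical rule).

    module_outputs[m] holds the boxes fired by module m. A candidate earns
    one vote from every module (including its own) that has at least one
    box overlapping it with IoU >= iou_thresh. Candidates with enough votes
    become pseudo-labels, skipping near-duplicates of boxes already kept.
    """
    accepted: List[Box] = []
    for outputs in module_outputs:
        for cand in outputs:
            votes = sum(
                any(iou(cand, other) >= iou_thresh for other in others)
                for others in module_outputs
            )
            is_new = all(iou(cand, kept) < iou_thresh for kept in accepted)
            if votes >= min_votes and is_new:
                accepted.append(cand)
    return accepted


if __name__ == "__main__":
    motion = [(0, 0, 10, 10)]
    det_2d = [(1, 1, 10, 10), (50, 50, 60, 60)]
    det_3d = [(0, 0, 9, 9)]
    # The box seen by all three modules survives; the one seen only by the
    # 2D detector is rejected as a likely contaminated pseudo-label.
    print(agreed_pseudo_labels([motion, det_2d, det_3d]))
```

Raising `min_votes` makes the filter more conservative, trading recall for cleaner pseudo-labels; in an EM loop, the surviving boxes would then serve as training targets for the M step.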

BibTeX

@mastersthesis{Zuo-2021-128975,
author = {Yiming Zuo},
title = {Towards Self-supervised Object Discovery and Tracking},
year = {2021},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-21-28},
keywords = {Machine Learning; Computer Vision; Object Discovery; Object Tracking; Self-supervised Learning},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.