1:00 pm to 12:00 am
Event Location: NSH 1507
Abstract: A recurrent and elementary machine perception task is to localize objects of interest in the physical world, be it objects on a warehouse shelf or cars on a road. In many real-world examples, this task entails localizing specific object instances with known 3D models. For example, a warehouse robot equipped with a depth sensor is required to recognize and localize objects on a shelf with known inventory, while a low-cost industrial robot might need to localize parts on an assembly line.
Most modern-day methods for the 3D multi-object localization task employ scene-to-model feature matching or regression/classification by learners trained on synthetic or real scenes. While these methods are typically fast in producing a result, they are often brittle, sensitive to occlusions, and depend on the right choice of features and/or training data. This thesis introduces and advocates a deliberative approach, where the multi-object localization task is framed as an optimization over the space of hypothesized scenes. We conjecture that deliberative reasoning, such as understanding inter-object occlusions, is essential to robust perception, and that the role of discriminative algorithms should mainly be to guide this process.
As part of this thesis work so far, we have developed two methods towards this objective: PErception via SeaRCH (PERCH) and Discriminatively-guided Deliberative Perception (D2P). PERCH exploits structure in the optimization over hypothesized scenes to cast it as a tree search over individual object poses, thereby overcoming the computational intractability of joint optimization. D2P extends PERCH by allowing modern statistical learners such as deep neural networks to guide the global search. This is made possible by Multi-Heuristic A* (MHA*) and its extensions, graph search algorithms which we developed for handling multiple, possibly “inadmissible” heuristics. These algorithms allow us to leverage arbitrary learning-based algorithms as heuristics to accelerate search, without compromising on solution quality.
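The idea of casting scene optimization as a tree search over individual object poses can be illustrated with a toy sketch. This is not the PERCH implementation: it assumes a hypothetical 1D "scene" of occupied and free cells in place of a depth image, objects described only by their width, and a cost that counts object cells not supported by the observation. The function name `localize` and all parameters are illustrative assumptions.

```python
import heapq


def localize(observed, widths):
    """Uniform-cost tree search over object poses (toy 1D example).

    observed: list of 0/1 occupancy values, one per cell.
    widths:   width of each object, in cells; object i is placed at tree level i.
    Returns (total_cost, poses) for the cheapest complete assignment, or None.
    """
    num_cells = len(observed)
    # Each frontier entry is (cost so far, start cells chosen for objects 0..k-1).
    frontier = [(0, ())]
    while frontier:
        cost, poses = heapq.heappop(frontier)
        k = len(poses)
        if k == len(widths):
            # Edge costs are non-negative, so the first complete
            # assignment popped from the queue has minimum cost.
            return cost, poses
        # Cells already claimed by the objects placed so far.
        taken = set()
        for start, width in zip(poses, widths):
            taken.update(range(start, start + width))
        # Expand: try every pose of the next object that does not collide.
        for start in range(num_cells - widths[k] + 1):
            cells = set(range(start, start + widths[k]))
            if cells & taken:
                continue
            # Edge cost: object cells that the observation does not support.
            edge_cost = sum(1 for c in cells if not observed[c])
            heapq.heappush(frontier, (cost + edge_cost, poses + (start,)))
    return None


if __name__ == "__main__":
    scene = [0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
    print(localize(scene, [2, 3]))
```

Because each object is assigned in turn, the search never enumerates the full joint pose space up front; it only grows the partial scenes that currently look cheapest. A learned heuristic, as in D2P, would enter such a search as an additional priority term, which this sketch omits.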
Our experiments with D2P indicate that we can leverage the complementary strengths of fast learning-based methods and deliberative classical search to handle both “hard” (severely occluded) and “easy” portions of a scene by automatically adjusting the amount of deliberation. For easy scenes, the algorithm mostly relies on learning-based methods to save computation, while for harder scenes, it injects more deliberation to gain robustness at the expense of computation time. In addition, to demonstrate the applicability of D2P to real-world perception tasks, we have integrated our method with the Human-Assisted Robotic Picker (HARP), the system that represented CMU at the 2016 Amazon Picking Challenge.
For the remaining portion of this thesis work, we first propose to study whether D2P can achieve real-time performance, independently of the complexity of the scene. Further, our existing approach assumes that there is no extraneous clutter, and that the objects have only 3 degrees of freedom. In the remainder of this thesis, we aim to relax these assumptions to permit broader applicability of Deliberative Perception.
Committee: Maxim Likhachev, Chair
Martial Hebert
Siddhartha S. Srinivasa
Manuela M. Veloso
Dieter Fox, University of Washington