
PhD Thesis Defense

Xiaofang Wang
Robotics Institute, Carnegie Mellon University
Thursday, April 21
2:00 pm to 3:00 pm
NSH 4305
Search Algorithms and Search Spaces for Neural Architecture Search

Abstract:
Neural architecture search (NAS) has recently been proposed to automate the process of designing network architectures. Instead of manually designing network architectures, NAS automatically finds the optimal architecture in a data-driven way. Despite its impressive progress, NAS is still far from being widely adopted as a common paradigm for architecture design in practice. This thesis aims to develop principled NAS methods that can automate the design of neural networks and reduce human effort in architecture tuning as much as possible. To achieve this goal, we focus on developing better search algorithms and search spaces, both of which are important for the performance of NAS.

For search algorithms, we first present an efficient NAS framework using Bayesian optimization (BO). Specifically, we propose a method to learn an embedding space over the domain of network architectures, which makes it possible to define a kernel function for the architecture domain, a necessary component for applying BO to NAS. Then, we propose a neighborhood-aware NAS formulation to encourage the selection of architectures with strong generalization capability. The proposed formulation is general enough to be applied to various search algorithms, such as random search, reinforcement learning, and differentiable NAS methods.
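As a rough illustration of the neighborhood-aware idea, the sketch below scores each candidate by the validation loss aggregated over its neighborhood rather than by its own loss alone. The abstract does not define the neighborhood or the aggregation, so both are assumptions here: a neighbor is taken to be an architecture that differs in exactly one operation, and the aggregation is a plain mean. The names `neighbors`, `neighborhood_score`, and `select` are hypothetical.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# An architecture is modeled as a tuple of operation names (an assumption).
Arch = Tuple[str, ...]


def neighbors(arch: Arch, ops: Sequence[str]) -> List[Arch]:
    """Return all architectures that differ from `arch` in exactly one position."""
    out: List[Arch] = []
    for i, current in enumerate(arch):
        for op in ops:
            if op != current:
                out.append(arch[:i] + (op,) + arch[i + 1:])
    return out


def neighborhood_score(arch: Arch, ops: Sequence[str],
                       val_loss: Callable[[Arch], float]) -> float:
    """Mean validation loss over `arch` and its neighbors (assumed aggregation)."""
    return mean(val_loss(a) for a in [arch] + neighbors(arch, ops))


def select(candidates: Sequence[Arch], ops: Sequence[str],
           val_loss: Callable[[Arch], float]) -> Arch:
    """Pick the candidate whose neighborhood has the lowest aggregated loss."""
    return min(candidates, key=lambda a: neighborhood_score(a, ops, val_loss))


# Toy losses: ("a", "a") is a sharp minimum, ("c", "c") sits in a flatter region.
OPS = ("a", "b", "c")
LOSSES = {
    ("a", "a"): 0.10, ("c", "c"): 0.20,
    ("b", "c"): 0.25, ("c", "b"): 0.25,
    ("a", "c"): 0.50, ("c", "a"): 0.50,
}


def toy_loss(arch: Arch) -> float:
    # Any architecture not listed above gets a high loss.
    return LOSSES.get(arch, 0.90)


chosen = select([("a", "a"), ("c", "c")], OPS, toy_loss)
```

In this toy setup, ranking by individual loss would prefer `("a", "a")`, while the neighborhood score prefers `("c", "c")`, whose neighbors also perform reasonably well. Because `select` only needs a scoring function, the same score could in principle replace the objective in random search or other search algorithms.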

For search spaces, we first propose a search space for spatiotemporal attention cells that use attention operations as the primary building block. The attention cells found from our search space not only outperform manually designed ones, but also demonstrate strong generalization across different modalities, backbones, and datasets. Then, we show that committee-based models (ensembles or cascades) are an overlooked design space for efficient models. We find that simply building committees from existing, independently pre-trained models can match or exceed the accuracy of state-of-the-art models while being drastically more efficient. Finally, we point out the importance of controlling for cost when comparing different LiDAR-based 3D object detectors. We show that SECOND, a simple baseline generally believed to have been significantly surpassed, can almost match the performance of the state-of-the-art method on the Waymo Open Dataset when the two are compared under the same latency.
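To make the cascade variant of a committee concrete, here is a minimal sketch in which models run from cheapest to most expensive and inference stops as soon as the averaged prediction is confident enough. The abstract does not specify the exit rule, so the confidence threshold, the probability averaging, and the name `cascade_predict` are assumptions for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

# A "model" is any callable mapping an input to class probabilities (an assumption).
Model = Callable[[object], Sequence[float]]


def cascade_predict(models: List[Model], x: object,
                    threshold: float = 0.9) -> Tuple[int, int]:
    """Run models in order and exit early once the averaged prediction is confident.

    Returns (predicted_class, number_of_models_used).
    """
    if not models:
        raise ValueError("a cascade needs at least one model")
    total: List[float] = []
    for used, model in enumerate(models, start=1):
        probs = list(model(x))
        # Accumulate probabilities so the average covers every model run so far.
        total = probs if not total else [t + p for t, p in zip(total, probs)]
        avg = [t / used for t in total]
        best = max(range(len(avg)), key=avg.__getitem__)
        # Stop when confident, or when no larger model is left to consult.
        if avg[best] >= threshold or used == len(models):
            return best, used
    raise AssertionError("unreachable: the last model always returns")


# Stand-ins for a small and a large pre-trained classifier.
def small_model(x: object) -> Sequence[float]:
    return [0.6, 0.4]


def large_model(x: object) -> Sequence[float]:
    return [0.1, 0.9]


easy = cascade_predict([small_model, large_model], None, threshold=0.5)
hard = cascade_predict([small_model, large_model], None, threshold=0.9)
```

With the lower threshold the small model's answer is accepted and the large model is never run; with the higher threshold both models are consulted. The efficiency gain of a cascade comes from inputs of the first kind, where the expensive model is skipped.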

Thesis Committee Members:
Kris M. Kitani, Chair
Deva Ramanan
Jeff Schneider
Michael S. Ryoo, Stony Brook University & Google
