Carnegie Mellon University
Abstract:
Neural architecture search (NAS) has recently been proposed to automate the process of designing network architectures. Instead of relying on manual design, NAS automatically finds the optimal architecture in a data-driven way. Despite its impressive progress, NAS is still far from being widely adopted as a common paradigm for architecture design in practice. This thesis aims to develop principled NAS methods that automate the design of neural networks and reduce human effort in architecture tuning as much as possible. We focus on developing better search algorithms and search spaces, and on identifying more scenarios suitable for NAS.
We first present an efficient NAS framework based on Bayesian optimization (BO). Specifically, we propose a method to learn an embedding space over the domain of network architectures, which makes it possible to define a kernel function over that domain, a necessary component for applying BO to NAS. Next, we propose a neighborhood-aware NAS formulation that encourages the selection of architectures with strong generalization capability. The formulation is general enough to be applied to various search algorithms, such as random search, reinforcement learning, and differentiable NAS methods. Finally, departing from the commonly used convolution-based search spaces, we propose a search space of spatiotemporal attention cells that use attention operations as the primary building block. We also develop a differentiable formulation of this search space, allowing us to search for attention cells efficiently. The discovered attention cells not only outperform manually designed ones but also generalize well across different modalities, backbones, and datasets.
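To make the BO framework concrete, the following is a minimal sketch of Gaussian-process-based BO over architecture embeddings. All names here are illustrative: the thesis learns the embedding space, whereas this sketch simply assumes each architecture is already represented as a fixed embedding vector, uses a standard RBF kernel over those vectors, and selects candidates with an upper-confidence-bound rule.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """RBF kernel between rows of X and Y (architecture embeddings)."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_train, y_train, X_query, noise=1e-6):
    """GP posterior mean and variance of validation accuracy at query embeddings."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_train, X_query)
    Kss = rbf_kernel(X_query, X_query)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks.T @ alpha
    v = np.linalg.solve(K, Ks)
    var = np.diag(Kss) - np.sum(Ks * v, axis=0)
    return mean, np.maximum(var, 1e-12)  # clip tiny negative variances

def ucb_select(X_train, y_train, X_pool, beta=2.0):
    """Pick the pool architecture maximizing an upper confidence bound."""
    mean, var = gp_posterior(X_train, y_train, X_pool)
    return int(np.argmax(mean + beta * np.sqrt(var)))
```

In each BO iteration, the selected architecture would be trained, its validation accuracy appended to `y_train`, and the posterior refit; the learned embedding is what makes this kernel meaningful for architectures.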
We propose the following two directions for future work. First, we propose to automatically search for an encoder-decoder architecture that effectively fuses dense RGB images and sparse LiDAR measurements. We aim to explore cross-scale fusion of the two modalities with NAS techniques, since manually tuning cross-scale connections is challenging given the huge design space. Second, we propose to explore a simple alternative paradigm for obtaining efficient and accurate models. Specifically, we plan to comprehensively study the efficiency of committee-based models, i.e., ensembles or cascades of models, which are well-known techniques in machine learning. Our initial results demonstrate that committee-based models can be much more efficient than state-of-the-art solitary models for image classification.
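As an illustration of the cascade idea mentioned above, here is a hypothetical two-stage sketch: a cheap model answers confidently classified inputs, and only low-confidence inputs are escalated to a larger model, which is where the efficiency gain comes from. The model interfaces and the confidence threshold are assumptions for illustration, not details from the thesis.

```python
import numpy as np

def cascade_predict(x, small_model, big_model, threshold=0.9):
    """Classify one input with a two-model cascade.

    Returns (predicted_class, used_big_model). The small model runs first;
    the big model is invoked only when the small model's top probability
    falls below the confidence threshold.
    """
    probs = small_model(x)                   # class probabilities from cheap model
    if probs.max() >= threshold:             # confident: exit early, skip big model
        return int(np.argmax(probs)), False
    return int(np.argmax(big_model(x))), True
```

Averaged over a dataset, the expected cost is the small model's cost plus the big model's cost weighted by the escalation rate, so a well-chosen threshold can undercut a single large model at comparable accuracy.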
Thesis Committee Members:
Kris M. Kitani, Chair
Deva Ramanan
Jeff Schneider
Michael S. Ryoo, Stony Brook University & Google