Look and Think Twice: Capturing Top-Down Visual Attention with Feedback Convolutional Neural Networks
Abstract
While feedforward deep convolutional neural networks (CNNs) have been a great success in computer vision, it is important to note that the human visual cortex generally contains more feedback than feedforward connections. In this paper, we briefly review the role of feedback in the human visual cortex, which motivates us to develop a computational feedback mechanism in deep neural networks. In addition to the feedforward inference of traditional neural networks, a feedback loop is introduced to infer the activation status of hidden-layer neurons according to the "goal" of the network, e.g., high-level semantic labels. We liken this mechanism to "Look and Think Twice." The feedback networks help better visualize and understand how deep neural networks work, and capture visual attention on expected objects, even in images with cluttered backgrounds and multiple objects. Experiments on the ImageNet dataset demonstrate the mechanism's effectiveness in tasks such as image classification and object localization.
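The core idea, inferring the on/off status of hidden neurons so as to maximize the score of a chosen target label, can be illustrated with a minimal sketch. This is not the paper's implementation (which operates on convolutional layers of a trained CNN with binary gates): here we assume a toy two-layer fully connected net with random weights, and relax the gates to continuous values in [0, 1] updated by gradient ascent on the target class score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained network: 8 inputs, 16 hidden units, 4 classes.
W1 = rng.standard_normal((16, 8))   # input -> hidden weights
W2 = rng.standard_normal((4, 16))   # hidden -> class-score weights

def forward(x, z):
    """Feedforward pass with feedback gates z applied to the hidden ReLU units."""
    h = np.maximum(W1 @ x, 0.0)     # hidden activations
    return W2 @ (z * h), h          # class scores, hidden activations

def feedback_pass(x, target, steps=50, lr=0.1):
    """Infer gates z in [0, 1] that increase the target class score.

    The target score is linear in z, so its gradient w.r.t. z is
    simply W2[target] * h; we ascend and clip to the box [0, 1].
    """
    z = np.ones(W1.shape[0])        # start with all hidden units fully on
    for _ in range(steps):
        _, h = forward(x, z)
        grad = W2[target] * h       # d(score_target)/dz
        z = np.clip(z + lr * grad, 0.0, 1.0)
    return z

x = rng.standard_normal(8)
target = 2
z0 = np.ones(W1.shape[0])
scores_before, _ = forward(x, z0)
z = feedback_pass(x, target)
scores_after, _ = forward(x, z)
```

After the feedback pass, gates on hidden units that argue against the target class are driven toward zero, so `scores_after[target] >= scores_before[target]`; in the paper, the surviving activations are what localize attention on the expected object.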
BibTeX
@conference{Cao-2015-121183,
  author    = {C. Cao and X. Liu and Y. Yang and Y. Yu and J. Wang and Z. Wang and Y. Huang and W. Xu and D. Ramanan and T. Huang},
  title     = {Look and Think Twice: Capturing Top-Down Visual Attention with Feedback Convolutional Neural Networks},
  booktitle = {Proceedings of (ICCV) International Conference on Computer Vision},
  year      = {2015},
  month     = {December},
  pages     = {2956--2964},
}