Answer-Aware Attention on Grounded Question Answering in Images

Junjie Hu, Desai Fan, Shuxin Yao, and Jean Hyaejin Oh
Conference Paper, Proceedings of AAAI '17 Fall Symposium on Natural Communication for Human Robot Collaboration, November, 2017

Abstract

Grounding natural language expressions to visual context in an image is essential to understanding the semantic meaning of an image. Recent attention approaches to the task of grounded question answering in images rely on either attention over arbitrary regions in an image or attention over words in a question, and thus do not exploit the information carried by candidate answers when encoding the question. To address this limitation, we propose two Answer-Aware Attention (AAA) models that use attention over candidate answers, i.e., global and local attention over answers, each of which learns an answer-aware summarization vector of a question. Our proposed attention model leverages information from both textual and visual modalities, which boosts prediction accuracy in the grounded question answering task. Extensive experiments show that our proposed attention model performs comparably to the state-of-the-art models with far fewer learning parameters.
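To illustrate the idea of global attention over answers, the following is a minimal NumPy sketch: question words are attended conditioned on a candidate answer embedding, producing an answer-aware summarization vector of the question that is then fused with an image feature to score the answer. The variable names, dimensions, bilinear scoring function, and additive fusion are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def answer_aware_question_vector(question_words, answer, W):
    """Summarize question word embeddings with attention conditioned on an answer.

    question_words: (T, d) word embeddings of the question
    answer:         (d,)   embedding of one candidate answer
    W:              (d, d) learned attention parameters (assumed bilinear form)
    """
    scores = question_words @ W @ answer   # (T,) relevance of each word to the answer
    alphas = softmax(scores)               # attention weights over question words
    return alphas @ question_words         # (d,) answer-aware question summary

# Toy usage: score each candidate answer by how well the fused
# question/image representation matches the answer embedding.
rng = np.random.default_rng(0)
T, d, num_answers = 6, 16, 4
question = rng.normal(size=(T, d))
image_feat = rng.normal(size=d)
answers = rng.normal(size=(num_answers, d))
W = rng.normal(size=(d, d)) * 0.1

logits = []
for a in answers:
    q_vec = answer_aware_question_vector(question, a, W)
    fused = q_vec + image_feat             # simple additive fusion (assumption)
    logits.append(fused @ a)               # compatibility with the candidate answer
pred = int(np.argmax(softmax(np.array(logits))))
print("predicted answer index:", pred)
```

In this sketch the same question is summarized differently for each candidate answer, which is what distinguishes answer-aware attention from attention computed over the question alone.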

BibTeX

@conference{Oh-2017-103006,
author = {Junjie Hu and Desai Fan and Shuxin Yao and Jean Hyaejin Oh},
title = {Answer-Aware Attention on Grounded Question Answering in Images},
booktitle = {Proceedings of AAAI '17 Fall Symposium on Natural Communication for Human Robot Collaboration},
year = {2017},
month = {November},
keywords = {grounded question answering, attention model},
}