Carnegie Mellon University
2:30 pm to 4:00 pm
NSH 4305
Title: Detailed Image Captioning with Hierarchical Attention
Abstract:
Automatic image description is the task of generating a natural-language sentence that reflects the visual content of an image. Many deep learning architectures have been explored for this task in the past few years. While researchers have made great progress in generating syntactically correct sentences by learning from large datasets of paired images and sentences, generating semantically rich content has remained a major challenge. We developed a hierarchical modular attention architecture that incorporates compositional components into a sequential model. We demonstrate superior performance in specific subcategories, namely expressing counts and colors, illustrate it with examples, and report the results of a human study conducted on Amazon Mechanical Turk.
Committee:
Jean Oh (advisor, RI)
Louis-Philippe Morency (LTI)
Gunnar Atli Sigurdsson (PhD, RI)