Detailed Image Captioning
Abstract
While researchers have made great improvements in generating syntactically correct sentences by learning from large image-sentence paired datasets, generating semantically rich and controllable content remains a major challenge. In image captioning, sequential models are preferred when fluency is an important factor in evaluation, e.g., n-gram metrics; however, sequential models generally produce over-generalized expressions that lack the details present in an input image and offer no controllability. In this article, we propose two models to tackle this challenge from different perspectives. In the first experiment, we aim to generate more detailed captions by incorporating compositional components into a sequential model. In the second experiment, we explore an attribute-based model with the ability to include selected tag words in a target sentence.
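To make the second idea concrete, the sketch below is a minimal, hypothetical illustration (not the thesis's actual architecture) of an attribute-based decoder in PyTorch: embeddings of the selected tag words are pooled and fused with the image feature to initialize an LSTM caption decoder. All module names, dimensions, and parameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttributeConditionedCaptioner(nn.Module):
    """Toy decoder that conditions caption generation on selected tag words."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        # The image feature (e.g. from a CNN) and the pooled attribute
        # embedding are fused into the decoder's initial hidden state.
        self.img_proj = nn.Linear(2048, hidden_dim)
        self.attr_proj = nn.Linear(embed_dim, hidden_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feat, attr_ids, caption_ids):
        # img_feat: (B, 2048), attr_ids: (B, K) selected tag-word indices,
        # caption_ids: (B, T) ground-truth tokens used for teacher forcing.
        attr_emb = self.word_embed(attr_ids).mean(dim=1)          # (B, E)
        h0 = torch.tanh(self.img_proj(img_feat) + self.attr_proj(attr_emb))
        h0 = h0.unsqueeze(0)                                      # (1, B, H)
        c0 = torch.zeros_like(h0)
        words = self.word_embed(caption_ids)                      # (B, T, E)
        hidden, _ = self.lstm(words, (h0, c0))
        return self.out(hidden)                                   # (B, T, V)

# Hypothetical usage with random tensors.
model = AttributeConditionedCaptioner(vocab_size=10000)
img_feat = torch.randn(4, 2048)
attr_ids = torch.randint(0, 10000, (4, 3))        # three selected tags per image
caption_ids = torch.randint(0, 10000, (4, 12))
logits = model(img_feat, attr_ids, caption_ids)   # (4, 12, 10000)
```

Pooling the tag embeddings into the initial state is only one way to inject attributes; the thesis may use a different fusion mechanism.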
BibTeX
@mastersthesis{Tian-2019-113660,
  author = {Junjiao Tian},
  title = {Detailed Image Captioning},
  year = {2019},
  month = {July},
  school = {Carnegie Mellon University},
  address = {Pittsburgh, PA},
  number = {CMU-RI-TR-19-34},
  keywords = {Image Captioning, Natural Language Processing},
}