Multimodal Modeling: Learning Beyond Visual Knowledge

Newell-Simon Hall 3305

Abstract: The computer vision community has embraced the success of specialist models trained on datasets with a fixed set of predetermined object categories, such as ImageNet or COCO. However, learning only from visual knowledge might limit the flexibility and generality of these models, since they require additional labeled data to specify any other visual concept and [...]