Shapes and Context: In-the-Wild Image Synthesis & Manipulation

A. Bansal, Y. Sheikh, and D. Ramanan
Conference Paper, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2312–2321, June 2019

Abstract

We introduce a data-driven model for interactively synthesizing in-the-wild images from semantic label input masks. Our approach is dramatically different from recent work in this space in that it uses no learning. Instead, it relies on simple but classic tools for matching scene context, shapes, and parts to a stored library of exemplars. Though simple, this approach has several notable advantages over recent work: (1) because nothing is learned, it is not limited to specific training data distributions (such as cityscapes, facades, or faces); (2) it can synthesize arbitrarily high-resolution images, limited only by the resolution of the exemplar library; (3) by appropriately composing shapes and parts, it can generate an exponentially large set of viable candidate output images (which can, say, be interactively searched by a user). We present results on the diverse COCO dataset, significantly outperforming learning-based approaches on standard image synthesis metrics. Finally, we explore user interaction and user controllability, demonstrating that our system can serve as a platform for user-driven content creation.
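
To make the retrieve-and-composite idea concrete, below is a minimal Python sketch. It is not the authors' implementation: the exemplar library layout, the IoU-based shape score, and the assumption that exemplar crops are pre-aligned to the canvas resolution are all hypothetical simplifications standing in for the paper's scene-context, shape, and part matching.

import numpy as np


def shape_similarity(query_mask, exemplar_mask):
    # Intersection-over-union of two equal-size binary masks; a
    # stand-in for the paper's richer context/shape/part matching.
    inter = np.logical_and(query_mask, exemplar_mask).sum()
    union = np.logical_or(query_mask, exemplar_mask).sum()
    return inter / union if union > 0 else 0.0


def synthesize(label_mask, library):
    # label_mask: (H, W) int array of semantic class ids (0 = unlabeled).
    # library:    hypothetical exemplar store, a dict mapping class id to
    #             a list of (binary_mask, rgb_image) pairs cut from real
    #             photos and, for simplicity here, pre-aligned to the
    #             canvas resolution. Nothing is learned: each region is
    #             filled with the pixels of its best-matching exemplar.
    h, w = label_mask.shape
    canvas = np.zeros((h, w, 3), dtype=np.uint8)
    for cls in np.unique(label_mask):
        if cls == 0:
            continue
        query = label_mask == cls
        best = max(library.get(cls, []),
                   key=lambda ex: shape_similarity(query, ex[0]),
                   default=None)
        if best is None:
            continue  # no exemplar of this class in the library
        _, ex_img = best
        canvas[query] = ex_img[query]  # composite exemplar pixels
    return canvas


# Tiny demo on random data standing in for a real exemplar library.
rng = np.random.default_rng(0)
H, W = 64, 64
mask = np.zeros((H, W), dtype=np.int32)
mask[16:48, 16:48] = 1  # a single square "object" of class 1
lib = {1: [(mask.copy(), rng.integers(0, 256, size=(H, W, 3), dtype=np.uint8))
           for _ in range(3)]}
out = synthesize(mask, lib)  # (64, 64, 3) uint8 image

Retrieval in this sketch is per-region and exhaustive; the paper additionally composes parts across exemplars, which is what yields the exponentially large, interactively searchable candidate set described in the abstract.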

BibTeX

@conference{Bansal-2019-121137,
  author    = {A. Bansal and Y. Sheikh and D. Ramanan},
  title     = {Shapes and Context: In-the-Wild Image Synthesis \& Manipulation},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2019},
  month     = {June},
  pages     = {2312--2321},
}