Learning to Synthesize and Manipulate Natural Images - Robotics Institute Carnegie Mellon University

Miscellaneous, PhD Thesis, Engineering - Electrical Engineering and Computer Science, UC Berkeley, September, 2017

Abstract

Humans are avid consumers of visual content. Every day, people watch videos, play digital games and share photos on social media. However, there is an asymmetry – while everybody is able to consume visual data, only a chosen few are talented enough to effectively express themselves visually. For the rest of us, most attempts at creating or manipulating realistic visual content end up quickly “falling off” the manifold of natural images. In this thesis, we investigate a number of data-driven approaches for preserving visual realism while creating and manipulating photographs. We use these methods as training wheels for visual content creation. We first propose to model visual realism directly from large-scale natural images. We then define a class of image synthesis and manipulation operations, constraining their outputs to look realistic according to the learned models. The presented methods not only help users easily synthesize more visually appealing photos but also enable new visual effects not possible before this work.

Part I describes discriminative methods for modeling visual realism and photograph aesthetics. Directly training these models requires expensive human judgments. To address this, we adopt active and unsupervised learning methods to reduce annotation costs. We then apply the learned model to various graphics tasks, such as automatically generating image composites and choosing the best-looking portraits from a photo album.

Part II presents approaches that directly model the natural image manifold via generative models and constrain the output of a photo editing tool to lie on this manifold. We build real-time data-driven exploration and editing interfaces based on both simpler image averaging models and more recent deep models.

Part III combines discriminative learning and generative modeling into an end-to-end image-to-image translation framework, where a network is trained to map inputs (such as user sketches) directly to natural-looking results. We present a new algorithm that can learn the translation in the absence of paired training data, as well as a method for producing diverse outputs given the same input image. These methods enable many new applications, such as turning user sketches into photos, season transfer, object transfiguration, photo style transfer, and generating real photographs from paintings and computer graphics renderings.

Notes
ACM SIGGRAPH Outstanding Doctoral Dissertation Award. David J. Sakrison Memorial Prize for outstanding doctoral research, by the UC Berkeley EECS Dept.

BibTeX

@phdthesis{Zhu-2017-125691,
author = {Jun-Yan Zhu},
title = {Learning to Synthesize and Manipulate Natural Images},
school = {Engineering - Electrical Engineering and Computer Science, UC Berkeley},
month = {September},
year = {2017},
}