Abstract:
In a recovery task one seeks to obtain an estimate of an unknown signal from a set of incomplete measurements. These problems arise in a number of computer vision applications, from image-based tasks such as super-resolution and inpainting to 3D reconstruction tasks such as Non-Rigid Structure from Motion and scene flow estimation. Early work made use of forward models, where the signal of interest was given some parametric representation and an optimization was then performed at run time to fit those parameters to the available measurements. In recent years these techniques have been dominated by feed-forward neural networks, which instead leverage large datasets to learn a mapping directly from the input measurements to the estimated signal. However, we argue that the run-time optimization strategy has an advantage when incorporating domain-specific knowledge and is better suited to enforcing constraints on predictions. In this thesis we demonstrate techniques for bringing run-time optimization style inference into the deep learning age and discuss areas where this approach can be superior to feed-forward methods.
We begin by discussing a hierarchical extension to the classical sparse coding model that mimics the structure of deep neural networks. We show how modern large-scale gradient-based optimization techniques can be used to learn parameters of this model that are well suited to solving linear inverse problems. We apply our technique to the problems of LiDAR super-resolution, JPEG artifact reduction, and 2D-to-3D trajectory lifting, and demonstrate state-of-the-art performance. Additionally, we discuss a simple heuristic for identifying linear inverse problems that are more difficult for convolutional neural networks than for forward models. We then show how our model can be extended beyond linear problems by applying it to trajectory-based NRSfM. Finally, we discuss more generally how the run-time optimization approach can better leverage domain knowledge, using the recent Neural Scene Flow Prior as an example. We demonstrate how long-term scene flow estimates can be obtained not by changing the model or inference strategy, but by designing an optimization that encodes short-term correspondences in a way that would be impossible for feed-forward networks.
To conclude, we propose future work that brings more domain knowledge into scene flow, starting with a Neural Velocity Flow model that describes an instantaneous velocity field rather than frame-to-frame correspondences. We leverage techniques from Neural Ordinary Differential Equations to integrate our velocity field into point trajectories, which can then be fit to sparse LiDAR measurements. This parameterization allows us to implicitly enforce the constraint that real motion must be time reversible. We close with a discussion of further domain knowledge that can be brought into this model.
Thesis Committee Members:
Simon Lucey, Co-chair
Deva Ramanan, Co-chair
David Held
Noah Snavely, Cornell University