Abstract:
In this proposal, we show how some classic computer vision tasks can be solved robustly with optimization techniques by using a compact and interpretable object representation.
Specifically, we explore the applications and benefits of representing 3D objects with an analytical, algebraic function by building an approximate, ray-based differentiable renderer. Our approximate formulation of the hidden surface problem sacrifices fidelity for utility in the rendering task, producing fast runtimes and high-quality gradient information. Compared to mesh-based differentiable renderers, our method is several times faster. We generate depth maps and silhouettes that are smooth and defined everywhere. Our evaluations of differentiable renderers on classic, geometry-focused vision tasks show that our method is the only one comparable to classic techniques. In pose estimation, our results are significantly better than those of existing differentiable renderers, and even better than those of classic baselines. In shape from silhouette, our method performs well using only gradient descent and a per-pixel loss, without any surrogate losses or regularization, while baselines produce major artifacts.
We show examples on naturally collected videos of everyday objects, and propose to further explore both technical approaches and application domains with efficient representations for vision tasks.
Thesis Committee Members:
Martial Hebert, Chair
Christopher G. Atkeson
Deva Ramanan
Jon Barron, Google