PhD Thesis Defense
Carnegie Mellon University
Light Sheet Depth Imaging
Abstract: Once confined to industrial manufacturing facilities and research labs, robots are increasingly entering everyday life. As specialized robots are developed for tasks such as autonomous driving, package delivery, and aerial videography, there is a growing need for affordable depth sensing technology. Robots use sensors like scanning LIDAR, depth cameras, and passive stereo cameras to [...]
Carnegie Mellon University
Towards Generalization and Efficiency in Reinforcement Learning
Abstract: In classic supervised machine learning, a learning agent behaves as a passive observer: it receives examples from some external environment which it has no control over and then makes predictions. Reinforcement Learning (RL), on the other hand, is fundamentally interactive: an autonomous agent must learn how to behave in an unknown and possibly hostile [...]
Carnegie Mellon University
Planning under Uncertainty with Multiple Heuristics
Abstract: Many robotic tasks, such as mobile manipulation, often require interaction with unstructured environments and are subject to imperfect sensing and actuation. This brings substantial uncertainty into the problems. Reasoning under this uncertainty can provide a higher level of robustness but is significantly more challenging computationally. More specifically, sequential decision making under motion and sensing uncertainty [...]
Carnegie Mellon University
Analysis of Spatio-Temporally Varying Features in Optical Coherence Tomographic (OCT) and Ultrasound (US) Image Sequences
Abstract: Optical Coherence Tomography (OCT) and Ultrasound (US) are non-ionizing and non-invasive imaging modalities that are clinically used to visualize anatomical structures in the body. OCT has been widely adopted in clinical ophthalmology due to its micron-scale resolution to visualize in-vivo structures of the eye. Ultra-High Frequency Ultrasound (UHFUS) captures images of tissue at a [...]
Carnegie Mellon University
Spatiotemporal Understanding of People Using Scenes, Objects, and Poses
Abstract: Humans are arguably one of the most important entities that AI systems would need to understand to be useful and ubiquitous. From autonomous cars observing pedestrians to assistive robots helping the elderly, a large part of this understanding is focused on recognizing human actions, and potentially, their intentions. Humans themselves are quite good at [...]
Carnegie Mellon University
Deep Non-Rigid Structure from Motion
Abstract: Non-Rigid Structure from Motion (NRSfM) refers to the problem of reconstructing cameras and the 3D point cloud of a non-rigid object from a sequence of images with 2D correspondences. Current NRSfM algorithms are mainly limited in two respects: (i) the number of images, and (ii) the type of shape variability they can handle. These [...]
Carnegie Mellon University
Data Centric Robot Learning
Abstract: While robotics has made tremendous progress over the last few decades, most success stories are still limited to carefully engineered and precisely modeled environments. Getting these robots to work in the complex and diverse world that we live in has proven to be a difficult challenge. Interestingly, one of the most significant successes in [...]
Carnegie Mellon University
Exploiting Point Motion, Shape Deformation, and Semantic Priors for Dynamic 3D Reconstruction in the Wild
Abstract: With the advent of affordable and high-quality smartphone cameras, any significant event will be captured at massive scale, both actively and passively, from multiple perspectives. This opens up exciting opportunities for low-cost, high-end visual effects and large-scale media analytics. However, automatically organizing large-scale visual data and creating a comprehensive 3D scene model is still [...]
Carnegie Mellon University
Learning and Reasoning with Visual Correspondence in Time
Abstract: There is a famous tale in computer vision: Once, a graduate student asked the famous computer vision scientist Takeo Kanade: "What are the three most important problems in computer vision?" Takeo replied: "Correspondence, correspondence, correspondence!" Indeed, even the most commonly applied Convolutional Neural Networks (ConvNets) are internally learning representations that lead to [...]
Carnegie Mellon University
Forecasting and Controlling Behavior by Learning from Visual Data
Abstract: Achieving a precise predictive understanding of the future is difficult, yet widely studied in the natural sciences. Significant research activity has been dedicated to building testable models of cause and effect. From a certain view, a perfect predictive model of the universe is the “holy grail”: the ultimate goal of science. If we had [...]