Seminar
Deformable models meet deep learning: supervised and unsupervised approaches
Abstract: In this talk I will present recent work on combining ideas from deformable models with deep learning. I will start by describing DenseReg and DensePose, two recently introduced systems for establishing dense correspondences between 2D images and 3D surface models "in the wild", namely in the presence of background, occlusions, and multiple objects. [...]
Building Scalable Framework and Environment of Reinforcement Learning
Abstract: Deep Reinforcement Learning (DRL) has made strong progress in many tasks that are traditionally considered to be difficult, such as complete-information games, navigation, and architecture search. Although the basic principle of DRL is quite simple and straightforward, making it work often requires substantially more samples and computational resources, compared to traditional [...]
Learning Deep Multimodal Features for Reliable and Comprehensive Scene Understanding
Abstract: Robust scene understanding is a critical and essential task for autonomous navigation. This problem is heavily influenced by changing environmental conditions that take place throughout the day and across seasons. In order to learn models that are impervious to these factors, mechanisms that intelligently fuse features from complementary modalities and spectra have to be [...]
Scene Understanding
Abstract: Accurate and efficient scene understanding is a fundamental task in a variety of computer vision applications including autonomous driving, human-machine interaction, and robot navigation. Reducing computational complexity and memory use is important to minimize response time and power consumption for portable devices such as robots and virtual/augmented devices. Also, it is beneficial for vehicles [...]
Relating First-person and Third-person Videos
Abstract: Thanks to the availability and increasing popularity of wearable devices such as GoPro cameras, smartphones, and glasses, we have access to a plethora of videos captured from the first-person perspective. Capturing the world from the perspective of one's self, egocentric videos bear characteristics distinct from the more traditional third-person (exocentric) videos. In [...]
Carnegie Mellon University
Learning Reactive Flight Control Policies: from LIDAR measurements to Actions
Abstract: The end goal of a reactive flight control pipeline is to output control commands based on local sensor inputs. Classical state estimation and control algorithms break down this problem by first estimating the robot’s velocity and then computing a roll and pitch command based on that velocity. However, this approach is not robust in [...]
Carnegie Mellon University
Autonomous 3D Reconstruction in Underwater Unstructured Scenes
Abstract: Reconstruction of marine structures such as pilings underneath piers presents a plethora of interesting challenges. It is one of those tasks better suited to a robot due to harsh underwater environments. Underwater reconstruction typically involves human operators remotely controlling the robot to predetermined way-points based on some prior knowledge of the location and model [...]
Carnegie Mellon University
Wire Detection, Reconstruction, and Avoidance for Unmanned Aerial Vehicles
Abstract: Thin objects, such as wires and power lines, are among the most challenging obstacles for UAVs to detect and avoid, and cause numerous accidents each year. This thesis makes contributions in three areas of this domain: wire segmentation, reconstruction, and avoidance. Pixelwise wire detection can be framed as a binary [...]
Carnegie Mellon University
Toward Invariant Visual Inertial State Estimation using Information Sparsification
Abstract: In this work, we address two current challenges in real-time visual-inertial odometry (VIO) systems: efficiency and accuracy. To this end, we present a novel approach to tightly couple visual and inertial measurements in a fixed-lag VIO framework using information sparsification. To bound computational complexity, fixed-lag smoothers perform marginalization of variables but consequently deteriorate accuracy and [...]
Imaging the World One Photon at a Time
Abstract: The heart of a camera and one of the pillars for computer vision is the digital photodetector, a device that forms images by collecting billions of photons traveling through the physical world and into the lens of a camera. While the photodetectors used by cellphones or professional DSLR cameras are designed to aggregate as [...]