PhD Thesis Defense
Carnegie Mellon University
Self-Improving 3D Scene Representations
Abstract: Most computer vision models in deployment today are not continually learning. Instead, they are in a “test” mode, where they will behave the same way perpetually, until they are replaced by newer models. This is a problem, because it means the models may perform poorly as soon as their “test” environment diverges from their [...]
Carnegie Mellon University
Direct-drive Hands: Making Robot Hands Transparent and Reactive to Contacts
Abstract: Industrial manipulators and end-effectors are a vital driver of the automation revolution. These robot hands, designed to reject disturbances with stiffness and strength, are inferior to their human counterparts. Human hands are dexterous and nimble effectors capable of a variety of interactions with the environment. Through this thesis we wish to answer a question: [...]
Carnegie Mellon University
Resource-Constrained Learning and Inference for Visual Perception
Abstract: We have witnessed rapid advancement across major computer vision benchmarks over the past years. However, the top solutions' hidden computation cost prevents them from being practically deployable. For example, training large models until convergence may be prohibitively expensive in practice, and autonomous driving or augmented reality may require a reaction time that rivals that [...]
Carnegie Mellon University
Physical Interaction and Manipulation of the Environment using Aerial Robots
Abstract: The physical interaction of aerial robots with their environment has countless potential applications and is an emerging area with many open challenges. Fully-actuated multirotors have been introduced to tackle some of these challenges. They provide complete control over position and orientation and eliminate the need for attaching a multi-DoF manipulation arm to the robot. [...]
Carnegie Mellon University
Visual Representation and Recognition without Human Supervision
Abstract: The advent of deep-learning-based artificial perception models has revolutionized the field of computer vision. These methods take advantage of the ever-growing computational capacity of machines and the abundance of human-annotated data to build supervised learners for a wide range of visual tasks. However, the reliance on human-annotated data is also a bottleneck for [...]
Carnegie Mellon University
Learning Multi-Modal Navigation in Unstructured Environments
Abstract: A robot that operates efficiently in a team with humans in an unstructured outdoor environment must translate commands into actions from a modality intuitive to its operator. The robot must be able to perceive the world as humans do so that the actions taken by the robot reflect the nuances of natural language and [...]
Carnegie Mellon University
Towards Modular and Differentiable Autonomous Driving
Abstract: The classical "modular and cascaded" autonomy stack (object detection, tracking, trajectory prediction, then planning and control) has been widely used for interactive autonomous systems such as self-driving cars due to its interpretability and fast development cycle. In this thesis, we advocate the use of such a modular stack but improve its accuracy and robustness [...]
Carnegie Mellon University
Control Input and Natural Gaze for Goal Prediction in Shared Control
Abstract: Teleoperated systems are used widely in deployed robots today, for such tasks as space exploration, disaster recovery, or assisted manipulation. However, teleoperated systems are difficult to control, especially when performing high-dimensional, contact-rich tasks like manipulation. One approach to ease teleoperated manipulation is shared control; this strategy combines the user's direct control input with an [...]
Carnegie Mellon University
Liquid Metal Actuators
Abstract: This thesis contributes to the field of soft actuators by introducing a generalized framework of actuators from liquid metals. The evolution of robotic actuators has enabled robots to achieve a diversity of motions. Like natural muscles, which convert chemical energy into mechanical work in response to electrical stimuli from the nervous system, actuators are [...]
Carnegie Mellon University
Learning Structured World Model for Deformable Object Manipulation
Abstract: Manipulation of deformable objects challenges common assumptions in robotic manipulation, such as low-dimensional state representation, known dynamics, and minimal occlusion. Deformable objects have high intrinsic state representation, complex dynamics with high degrees of freedom, and severe self-occlusion. These properties make state estimation and planning difficult. In this thesis, we introduce benchmarks and [...]
Carnegie Mellon University
Object Pose Estimation without Direct Supervision
Abstract: Currently, robot manipulation is a special purpose tool, restricted to isolated environments with a fixed set of objects. In order to make robot manipulation more general, robots need to be able to perceive and interact with a large number of objects in cluttered scenes. Traditionally, object pose has been used as a representation to [...]
Carnegie Mellon University
Heuristic Search Based Planning by Minimizing Anticipated Search Efforts
Abstract: We focus on relatively low-dimensional robot motion planning problems, such as planning for navigation of a self-driving vehicle, unmanned aerial vehicles (UAVs), and footstep planning for humanoids. In these problems, there is a need for fast planning, potentially compromising the solution quality. Often, we want to plan fast but are also interested in [...]
Carnegie Mellon University
Accelerating Numerical Methods for Optimal Control
Abstract: Many modern control methods, such as model-predictive control, rely heavily on solving optimization problems in real time. In particular, the ability to efficiently solve optimal control problems has enabled many of the recent breakthroughs in achieving highly dynamic behaviors for complex robotic systems. The high computational requirements of these algorithms demand novel algorithms tailor-suited [...]
Carnegie Mellon University
3D Reconstruction using Differential Imaging
Abstract: 3D reconstruction has been at the core of many computer vision applications, including autonomous driving, visual inspection in manufacturing, and augmented and virtual reality (AR/VR). Because monocular 3D sensing is fundamentally ill-posed, many techniques aiming for accurate reconstruction use multiple captures to solve the inverse problem. Depending on the amount of change in these [...]
Learning with Structured Priors for Robust Robot Manipulation
Abstract: Robust and generalizable robots that can autonomously manipulate objects in semi-structured environments can bring material benefits to society. Data-driven learning approaches are crucial for enabling such systems by identifying and exploiting patterns in semi-structured environments, allowing robots to adapt to novel scenarios with minimal human supervision. However, despite significant prior work in learning for [...]
Carnegie Mellon University
Self-Supervising Occlusions For Vision
Abstract: Virtually every scene has occlusions. Even a scene with a single object exhibits self-occlusions - a camera can only view one side of an object (left or right, front or back), or part of the object is outside the field of view. More complex occlusions occur when one or more objects block part(s) of [...]
Carnegie Mellon University
Learning with Diverse Forms of Imperfect and Indirect Supervision
Abstract: Powerful Machine Learning (ML) models trained on large, annotated datasets have driven impressive advances in fields including natural language processing and computer vision. In turn, such developments have led to impactful applications of ML in areas such as healthcare, e-commerce, and predictive maintenance. However, obtaining annotated datasets at the scale required for training high [...]
Computational Interferometric Imaging
Abstract: Imaging systems typically accumulate photons that, as they travel from a light source to a camera, follow multiple different paths and interact with several scene objects. This multi-path accumulation process confounds the information that is available in captured images about the scene and makes using these images to infer properties of scene objects, such [...]
Neural Radiance Fields with LiDAR Maps
Abstract: Maps, as our prior understanding of the environment, play an essential role for many modern robotic applications. The design of maps, in fact, is a non-trivial art of balance between storage and richness. In this thesis, we explored map compression for image-to-LiDAR registration, LiDAR-to-LiDAR map registration, and image-to-SfM map registration, and finally, inspired by [...]
Carnegie Mellon University
System Identification and Control of Multiagent Systems Through Interactions
Abstract: This thesis investigates the problem of inferring the underlying dynamic model of individual agents of a multiagent system (MAS) and using these models to shape the MAS's behavior using robots extrinsic to the MAS. We investigate (a) how an observer can infer the latent task and inter-agent interaction constraints from the agents' motion and [...]
Carnegie Mellon University
Parallelized Search on Graphs with Expensive-to-Compute Edges
Abstract: Search-based planning algorithms enable robots to come up with well-reasoned long-horizon plans to achieve a given task objective. They formulate the problem as a shortest path problem on a graph embedded in the state space of the domain. Much research has been dedicated to achieving greater planning speeds to enable robots to respond quickly [...]
Carnegie Mellon University
Visual Dataset Pipeline: From Curation to Long-Tail Learning
Abstract: Computer vision models have proven to be tremendously capable of recognizing and detecting several real-world objects: cars, people, pets. These models are only possible due to a meticulous pipeline where a task and application are first conceived, followed by appropriate dataset curation that collects and labels all necessary data. Commonly, studies are focused [...]
Carnegie Mellon University
Optimization of Small Unmanned Ground Vehicle Design using Reconfigurability, Mobility, and Complexity
Abstract: Unmanned ground vehicles are being deployed in increasingly diverse and complex environments. With modern developments in sensing and planning, the field of ground vehicle mobility presents rich possibilities for mechanical innovations that may be especially relevant for unmanned systems. In particular, reconfigurability may enable vehicles to traverse a wider set of terrains with greater [...]
Carnegie Mellon University
Towards Reconstructing Non-rigidity from Single Camera
Abstract: In this talk we will discuss how to infer 3D from images captured by a single camera, without assuming that the target scenes or objects are static. The non-static setting makes our problem ill-posed and challenging to solve, but it is vital in practical applications where the target of interest is non-static. To solve ill-posed problems, the current trend [...]
Large Scale Dense 3D Reconstruction via Sparse Representations
Abstract: Dense 3D scene reconstruction is in high demand today for view synthesis, navigation, and autonomous driving. A practical reconstruction system inputs multi-view scans of the target using RGB-D cameras, LiDARs, or monocular cameras, computes sensor poses, and outputs scene reconstructions. These algorithms are computationally expensive and memory-intensive due to the presence of 3D data. [...]
From Reinforcement Learning to Robot Learning: Leveraging Prior Data and Shared Evaluation
Abstract: Unlike most machine learning applications, robotics involves physical constraints that make off-the-shelf learning challenging. Difficulties in large-scale data collection and training present a major roadblock to applying today’s data-intensive algorithms. Robot learning has an additional roadblock in evaluation: every physical space is different, making results across labs inconsistent. Two common assumptions of the robot [...]
Building 4D Models of Objects and Scenes from Monocular Videos
Abstract: We explore how to infer the time-varying 3D structures of generic, deformable objects, and dynamic scenes from monocular videos. A solution to this problem is essential for virtual reality and robotics applications. However, inferring 4D structures given 2D observations is challenging due to its under-constrained nature. In a casual setup where there is neither [...]