MSR Thesis Defense
Carnegie Mellon University
Learning with Auxiliary Supervision
Abstract: Supervised learning for high-level vision tasks has advanced significantly over the last decade. One of the primary driving forces for these improvements has been the availability of vast amounts of labeled data. However, annotating data is an expensive and time-consuming process. For example, densely segmenting a natural scene image takes approximately 30 minutes. This mode [...]
Inverse Reinforcement Learning with Conditional Choice Probabilities
Abstract: We make an important connection to existing results in econometrics to describe an alternative formulation of inverse reinforcement learning (IRL). In particular, we describe an algorithm to solve the IRL problem, using easy-to-compute estimates of the Conditional Choice Probability (CCP) vector, which is the policy function of an expert integrated over factors econometricians cannot [...]
Monocular Depth Reconstruction using Geometry and Deep Networks
In this thesis, we explore methods of building dense depth maps from monocular video. First, we introduce our multi-view stereo pipeline, which utilizes photometric bundle adjustment to obtain accurate depth for textured regions from small-motion video. Second, we improve the depth estimation of low-texture regions by fusing deep convolutional network predictions. We categorize the [...]
Carnegie Mellon University
Learning Depth from Monocular Videos using Direct Methods
The ability to predict depth from a single image - using recent advances in CNNs - is of increasing interest to the vision community. Unsupervised learning strategies are particularly appealing as they can utilize much larger and more varied monocular video datasets during learning without the need for ground truth depth or stereo. In previous works, separate pose and [...]
Carnegie Mellon University
Learning-based Lane Following and Changing Behaviors for Autonomous Vehicle
This thesis explores learning-based methods in generating human-like lane following and changing behaviors in on-road autonomous driving. We summarize our main contributions as: 1) derive an efficient vision-based end-to-end learning system for on-road driving; 2) propose a novel attention-based learning architecture with sub-action space to obtain lane changing behavior using a deep reinforcement learning algorithm; [...]
Carnegie Mellon University
Real-to-Virtual Domain Unification for End-to-End Autonomous Driving
Abstract: In the spectrum of vision-based autonomous driving, vanilla end-to-end models are not interpretable and suboptimal in performance, while mediated perception models require additional intermediate representations such as segmentation masks or detection bounding boxes, whose annotation can be prohibitively expensive as we move to a larger scale. More critically, all prior works fail to deal with the notorious [...]
Carnegie Mellon University
Reconstruction of dynamic vehicles from multiple unsynchronized cameras
Despite significant research in the area, reconstruction of multiple dynamic rigid objects (e.g., vehicles) observed from wide-baseline, uncalibrated, and unsynchronized cameras remains hard. On one hand, feature tracking works well within each view but is hard to correspond across multiple cameras with limited overlap in fields of view or due to occlusions. On the other [...]
Carnegie Mellon University
Ergodic Coverage and Active Search in Constrained Environments
In this thesis, we explore sampling-based trajectory optimization applied to search for objects of interest in constrained environments (e.g., a UAV searching for a target in the presence of obstacles). We consider two search scenarios: in the first scenario, accurate prior information distribution of the possible locations of the objects of interest is available, thus [...]
Carnegie Mellon University
Understanding Machine Vision through Human Vision
Abstract: Recent success in machine vision has been largely driven by advanced computer vision methods, most commonly known as deep learning based methods. While we have seen tremendous performance improvements in machine visual tasks, such as object categorization and segmentation, there remain two major issues in deep learning. Firstly, deep networks have been largely unable [...]
Carnegie Mellon University
Automated design, accessible fabrication, and learning-based control on cable-driven soft robots with complex shapes
The emerging field of soft robots has shown great potential to outperform their rigid counterparts due to their soft and safe nature and their capability of performing complex and compliant motions. Many are built, but the designs are conservative and limited to regular shapes. The widely-used fabrication method contains bulky pumps, tethered tubing, and silicone [...]
What can this robot do? Learning Capability Models from Appearance and Experiments
As autonomous robots become increasingly multifunctional and adaptive, it becomes difficult to determine the extent of their capabilities, i.e. the tasks they can perform and their strengths and limitations at these tasks. A robot's appearance can provide cues to its physical as well as cognitive capabilities. We present an algorithm that builds on these cues [...]
Carnegie Mellon University
Robust State Estimation for Micro Aerial Vehicles
Autonomous robots provide excellent tools for information gathering in a wide variety of domains, from environmental management to infrastructure inspection and search and rescue. Micro aerial vehicles, in particular, offer a high degree of mobility that can further their effectiveness in such environments. Deployment of aerial [...]
Deep Reinforcement Learning with skill library: Learning and exploration with temporal abstractions using coarse approximate dynamics models
Reinforcement learning is a computational approach to learning from interaction. However, learning from scratch using reinforcement learning requires an exorbitant number of interactions with the environment even for simple tasks. One way to alleviate the problem is to reuse previously learned skills, as humans do. This thesis provides frameworks and algorithms to build and reuse [...]
Carnegie Mellon University
Semantic Segmentation for Terrain Roughness Estimation Using Data Autolabeled with a Custom Roughness Metric
Traditional methods for off-road terrain estimation use some type of learning network to predict hand-labeled classes of terrain such as short grass, tall grass, dirt, and trees. Other learning methods, which can give more detailed but still discrete classes, use onboard sensors to measure the terrain roughness, and then predict the terrain type. There also exists [...]
Carnegie Mellon University
Automated Design of Manipulators For In-Hand Tasks
Grasp planning and motion synthesis for dexterous manipulation tasks are traditionally done given a pre-existing kinematic model for the robotic hand. In this paper, we introduce a framework for automatically designing hand topologies best suited for manipulation tasks given high level objectives as input. Our goal is to ultimately design a program that is able [...]
Learning Neural Parsers with Deterministic Differentiable Imitation Learning
Abstract: In this work, we explore the problem of learning to decompose spatial tasks into segments, as exemplified by the problem of a painting robot covering a large object. Inspired by the ability of classical decision tree algorithms to construct structured partitions of their input spaces, we formulate the problem of decomposing objects into segments [...]
Carnegie Mellon University
Integrating Structure with Deep Reinforcement and Imitation Learning
Most deep reinforcement and imitation learning methods are data-driven and do not utilize the underlying structure of the problem. While these methods have achieved great success on many challenging tasks, several key problems such as generalization, data efficiency and compositionality remain open. Utilizing problem structure in the form of architecture design, priors, domain knowledge etc. may [...]
Carnegie Mellon University
Learning Reactive Flight Control Policies: from LIDAR measurements to Actions
Abstract: The end goal of a reactive flight control pipeline is to output control commands based on local sensor inputs. Classical state estimation and control algorithms break down this problem by first estimating the robot’s velocity and then computing a roll and pitch command based on that velocity. However, this approach is not robust in [...]
Carnegie Mellon University
Transparency in Deep Reinforcement Learning Networks
In recent years there has been growing interest in explainability for machine learning models in general and deep learning in particular. This is because deep learning-based approaches have made tremendous progress in computer vision, reinforcement learning, and language-related domains and are being increasingly used in application areas [...]
Carnegie Mellon University
Geometric approaches to motion planning for two classes of low-Reynolds number swimmers
Microrobots have the potential to impact many areas of medicine such as microsurgery, targeted drug delivery and minimally invasive sensing. Just like microorganisms themselves, microrobots developed for these applications need to swim in a low-Reynolds number regime which warrants locomotive strategies that differ from their macroscopic counterparts. To this end, Purcell’s three-link planar swimmer has [...]
Carnegie Mellon University
Autonomous 3D Reconstruction in Underwater Unstructured Scenes
Abstract: Reconstruction of marine structures such as pilings underneath piers presents a plethora of interesting challenges. It is one of those tasks better suited to a robot due to harsh underwater environments. Underwater reconstruction typically involves human operators remotely controlling the robot to predetermined way-points based on some prior knowledge of the location and model [...]
Carnegie Mellon University
Wire Detection, Reconstruction, and Avoidance for Unmanned Aerial Vehicles
Abstract: Thin objects, such as wires and power lines, are among the most challenging obstacles for UAVs to detect and avoid, and are a cause of numerous accidents each year. This thesis makes contributions in three areas of this domain: wire segmentation, reconstruction, and avoidance. Pixelwise wire detection can be framed as a binary [...]
The Art of Robotics: Toward a Holistic Approach
I arrived at the Robotics Institute two years ago looking for a good project, something tangible and preferably related to legged locomotion. Instead, I met Matt Mason and started to think about the big picture, ask the big questions. What is manipulation? What is robotics? What makes robotics particularly hard? To answer these questions, I [...]
Carnegie Mellon University
Mapping gamma sources and their flux fields using non-directional flux measurements
There is a compelling need to determine the location and activity of radiation sources from the flux that they generate. There is also a need to create dense flux maps from sparse measurements. This research solves these dual problems. An example of a situation where these capabilities would be vital is at the location of [...]
Carnegie Mellon University
Automated Design of Special Purpose Dexterous Manipulators
Grasp planning and motion synthesis for dexterous manipulation tasks are traditionally done given a pre-existing kinematic model for the robotic hand. In this thesis, we introduce a framework for automatically designing hand topologies best suited for manipulation tasks given high level objectives as input. Our goal is to ultimately design a program that is able [...]
Carnegie Mellon University
Toward Invariant Visual Inertial State Estimation using Information Sparsification
Abstract: In this work, we address two current challenges in real-time visual-inertial odometry (VIO) systems - efficiency and accuracy. To this end, we present a novel approach to tightly couple visual and inertial measurements in a fixed-lag VIO framework using information sparsification. To bound computational complexity, fixed-lag smoothers perform marginalization of variables but consequently deteriorate accuracy and [...]
Generative Point Cloud Modeling with Gaussian Mixture Models for Multi-Robot Exploration
Autonomous exploration in rich 3D environments requires the construction and maintenance of a representation derived from accumulated 3D observations. Volumetric models, which are commonly employed to enable joint reasoning about occupied and free space, scale poorly with the size of the environment. Techniques employed to mitigate this scaling include hierarchical discretization, learning local data summarizations [...]
Carnegie Mellon University
Integrating Model-based Planning with Skill learning for Mobile Manipulation
With an ever-growing demand to automate different day-to-day activities, the task of autonomous manipulation using articulated robots has gained serious traction lately. In this regard, motion planning for manipulation is one of the most heavily researched topics. Motion planning for manipulation is often cast as either a model-based planning problem or a machine learning problem. However, both of these [...]
Carnegie Mellon University
In-Field Robotic Leaf Grasping and Automated Crop Spectroscopy
Agricultural robotics is a growing field of intelligent automation that is proving to drastically increase the speed and reliability of in-field tasks such as precision seed planting, harvesting, field mapping, and crop monitoring. More specifically, plant breeders are beginning to use robotic systems to record the physical traits of crops throughout the growing season at [...]
Carnegie Mellon University
Soft-matter Artificial Muscle by Electrochemical Surface Oxidation of Liquid Metal
Natural muscles, a result of more than 500 million years of evolution, are elegant machines that generate force and motion electrochemically. The brief history of robotics does not have the luxury of millions of years to reverse-engineer many aspects of life. The development of artificial muscles therefore seeks to build more muscle-like actuators for robots. [...]
Radiation Source Localization using a Gamma-ray Camera
Radiation source localization is a common and critical task across applications such as nuclear facility decommissioning, radioactive disaster response, and security. Traditional count-based sensors (e.g. Geiger counters) infer range to the source based on the observed number of gamma photons, expected source strength, and assumed intermediate attenuation from the environment. In cluttered 3D settings, such [...]
Carnegie Mellon University
Improving Imitation Learning through Efficient Expert Querying
Learning from demonstration is an intuitive approach to encoding complex behaviors in autonomous agents. Learners have shown success in challenging tasks like autonomous driving, aerial obstacle avoidance, and information gathering, through observation and mimicry alone. State of the art algorithms like Dataset Aggregation (DAgger) have made significant advances over traditional behavior cloning, demonstrating strong theoretical [...]
Failure Is an Option: How the Severity of Robot Errors Affects Human-Robot Interactions
Abstract: Just as humans are imperfect, even the best of robots will eventually fail at performing a task. The likelihood of failure increases as robots expand their roles in our lives. Although task failure is a common problem in robotics and human-robot interaction (HRI), there has been little research investigating human tolerance to said failures, [...]
Carnegie Mellon University
MSR Thesis Talk: Avi Rudich
Title: Kinematic Analysis of 3D Printed Flexible Delta Robots Abstract: Flexible Delta robots show significant promise for use in a wide array of manipulation tasks. They are simple to design and manufacture, and they maintain a high level of repeatability and precision in open loop control. This thesis analyzes the kinematic properties of flexible [...]
Learning Parameter-Efficient Quadrotor Dynamics Models
Abstract: Operation of quadrotors through high-speed, high-acceleration maneuvers remains a challenging problem due to the complex aerodynamics in this regime. While standard physical models suffice for control in near-hover conditions, the primary challenge in executing aggressive trajectories is obtaining a model for the quadrotor dynamics that adequately models the aerodynamic effects present, including lift, drag, [...]
Human-in-the-loop Model Creation
Abstract: Deep generative models make visual content creation more accessible to novice users by automating the synthesis of diverse, realistic content based on a collected dataset. However, the current machine learning approaches miss several elements of the creative process -- the ability to synthesize things that go far beyond the data distribution and everyday experience, [...]
Learning Models and Cost Functions from Unlabeled Data for Off-Road Driving
Abstract: Off-road driving is an important instance of navigation in unstructured environments, which is a key robotics problem with many applications, such as exploration, agriculture, disaster response and defense. The key challenge in off-road driving is to be able to take in high dimensional, multi-modal sensing data and use it to make intelligent decisions on [...]
MSR Thesis Talk: Chonghyuk Song
Title: Total-Recon: Deformable Scene Reconstruction for Embodied View Synthesis Abstract: We explore the task of embodied view synthesis from monocular videos of deformable scenes. Given a minute-long RGBD video of people interacting with their pets, we render the scene from novel camera trajectories derived from in-scene motion of actors: (1) egocentric cameras that simulate the point [...]
MSR Thesis Talk: Shivam Duggal
Title: Learning Single Image 3D Reconstruction from Single-View Image Collections Abstract: We present a framework for learning 3D object shapes and dense cross-object 3D correspondences from just an unaligned category-specific image collection. The 3D shapes are generated implicitly as deformations to a category-specific signed distance field and are learned in an unsupervised manner solely from unaligned [...]
MSR Thesis Talk: Himangi Mittal
Title: Audio-Visual State-Aware Representation Learning from Interaction-Rich Data Abstract: In robotics and augmented reality, the input to the agent is a long stream of video from the first-person or egocentric point of view. Recently, there have been significant efforts to capture humans from their first-person/egocentric view interacting with their own environment as they go about [...]
MSR Thesis Talk: Ken Liu
Title: On Privacy and Personalization in Federated Learning: Analyses and Applications Abstract: Recent advances in machine learning often rely on large and centralized datasets. However, curating such data can be challenging when they hold private information, and policies/regulations may mandate that they remain distributed across data silos (e.g. mobile devices or hospitals). Federated learning (FL) [...]
Carnegie Mellon University
MSR Thesis Talk: Haolun Zhang
Title: Seeing in 3D: Towards Generalizable 3D Visual Representations for Robotic Manipulation Abstract: Despite the recent progress in computer vision and deep learning, robot perception remains a tremendous challenge due to the variations of the objects and the scenes in manipulation tasks. Ideally, a robot trying to manipulate a new object should be able to [...]
MSR Thesis Talk: Muyang Li
Title: Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models Abstract: During image editing, existing deep generative models tend to re-synthesize the entire output from scratch, including the unedited regions. This leads to a significant waste of computation, especially for minor editing operations. In this work, we present Spatially Sparse Inference (SSI), a general-purpose technique [...]
MSR Thesis Talk: Rohan Zeng
Title: Spectral Unmixing and Mapping of Coral Reef Benthic Cover Abstract: Coral reefs are important to the global ecosystem and the local communities and wildlife that rely on the habitat they create. However, coral reefs are also in critical and rapid decline: reefs have degraded over recent decades and what remains is at increasing risk [...]
MSR Thesis Talk: Ashwin Misra
Title: Learn2Plan: Learning variable ordering heuristics for scalable task planning Abstract: Traditional approaches to planning attempt to transform a system into a goal state by applying specific actions in a specific order. In these methods, there is an exponential search space due to considering many possible actions at every decision point. Hierarchical Task Networks use incremental [...]
MSR Thesis Talk: Andrew Jong
Title: Robot Information Gathering for Dynamic Systems in Wildfire Scenarios Abstract: The monitoring of complex dynamic systems, such as those encountered in disaster response, search and rescue, wildlife conservation, and environmental monitoring, presents the fundamental challenge of how to track efficiently with limited resources and partial observability. This thesis presents algorithms and techniques for robotic [...]
MSR Thesis Talk: Erin Wong
Title: Edge Detection by Centimeter Scale Low-Cost Mobile Robots Abstract: In Search and Rescue (SaR) efforts after natural disasters like earthquakes, the primary focus is to find and rescue people in building rubble. These rescue efforts could put first responders at risk and are slow due to the unstable nature of the environment. Robotic solutions [...]
MSR Thesis Talk: Sarvesh Patil
Title: Soft Delta Robots for Dexterous Manipulation Abstract: Dexterous manipulation capabilities of end-effectors afford us a wide range of strategies for fine-grained manipulation tasks. Recent utilization of readily available materials like soft filaments and silicone elastomers has enabled the development of low-cost mechanically intelligent robotic manipulators. This is important for democratizing robot manipulation and increasing [...]
MSR Thesis Talk: Fan Yang
Title: Exploring Safe Reinforcement Learning for Sequential Decision Making Abstract: Safe Reinforcement Learning (RL) focuses on the problem of training a policy to maximize the reward while ensuring safety. It is an important step towards applying RL to safety-critical real-world applications. However, safe RL is challenging due to the trade-off between the two objectives [...]
Long-Tailed 3D Detection via Multi-Modal Fusion
Abstract: Contemporary autonomous vehicle (AV) benchmarks have advanced techniques for training 3D detectors, particularly on large-scale LiDAR data. Surprisingly, although semantic class labels naturally follow a long-tailed distribution, these benchmarks focus on only a few common classes (e.g., pedestrian and car) and neglect many rare classes in-the-tail (e.g., debris and stroller). However, in the real [...]
MSR Thesis Talk: Eric Schneider
Title: Phenotyping and Skeletonization for Agricultural Robotics Abstract: Scientific phenotyping of plants is a crucial aspect of experimental plant breeding. By accurately measuring plant characteristics, phenotyping plays a vital role in the development of new plant varieties that are better adapted to specific environments and have improved yield, quality, and resistance to stress and disease. In [...]
MSR Thesis Talk: Shivesh Khaitan
Zoom Link: https://cmu.zoom.us/j/95273358670?pwd=Z09Jc3g1aDV1dTdTMEVUWUwxcUZPQT09 Meeting ID: 952 7335 8670 Passcode: 050721 Title: Exploring Reinforcement Learning approaches for Safety Critical Environments Abstract: Reinforcement Learning (RL) has emerged as a powerful paradigm for addressing challenging decision-making and robotic control tasks. By leveraging the principles of trial-and-error learning, RL algorithms enable agents to learn optimal strategies through interactions with an environment. However, [...]
MSR Thesis Talk: Ravi Tej Akella
Title: Distributional Distance Classifiers for Goal-Conditioned Reinforcement Learning Abstract: Autonomous systems are increasingly being deployed in stochastic real-world environments. Often, these agents are trying to find the shortest path to a commanded goal. But what does it mean to find the shortest path in stochastic environments, where every strategy has a non-zero probability of failing? At [...]
MSR Thesis Talk: Seth Karten
Title: Emergent Communication and Decision-Making in Multi-Agent Teams Abstract: Explicit communication among humans is key to coordinating and learning. In multi-agent reinforcement learning for partially-observable environments, agents may convey information to others via learned communication, allowing the team to complete its task. However, agents need to be able to communicate more than simply referential messages [...]
MSR Thesis Talk: Sashank Tirumala
Title: Tactile Sensing applied to deformable object manipulation Abstract: The application of robotic manipulation of deformable materials, such as cloth, spans various sectors including fabric manufacturing and domestic laundry management. Historically, most methodologies have employed vision-based sensors as the proprioceptive input to robot policies. However, this study aims to explore an alternate route by leveraging [...]
MSR Thesis Talk: Zhizhu Zhao
Title: Distilling View-conditioned Diffusion for 3D Reconstruction Abstract: We propose a 3D neural mode-seeking formulation that combines probabilistic generation of unseen regions with faithful reprojection of seen regions in a consistent 3D representation. Feature reprojection methods (NerFormer, PixelNeRF) are 3D consistent, but fail to hallucinate unseen regions. Image generation methods (ViewFormer) generate plausible hallucinations, but generated [...]
MSR Thesis Talk: Khiem Vuong
Title: Scaling up Camera Calibration and Amodal 3D Object Reconstruction for Smart Cities Abstract: Smart cities integrate thousands of outdoor cameras to enhance urban infrastructure, but their automated analysis potential remains untapped due to various challenges. Firstly, the lack of accurate camera calibration information, such as intrinsic parameters and external orientation, restricts the measurement [...]
MSR Thesis Talk: Tianyuan Zhang
Title: Surface Ripples: Analyzing Transient Vibrations on Object's Surfaces Abstract: The subtle vibrations on an object's surface contain information about its physical properties and interaction with the environment. Prior works imaged surface vibration to recover the object's material properties via modal analysis, which discards the transient vibrations propagating immediately after the object is disturbed. In this [...]
MSR Thesis Talk: Anurag Ghosh
Title: Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection Abstract: Real-time efficient perception is critical for autonomous navigation and city scale sensing. Orthogonal to architectural improvements, streaming perception approaches have exploited adaptive sampling improving real-time detection performance. In this work, we propose a learnable geometry-guided prior that incorporates rough geometry of the [...]
MSR Thesis Talk: David Russell
Title: Using Drones and Remote Sensing to Understand Forests with Limited Labeled Data Abstract: Drones and remote sensing can provide observations of forests at scale, but this raw data needs to be interpreted to further scientific understanding and inform effective management decisions. This thesis studies two problems under the realistic constraint of limited domain-specific training [...]
MSR Thesis Talk: Aarrushi Shandilya
Title: Lights, Camera, Render: Neural Fields for Structured Lighting Abstract: 3D scene reconstruction from 2D image supervision alone is an under-constrained problem. Recent neural rendering frameworks have made great strides in learning 3D scene representations to enable novel view synthesis, but they struggle to reconstruct geometry of low-texture regions or from sparse views. The prevalence of active [...]
MSR Thesis Talk: Anirudha Ramesh
Title: Learning to See in the Dark and Beyond Abstract: Robotic Perception in diverse domains such as low-light scenarios remains a challenge, even upon the employment of new sensing modalities like thermal imaging and specialized night-vision sensors. This is largely due to the high difficulty in obtaining labeled data for certain tasks. In this work, [...]
MSR Thesis Talk: Mateo Guaman Castro
Title: Self-Supervised Costmap Learning for Off-Road Vehicle Traversability Abstract: Estimating terrain traversability in off-road environments requires reasoning about complex interaction dynamics between the robot and these terrains. However, it is challenging to build an accurate physics model, or create informative labels to learn a model in a supervised manner, for these interactions. We propose a method [...]
MSR Thesis Talk: Gaoyue Zhou
Title: On Generalization and Benchmarking on Physical Robots Abstract: Robotics research has seen significant advancements; however, the field remains predominantly demo-driven, making direct comparisons between methods difficult without replicating them on individual setups. While many simulation benchmarks exist, they usually feature contrived datasets and do not accurately reflect real-world performance. In my thesis, we [...]
MSR Thesis Talk: Heng Yu
Title: Towards Real-time Controllable Neural Face Avatars Abstract: Neural Radiance Fields (NeRF) are compelling techniques for modeling dynamic 3D scenes from 2D image collections. These volumetric representations would be well suited for synthesizing novel facial expressions but for three problems. First, deformable NeRFs are object agnostic and model holistic movement of the scene: they can [...]
MSR Thesis Talk: Winnie Kuang
Title: Design and Integration of Semantic Mapping System for Forest Fire Mitigation Abstract: Remote sensing technologies can provide an automated approach to monitor and analyze conditions in the forest environment over a period of time for forest maintenance and wildfire mitigation efforts. In particular, unmanned aerial vehicles (UAVs) are a promising remote sensing modality since they [...]
MSR Thesis Talk: Jinqi Luo
Title: Vision Model Diagnosis: A Generative Perspective Abstract: In the evolving landscape of computer vision, deep learning has emerged as a transformative force, enhancing a myriad of societal facets. The real-world deployment of such a deep vision model requires a reliable evaluation, particularly when the model can have different sensitivities across various semantic attributes and concepts. [...]
MSR Thesis Talk: Daphne Chen
Title: Learning Task Preferences from Real-World Data Abstract: In order to provide personalized assistance that is capable of adapting to the needs of unique individuals, it is necessary to understand people’s preferences for different tasks. Robot assistance often assumes a static model of the individual, while in the real world, people have different capabilities and needs [...]
MSR Thesis Talk: Prasanna Kettavarapalyam Sriganesh
Title: Fast Staircase Detection and Estimation with Multi-View Merging for Multi-Robot Systems Abstract: When robotic systems are deployed in the real world, they demand advanced mobility capabilities to operate in complex, three-dimensional environments designed for human use, e.g., multi-level buildings. Staircases have been an integral part of facilitating vertical movement in these three-dimensional environments. This work [...]
MSR Thesis Talk: Aarush Gupta
Title: LightSpeed: Light and Fast Neural Light Fields on Mobile Devices Abstract: Real-time novel-view image synthesis on mobile devices is prohibitive due to limited on-device computational power and storage. Using volumetric rendering methods, such as NeRF and its derivatives, on mobile devices is not suitable due to the high computational cost of volumetric rendering. On the [...]
MSR Thesis Talk: Akshaya Kesarimangalam Srinivasan
Title: Multi-agent Multi-objective Ergodic Search Abstract: In order to find points of interest in a given domain, many planners use a priori information to guide the search to expedite the detection of targets. We present an approach to direct multiple agents (MA) to search a given domain subject to multiple objectives (MO), each characterized by its own information [...]
MSR Thesis Talk: Joshua Spisak
Title: Stochastic Optimization for Autonomous Navigation, Leveraging Parallel Computation Abstract: Stochastic Optimal Control (SOC) is a framework that allows disturbances and uncertainty in system models to be accounted for in its optimization framework. Despite accounting for this uncertainty, many first and second order methods for solving SOC problems are subject to local minima and are [...]
MSR Thesis Talk: Xuxin Cheng
Title: Learning Legged Robot Agility: Sim-to-Real and Beyond Abstract: Legged robotics has seen significant advancements in both manipulation and locomotion. However, there remain significant gaps compared to their biological counterparts, particularly in energy efficiency, natural motion, and the capacity for agile skills. This thesis primarily focuses on two aspects: the unified control of legged manipulators [...]
MSR Thesis Talk: Nishant Mohanty
Title: Multi-Robot Control using Control Barrier Functions: Theory and Application Abstract: Control Barrier Functions (CBFs) have emerged as a powerful theoretical tool for designing controllers with provable safety guarantees. This work presents a novel methodology that leverages CBFs to synthesize controllers for multi-robot coordination. Two multi-agent use cases are explored, i.e., a) Non-Cooperative Herding and [...]
MSR Thesis Talk: Yuyao Shi
Title: A Learning Approach to Understand How Spinal Cord Learns Multiple Behaviors Abstract: The spinal cord plays a crucial role in the control of human locomotion, generating motor patterns and coordinating reflex responses to sensory signals. Although this spinal control is traditionally viewed as a simple relay system, more recent neurophysiological evidence points to a [...]
MSR Thesis Talk: FNU Abhimanyu
Title: Improving Robotic Ultrasound AI Using Optical Flow Abstract: Ultrasound is an important modality for medical intervention such as vascular access because it is safe, portable, and low-cost. However, ultrasound scanning requires trained sonographers who are scarce, and it can be challenging to perform ultrasound examinations in disaster or battlefield scenarios. This motivates us to automate [...]
MSR Thesis Talk: Lucas Casanova De Oliveira Nogueira
Title: SuperLoop: a LIDAR-based SLAM Back-end for Underground Exploration Abstract: Robots deployed in underground scenarios require a SLAM system that can handle a variety of challenges, such as the absence of GPS, large scale maps, bad illumination, and geometrically degenerate environments. It is nearly impossible for any SLAM solution to handle all these challenges perfectly, especially [...]
MSR Thesis Talk: Neil Khera
Title: PyCubed-Mini: A Low-Cost, Open-Source Satellite Research Platform Abstract: Satellite development has become more accessible with decreasing launch costs and shrinking hardware. However, the expenses associated with pre-built satellite kits remain high, making it difficult for student and hobbyist teams to participate. The lack of standardized satellite hardware and software further adds to the challenge, [...]
Strategy assessment for solving rich physical problems
Abstract: We present a framework that acts as an "intuitive physics reasoner" which takes in strategies expressed in natural language (whether from a human or LLM), and assesses their validity based on a physics knowledge library. We believe the ability to quickly determine whether a strategy is worth considering and allocating further resources to planning [...]
MSR Thesis Talk: Siva Kailas
Title: Multi-Robot Information Gathering for Spatiotemporal Environment Modelling Abstract: Learning to predict or forecast spatiotemporal (ST) environmental processes from a sparse set of samples collected autonomously is a difficult task from both a sampling perspective (collecting the best sparse samples) and from a learning perspective (predicting unseen locations or forecasting the next timestep). We investigate [...]
MSR Thesis Talk: Ruijie Fu
Title: Towards Mechanical Communication in Multi-Agent Locomotive Systems: Principally Kinematic Robots on a Shared Platform Abstract: Many biological multi-agent systems exhibit a mechanism for information exchange among individuals known as mechanical communication, which leads to the emergence of collective behavior within the group. One such example is the swarming behavior of bacteria, where they form rafts [...]
Architecture and Algorithms for Space-Based Global Wildlife Tracking
Abstract: Accurate satellite-based positioning has revolutionized several industries over the past two decades, from agriculture to transportation. However, conventional GNSS receivers consume significant amounts of energy and are too large for many applications, including wildlife tracking, which is critical for conservation efforts and improving our understanding of the global climate. To address this capability gap, we [...]
Language-Conditioned Object Detection and Manipulation
Abstract: Traditional object detection methods are often confined to predefined object vocabularies, limiting their versatility in real-world scenarios where robots need to understand and execute diverse household tasks. Additionally, the 2D and 3D perception communities have typically pursued separate approaches tailored to their respective domains. In this thesis, we present a language-conditioned object detector with [...]
Exploring Diverse Interaction Types for Human in the Loop Robot Learning
Abstract: Teaching sessions between humans and robots will need to be maximally informative for optimal robot learning and to ease the human’s teaching burden. However, the bulk of prior work considers one or two modalities through which a human can convey information to a robot—namely, kinesthetic demonstrations and preference queries. Moreover, people will teach robots [...]
Building Robot Hands and Teaching Dexterity
Abstract: Our shared dream is to have humanoid robots with hands complete tasks similar to those humans do. While there are a few robot hands available today, the popular opinion is that they are difficult to use, expensive, and hard to obtain, which precludes their ubiquitous usage. We argue that this is not an inherent problem [...]
New Methods for Satellite Control
Abstract: Since 2003, the number of satellites launched into orbit has grown from 100 per year to over 2000 per year. Over that same timeframe, incredible advances have been made in control systems for terrestrial robotics and autonomy. Despite the increased quantity of satellites in orbit and the advances made in terrestrial control systems, satellite [...]
[MSR Thesis Talk] Development and Testing of a Software Stack for an Autonomous Racing Vehicle
Abstract: Autonomous racing aims to replicate the human racecar driver with software and sensors. As in traditional motorsports, Autonomous Racing Vehicles (ARVs) are pushed to their dynamic limits in multi-agent scenarios at high (>= 100mph) speeds. This Operational Design Domain (ODD) presents unique challenges across the autonomy stack. The Indy Autonomous Challenge (IAC) is an [...]
[MSR Thesis Talk] Kitchen Robot Case Studies: Learning Manipulation Tasks from Human Video Demonstrations
Abstract: The vision of integrating a robot into the kitchen, capable of acting as a chef, remains a sought-after goal in robotics. Current robotic systems, mostly programmed for specific tasks, fall short in versatility and adaptability to a diverse culinary environment. While significant progress has been made in robotic learning, with advancements in behavior cloning, [...]
[MSR Thesis Talk] Neural Implicit Representations for Medical Ultrasound Volumes and 3D Anatomy-specific Reconstructions
Abstract: Most Robotic Ultrasound Systems (RUSs) equipped with ultrasound-interpreting algorithms rely on building 3D reconstructions of the entire scanned region or specific anatomies. These 3D reconstructions are typically created via methods that compound or stack 2D tomographic ultrasound images using known poses of the ultrasound transducer with the latter requiring 2D or 3D segmentation. While fast, this class [...]
[MSR Thesis Talk] Enhancing RHex Robot Performance with Innovative Bioplastic Legs Responsive to Humidity
Abstract: Designing and developing robots that can effectively navigate real-world environments poses a significant challenge. To overcome this, many robotic systems draw inspiration from the adaptive behaviors of animals, which have evolved to thrive in diverse surroundings. Amphibious animals, for instance, seamlessly transition between walking and swimming, optimizing their locomotion efficiency based on environmental cues. [...]
Alignment for Vision-Language Foundation Model
Abstract: Recent advancements in vision-language foundation models, exemplified by GPT4-Vision and DALL-E 3, have significantly transformed both research and practical applications, ranging from professional assistance to content creation. However, aligning them precisely with specific user goals presents a notable challenge. This thesis introduces innovative strategies for improving this alignment. I will first introduce our novel [...]
Improving Kalman Filter-based Multi-Object Tracking in Occlusion and Non-linear Motion
Abstract: Modern methods solve multi-object tracking from two perspectives: motion modeling and appearance matching. As a classic paradigm, motion-based tracking by Kalman filters suffers from complicated motion patterns and the problem becomes more difficult when we only have noisy bounding boxes. To improve Kalman filter-based multi-object tracking in scenarios with complex motion, occlusion, and crossover, [...]
[MSR Thesis Talk] SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM
Abstract: Dense simultaneous localization and mapping (SLAM) is crucial for numerous robotic and augmented reality applications. However, current methods are often hampered by the non-volumetric or implicit way they represent a scene. This talk introduces SplaTAM, an approach that leverages explicit volumetric representations, i.e., 3D Gaussians, to enable high-fidelity reconstruction from a single unposed RGB-D [...]
Human Perception of Robot Failure and Explanation During a Pick-and-Place Task
Abstract: In recent years, researchers have extensively used non-verbal gestures, such as head and arm movements, to express the robot's intentions and capabilities to humans. Inspired by past research, we investigated how different explanation modalities can aid human understanding and perception of how robots communicate failures and provide explanations during block pick-and-place tasks. Through an in-person [...]
Learning Distributional Models for Relative Placement
Abstract: Relative placement tasks are an important category of tasks in which one object needs to be placed in a desired pose relative to another object. Previous work has shown success in learning relative placement tasks from just a small number of demonstrations, when using relational reasoning networks with geometric inductive biases. However, such methods fail [...]
Transfer Learning via Temporal Contrastive Learning
Abstract: This thesis introduces a novel transfer learning framework for deep reinforcement learning. The approach automatically combines goal-conditioned policies with temporal contrastive learning to discover meaningful sub-goals. The approach involves pre-training a goal-conditioned agent, finetuning it on the target domain, and using contrastive learning to construct a planning graph that guides the agent via sub-goals. Experiments [...]
Towards Equitable Representation in Text-to-Image Generation
Abstract: Accurate representation in media is known to improve the well-being of the people who consume it. There is a growing concern about the increasing use of generative AI in media as the generative image models trained on large web-crawled datasets such as LAION are known to produce images with harmful stereotypes and misrepresentations of various groups, [...]
3D Inference from Unposed Sparse View Images
Abstract: We propose UpFusion, a system that can perform novel view synthesis and infer 3D representations for generic objects given a sparse set of reference images without corresponding pose information. Current sparse-view 3D inference methods typically rely on camera poses to geometrically aggregate information from input views, but are not robust in-the-wild when such information [...]
Tightly Coupled LIDAR-Inertial Odometry
Abstract: In the age of self-driving, LIDAR and IMU represent two of the most ubiquitous sensors in use. Kalman Filtering and loosely coupled approaches dominate industry techniques, while current research trends towards a more tightly coupled formulation involving a joint optimization of IMU and LIDAR measurements. After two years of experience working with and [...]
In Pursuit of Open-World Mobile Manipulation
Abstract: Deploying robots in open-ended unstructured environments such as homes has been a long-standing research problem. However, robots are often studied only in closed-off lab settings, and prior mobile manipulation work is restricted to pick-move-place, which is arguably just the tip of the iceberg in this area. In this thesis, we introduce the Open-World Mobile [...]
Geometric Heuristics Enhance POCUS AI for Pneumothorax
Abstract: The interpretation of Point-of-care ultrasound (POCUS) images poses a challenge due to the scarcity of high-quality labelled data for training AI models in the medical domain. To address this limitation, novel methodologies were developed to train POCUS AI models using limited data, integrating geometric heuristics derived from expert clinicians. Focused on diagnosing pneumothorax, heuristics [...]
Optimal Control and Robot Learning on Agile Safety-Critical Systems
Abstract: We present a pipeline of optimal control methods for learning an optimal control policy and locally accurate dynamics models for agile and safety-critical robots using autonomous racing as an application example. We introduce Spline-Opt, a fast offline/online optimization and planning method that can produce a reasonably good initial optimal trajectory given very little dynamics [...]
Vision Model Diagnosis and Improvement Via Large Pretrained Models
Abstract: As AI becomes increasingly pervasive, the deployment of machine learning models in real-world applications has underscored critical challenges in model robustness, fairness, and performance. Despite significant advances, existing models often exhibit biases, fail to generalize across diverse data distributions, and struggle with unexpected input variations, leading to suboptimal or even discrimina- [...]
Indoor Localization and Mapping with 4D mmWave Imaging Radar
Abstract: State estimation is a crucial component for the successful implementation of robotic systems, relying on sensors such as cameras, LiDAR, and IMUs. However, in real-world scenarios, the performance of these sensors is degraded by challenging environments, e.g. adverse weather conditions and low-light scenarios. The emerging 4D imaging radar technology is capable of providing robust perception in adverse conditions. [...]
PIE-FRIDA: Personalized Interactive Emotion-Guided Collaborative Human-Robot Art Creation
Abstract: The introduction of generative AI has brought about many improvements in the artistic world. It allows many individuals to create artwork via simple descriptive text prompts. This has, in particular, created an avenue for non-artistic individuals to express their thoughts through generated art. Our work focuses on how emotion can be added as an [...]
Simulated Encounters of the Third Kind: Scenario-Based Approach to Designing Guide Robots
Abstract: Navigating through unfamiliar environments is a challenging task. For people who are blind or have low vision (BLV), navigation can be particularly daunting. Guide robots are a type of service robot that can assist BLV people with navigation tasks. A significant amount of research related to guide robots has focused on technical contributions, while a [...]
Super Odometry: Selective Fusion Towards All-degraded Environments
Abstract: Robust odometry is at the core of robotics and autonomous systems operating navigation, exploration, and locomotion in complex environments for a broad spectrum of applications. While great progress has been made, the robustness of the odometry system still remains a grand challenge. This talk introduces Super Odometry, an approach that leverages selective fusion to [...]
Learning on the Move: Integrating Action and Perception for Mobile Manipulation
Abstract: While there has been remarkable progress recently in the fields of manipulation and locomotion, mobile manipulation remains a long-standing challenge. Compared to locomotion or static manipulation, a mobile system must make a diverse range of long-horizon tasks feasible in unstructured and dynamic environments. While the applications are broad and interesting, there are a plethora [...]
Continual Personalization of Human Actions with Prompt Tuning
Abstract: In interactive computing devices (VR/XR headsets), users interact with the virtual world using hand gestures and body actions. Typically, models deployed in such XR devices are static and limited to their default set of action classes. The goal of our research is to provide users and developers with the capability to personalize their experience by [...]
Reinforcement Learning with Spatial Reasoning for Dexterous Robotic Manipulation
Abstract: Robotic manipulation in unstructured environments requires adaptability and the ability to handle a wide variety of objects and tasks. This thesis presents novel approaches for learning robotic manipulation skills using reinforcement learning (RL) with spatially-grounded action spaces, addressing the challenges of high-dimensional, continuous action spaces and alleviating the need for extensive training data. Our [...]
Leveraging Vision, Force Sensing, and Language Feedback for Deformable Object Manipulation
Abstract: Deformable object manipulation represents a significant challenge in robotics due to its complex dynamics, lack of low-dimensional state representations, and severe self-occlusions. This challenge is particularly critical in assistive tasks, where safe and effective manipulation of various deformable materials can significantly improve the quality of life for individuals with disabilities and address the growing needs [...]
CBGT-Net: A Neuromimetic Architecture for Robust Classification of Streaming Data
Abstract: This research introduces CBGT-Net, a neural network model inspired by the cortico-basal ganglia-thalamic (CBGT) circuits in mammalian brains, which are crucial for critical thinking and decision-making. Unlike traditional neural network models that generate an output for each input or after a fixed sequence of inputs, CBGT-Net learns to produce an output once sufficient evidence [...]
Enhancing Robot Perception and Interaction Through Structured Domain Knowledge
Abstract: Despite the advancements in deep learning driven by increased computational power and large datasets, significant challenges remain. These include difficulty in handling novel entities, limited mechanisms for human experts to update knowledge, and lack of interpretability, all of which are crucial for human-centric applications like assistive robotics. To address these issues, we propose leveraging [...]
Towards Universal Place Recognition
Abstract: Place Recognition is essential for achieving robust robot localization. However, current state-of-the-art systems remain environment/domain-specific and fragile. By leveraging insights from vision foundation models, we present AnyLoc, a universal VPR solution that performs across diverse environments without retraining or fine-tuning, significantly outperforming supervised baselines. We further introduce MultiLoc, and enable [...]
GNSS-denied Ground Vehicle Localization for Off-road Environments with Bird’s-eye-view Synthesis
Abstract: Global localization is essential for the smooth navigation of autonomous vehicles. To obtain accurate vehicle states, on-board localization systems typically rely on Global Navigation Satellite System (GNSS) modules for consistent and reliable global positioning. However, GNSS signals can be obstructed by natural or artificial barriers, leading to temporary system failures and degraded state estimation. On the [...]
Scaling up Robot Skill Learning with Generative Simulation
Abstract: Generalist robots need to learn a wide variety of skills to perform diverse tasks across multiple environments. Current robot training pipelines rely on humans to either provide kinesthetic demonstrations or program simulation environments with manually-designed reward functions for reinforcement learning. Such human involvement is an important bottleneck towards scaling up robot learning across diverse [...]
Simulation as a Tool for Conspicuity Measurement
Abstract: The use of unmanned aerial vehicles (UAVs) for time-critical tasks is becoming increasingly popular. Operators are expected to use information from these swarms to make real-time and informed decisions. Consequently, detecting and recognizing targets from video is pivotal to the success of these systems. At greater altitudes or with more vehicles, this [...]
VP4D: View Planning for 3D and 4D Scene Understanding
Abstract: View planning plays a critical role by gathering views that optimize scene reconstruction. Such reconstruction has played an important part in virtual production and computer animation, where a 3D map of the film set and motion capture of actors lead to an immersive experience. Current methods use uncertainty estimation in neural rendering of view [...]
Automating Annotation Pipelines by leveraging Multi-Modal Data
Abstract: The era of vision-language models (VLMs) trained on large web-scale datasets challenges conventional formulations of “open-world” perception. In this work, we revisit the task of few-shot object detection (FSOD) in the context of recent foundational VLMs. First, we point out that zero-shot VLMs such as GroundingDINO significantly outperform state-of-the-art few-shot detectors (48 vs. 33 AP) [...]
Leveraging Affordances for Accelerating Online RL
Abstract: The inability to explore environments efficiently makes online RL sample-inefficient. Most existing works tackle this problem in a setting devoid of prior information. However, additional affordances may often be cheaply available at the time of training. These affordances include small quantities of demo data, simulators that can reset to arbitrary states and domain specific [...]
Safe, Robust and Adaptive Model Learning for Agile Robots: Autonomous Racing
Abstract: In recent years, there has been rapid development in agile robots capable of operating at their limits in dynamic environments. Autonomous racing, and recent developments in it spurred by competitions such as the Indy Autonomous Challenge, A2RL, and F1Tenth, have shown how modern autonomous control algorithms are capable of operating racecars at [...]
Improving Lego Assembly with Vibro-Tactile Feedback
Abstract: Robotic manipulation is an important area of research to improve the level of efficiency and autonomy in manufacturing processes. Due to the high precision and repeatability of industrial robot arms, robotic manufacturing tasks are dominated by simple pick, place, and peg insertion actions performed in a highly structured environment. Lego blocks are an excellent [...]
DeltaWalker: A Soft, Linearly Actuated Delta Quadruped Robot
Abstract: Quadruped robots offer a versatile solution for navigating complex terrain, making them valuable for applications such as industrial automation or search and rescue. Although quadrupeds are more complex than bipeds, they are easier to balance and control and require fewer joints to actuate compared to hexapods. Traditional quadruped designs, however, often feature complex leg [...]
Propagative Distance Optimization for Constrained Inverse Kinematics
Abstract: This work investigates a constrained inverse kinematic (IK) problem that seeks a feasible configuration of an articulated robot under various constraints such as joint limits and obstacle collision avoidance. Due to the high-dimensionality and complex constraints, this problem is often solved numerically via iterative local optimization. Classic local optimization methods take joint angles as [...]
Advancing Legged Robot Agility: from Video Imitation to GPU Acceleration
Abstract: Achieving human and animal-level agility has been a long-standing goal in robotics research. Recent advancements in numerical optimization and machine learning have pushed legged systems to greater capabilities than ever before, enabling backflips, parkour, and manipulation of heavy objects. Despite these exciting developments, this thesis identifies two key limitations of current legged robot [...]
Model Predictive Control on Resource-Constrained Robots
Abstract: Model predictive control (MPC) is a powerful tool for controlling highly dynamic robotic systems subject to complex constraints. However, it is computationally expensive and often requires a large memory footprint. Larger robotic systems are capable of carrying and powering sophisticated computational hardware onboard. On the other hand, smaller robots typically have faster dynamics that [...]
Enhancing Bipedal Locomotion With Reaction Wheels
Abstract: Legged robot hardware has become more accessible in the last ten years. However, there is still a dearth of low-cost hardware platforms that are open-source and easy to build. With recent developments in accessible manufacturing methods, such as 3D printing, it has become possible to design and manufacture parts without relying on precision machining. [...]
Building Micron: The Next Handheld Manipulator for Microsurgery
Abstract: Robotic assistance is used today in a variety of surgeries as a means of precise, dexterous, and minimally-invasive manipulation. However, practical use in microsurgical environments such as vitreoretinal surgery remains a challenge for the most common mechanically-grounded robotic platforms. Microsurgery requires micron-level accuracy and the ability to manipulate with interaction forces in millinewtons. Vitreoretinal [...]
Towards Estimation, Modeling, and Control of Mixed Material Flows on Variable-Speed Conveyor Belt Systems with Applications in Recycling
Abstract: Whether it is in sorting defects from grain in an agricultural setting, ore from tailings in a mine, or letters in a postal system, the sorting of bulk material has long been a crucial aspect of human industry. Today, in the face of dwindling natural resource deposits and accelerating climate change, a particularly important [...]
Expressive Attentional Communication Learning using Graph Neural Networks
Abstract: Multi-agent reinforcement learning presents unique hurdles beyond single-agent reinforcement learning, such as the non-stationarity problem, that make learning effective decentralized cooperative policies using an agent's local state extremely challenging. Effective communication to share information and coordinate is vital for agents to work together and solve cooperative tasks, as the ubiquitous evidence of communication in [...]
Estimating Object Importance and Modeling Driver’s Situational Awareness for Intelligent Driving
Abstract: The ability to identify important objects in a complex and dynamic driving environment can help assistive driving systems alert drivers. These assistance systems also require a model of the drivers' situational awareness (SA) (what aspects of the scene they are already aware of) to avoid unnecessary alerts. This thesis builds towards such intelligent driving [...]
Learning for Perception and Strategy: Adaptive Omnidirectional Stereo Vision and Tactical Reinforcement Learning
Abstract: Multi-view stereo omnidirectional distance estimation usually needs to build a cost volume with many hypothetical distance candidates. Building the cost volume is often computationally heavy considering the limited resources a mobile robot has. We propose a new geometry-informed method for selecting distance candidates, which enables the use of a very small number [...]
Online-Adaptive Self-Supervised Learning with Visual Foundation Models for Autonomous Off-Road Driving
Abstract: Autonomous robot navigation in off-road environments currently presents a number of challenges. The lack of structure makes it difficult to handcraft geometry-based heuristics that are robust to the diverse set of scenarios the robot might encounter. Many of the learned methods that work well in urban scenarios require massive amounts of hand-labeled data, but [...]
VoxDet: Voxel Learning for Novel Instance Detection
Abstract: Detecting unseen instances based on multi-view templates is a challenging problem due to its open-world nature. Traditional methodologies, which primarily rely on 2D representations and matching techniques, are often inadequate in handling pose variations and occlusions. To solve this, we introduce VoxDet, a pioneering 3D geometry-aware framework that fully utilizes the strong 3D voxel [...]