Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis
Abstract: This talk will present our approach for reconstructing objects from sparse-view images captured in unconstrained environments. In the absence of ground-truth camera poses, we will demonstrate how to utilize estimates from off-the-shelf systems and address two key challenges: refining noisy camera poses in sparse views and effectively handling outlier poses. Bio: Qitao is a second-year [...]
EgoTouch: On-Body Touch Input Using AR/VR Headset Cameras
Abstract: In augmented and virtual reality (AR/VR) experiences, a user’s arms and hands can provide a convenient and tactile surface for touch input. Prior work has shown on-body input to have significant speed, accuracy, and ergonomic benefits over in-air interfaces, which are common today. In this work, we demonstrate high-accuracy, bare-hands (i.e., no special [...]
Auptimize: Optimal Placement of Spatial Audio Cues for Extended Reality
Abstract: Spatial audio in Extended Reality (XR) provides users with better awareness of where virtual elements are placed, and efficiently guides them to events such as notifications, system alerts from different windows, or approaching avatars. Humans, however, are inaccurate in localizing sound cues, especially with multiple sources due to limitations in human auditory perception such as [...]
VoxDet: Voxel Learning for Novel Instance Detection
Abstract: Detecting unseen instances based on multi-view templates is a challenging problem due to its open-world nature. Traditional methodologies, which primarily rely on 2D representations and matching techniques, are often inadequate in handling pose variations and occlusions. To solve this, we introduce VoxDet, a pioneering 3D geometry-aware framework that fully utilizes the strong 3D voxel [...]
Sensorimotor-Aligned Design for Pareto-Efficient Haptic Immersion in Extended Reality
Abstract: A new category of computing devices is emerging: augmented and virtual reality headsets, collectively referred to as extended reality (XR). These devices can alter, augment, or even replace our reality. While these headsets have made impressive strides in audio-visual immersion over the past half-century, XR interactions remain almost completely devoid of appropriately expressive tactile [...]
Evaluating and Improving Vision-Language Models Beyond Scaling Laws
Abstract: In this talk, we present our work on advancing Vision-Language Models (VLMs) beyond scaling laws through improved evaluation and (post-)training strategies. Our contributions include VQAScore, a state-of-the-art alignment metric for text-to-visual generation. We show how VQAScore improves visual generation under real-world user prompts in GenAI-Bench. Additionally, we explore training methods that leverage the language [...]
Whisker-Inspired Sensors for Unstructured Environments
Abstract: Robots lack the perception abilities of animals, which is one reason they cannot achieve complex control in outdoor unstructured environments with the same ease as animals. One cause of the perception gap is the constraints researchers place on the environments in which they test new sensors so algorithms can correctly interpret data from [...]
Strategy and Skill Learning for Physics-based Table Tennis Animation
Abstract: Recent advancements in physics-based character animation leverage deep learning to generate agile and natural motion, enabling characters to execute movements such as backflips, boxing, and tennis. However, reproducing the selection and use of diverse motor skills in dynamic environments to solve complex tasks, as humans do, remains a challenge. We present a strategy [...]
Abstraction Barriers for Embodied Algorithms
Abstract: Designing robotic systems to reliably modify their environment typically requires expert engineers and several design iterations. This talk will cover abstraction barriers that can be used to make the process of building such systems easier and the results more predictable. By focusing on approximate mathematical representations that model the process dynamics, these representations can [...]
Getting Optimization Layers to Play Well with Deep Networks: Numerical Methods and Architectures
Abstract: Many real-world challenges, from robotic control to resource management, can be effectively formulated as optimization problems. Recent advancements have focused on incorporating these optimization problems as layers within deep learning pipelines, enabling the explicit inclusion of auxiliary constraints or cost functions, which is crucial for applications such as enforcing physical laws, ensuring safety constraints, [...]
RI Faculty Business Meeting
Meeting for RI Faculty. Agenda was sent via a calendar invite.
Autonomous Robotic Surgery: Science Fiction or Reality?
Abstract: Robotic-assisted surgery (RAS) systems incorporate highly dexterous tools, hand tremor filtering, and motion scaling to enable a minimally invasive surgical approach, reducing collateral damage and patient recovery times. However, current state-of-the-art telerobotic surgery requires a surgeon to control every motion of the robot, resulting in long procedure times and inconsistent results. The advantages of [...]
Generative Modelling for 3D Multimodal Understanding of Human Physical Interactions
Abstract: Generative modelling has been extremely successful in synthesizing text, images, and videos. Can the same machinery also help us better understand how to physically interact with the multimodal 3D world? In this talk, I will introduce some of my group's work in answering this question. I will first discuss how we can enable 2D [...]
A retrospective, 40 Years of Field Robotics
Abstract: Chuck has been building and deploying robots in the field for the past 40 years. In this retrospective he will touch on the robots, people and experiences that have been part of the journey. From the early days in the 1980s with the Three Mile Island nuclear robots and the first outdoor autonomy robots [...]
Efficient Quadruped Mobility: Harnessing a Generalist Policy for Streamlined Planning
Abstract: Navigating quadruped robots through complex, unstructured environments over long horizons remains a significant challenge in robotics. Traditional planning methods offer guarantees such as optimality and long-horizon reasoning, while learning-based methods, particularly those involving deep reinforcement learning (DRL), provide robustness and generalization. In this thesis, we present S3D-OWNS (Skilled 3D-Optimal Waypoint Navigation System), a novel [...]
Data Attribution for Text-to-Image Models
Abstract: Large text-to-image models learn from training data to synthesize "novel" images, but how the models use the training data remains a mystery. The problem of data attribution is to identify which training images are influential for generating a given output. Specifically, removing influential images and retraining the model would prevent it from reproducing that [...]
Differentiable Convex Modeling for Robotic Planning and Control
Abstract: Robotic simulation, planning, estimation, and control have all been built on top of numerical optimization. Over the same period, modern convex optimization has matured into a robust technology delivering globally optimal solutions in polynomial time. With advances in differentiable optimization and custom solvers capable of producing smooth derivatives, convex modeling has become fast, reliable, [...]
Knowledge and Data Dependence in Decision-Making
Abstract: This thesis explores diverse decision-making strategies for autonomous agents by examining knowledge-dependent and data-dependent approaches in stationary and dynamic data environments. We address five core research problems across three thematic areas: knowledge-dependent, stationary data-dependent, and evolving data-dependent decision-making. We first investigate knowledge-driven decision-making within robotic swarms, characterizing vulnerabilities in systems governed by consistent rule-following [...]
Communication Efficient and Differentially Private Optimization
Abstract: In recent years, the integration of communication efficiency and differential privacy in distributed optimization has gained significant attention, motivated by large-scale applications such as Federated Learning (FL), where both data privacy and efficient communication are critical. This thesis explores the development of novel techniques to address these challenges, with a focus on distributed mean [...]
Towards a Universal Data Engine for Robotics and Beyond
Abstract: Robotics researchers have been attempting to extend data-driven breakthroughs in fields like computer vision and language processing into robot learning. However, unlike vision or language domains, where massive amounts of data are readily available on the internet, training robotic policies relies on physical and interactive data collected by interacting with the physical world -- [...]
Learning for Dynamic Robot Manipulation of Deformable and Transparent Objects
Abstract: Dynamics, softness, deformability, and difficult-to-detect objects will be critical for new domains in robotic manipulation. But there are complications--including unmodelled dynamic effects, infinite-dimensional state spaces of deformable objects, and missing features from perception. This talk explores learning methods based on multi-view sensing, acoustics, physics-based regularizations, and Koopman operators and proposes a novel multi-finger soft [...]
HaptiClay: An Interactive Haptic Interface for Gestured Concretization of Polynomial Functions
Abstract: In this work, we present HaptiClay, a low-cost kinesthetic haptic interface that improves the understanding of mathematical language by providing embodied non-verbal representations of math concepts. Our interface integrates four key components: a haptic device, a high-level simulation that communicates with a low-level controller for force and position updates, a low-level controller that executes [...]