Robot Safety Beyond Collision-Avoidance
Abstract: It is common to equate robot safety with “collision avoidance”, but in unstructured, open-world environments a robot’s representation of safety should be much more nuanced. For example, a household manipulator should understand that pouring coffee too fast will cause the liquid to overflow, or that pulling a mug too quickly from a cupboard will cause [...]
Sensing the Unseen: Dexterous Tool Manipulation Through Touch and Vision
Abstract: Dexterous tool manipulation is a dance between tool motion, deformation, and force transmission, choreographed by the robot’s end-effector. Take, for example, the use of a spatula. How should the robot reason jointly, through vision and touch, over the tool’s geometry and the forces it imparts to the environment? In this talk, I will present our recent [...]
Autoregressive Models: Foundations and Open Questions
Abstract: The success of autoregressive (AR) models in language today is so tremendous that, in turn, their scope has been largely narrowed to a few specific instantiations. In this talk, we will revisit the foundations of classical AR models, discussing essential concepts that may have been overlooked in modern practice. We will then introduce our recent research [...]
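As a refresher on the classical setup the talk revisits: an AR model factorizes a sequence’s probability as p(x) = p(x_1) · p(x_2 | x_1) · … · p(x_T | x_{<T}). The toy bigram model below is a minimal sketch of that factorization, purely for illustration; it is not the speaker’s method.

    # Minimal sketch of the AR factorization p(x) = prod_t p(x_t | x_{<t}),
    # using a toy bigram (first-order) model. Illustrative only.
    from collections import Counter, defaultdict

    def fit_bigram(sequences):
        counts = defaultdict(Counter)
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                counts[prev][nxt] += 1
        return counts

    def sequence_prob(seq, counts):
        prob = 1.0
        for prev, nxt in zip(seq, seq[1:]):
            total = sum(counts[prev].values())
            prob *= counts[prev][nxt] / total if total else 0.0
        return prob

    model = fit_bigram([["a", "b", "c"], ["a", "b", "a"]])
    print(sequence_prob(["a", "b", "c"], model))  # chain of conditionals -> 0.5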
Enabling Collaboration between Creators and Generative Models
Abstract: Generative models have reduced visual content creation to as little effort as writing a short text description. Meanwhile, these models also spark concerns among artists, designers, and photographers about job security and data ownership. This leads to many questions: Will generative models make creators’ jobs obsolete? Should creators stop sharing their work publicly? How can creators [...]
Learning Environment Models for Mobile Robot Autonomy
Abstract: Robots are expected to execute increasingly complex tasks in increasingly complex and a priori unknown environments. A key prerequisite is the ability to understand the geometry and semantics of the environment in real time from sensor observations. This talk will present techniques for learning metric-semantic environment models from RGB and depth observations. Specific examples include [...]
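As a concrete illustration of the raw ingredients such models start from, the sketch below unprojects a depth image and a per-pixel semantic map into a labeled 3D point cloud using a pinhole camera model. The intrinsics and array shapes are assumed values for illustration, not details from the talk.

    # Minimal sketch: fuse a depth image and per-pixel semantic labels into a
    # labeled 3D point cloud via pinhole unprojection. Intrinsics are assumed.
    import numpy as np

    def unproject(depth, labels, fx, fy, cx, cy):
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
        z = depth
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
        sem = labels.reshape(-1)
        valid = points[:, 2] > 0  # drop pixels with no depth return
        return points[valid], sem[valid]

    depth = np.random.uniform(0.5, 4.0, (480, 640))
    labels = np.random.randint(0, 10, (480, 640))
    pts, sem = unproject(depth, labels, fx=525.0, fy=525.0, cx=319.5, cy=239.5)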
Teruko Yata Memorial Lecture in Robotics
Learning World Simulators from Data
Abstract: Modern foundation models have achieved superhuman performance in many logic and mathematical reasoning tasks by learning to think step by step. However, their ability to understand videos, and, consequently, to control embodied agents, lags behind. They often make mistakes in recognizing simple activities and often hallucinate when generating videos. This [...]
Investigating Compositional Reasoning in Time Series Foundation Models
Abstract: Large pre-trained time series foundation models (TSFMs) have demonstrated promising zero-shot performance across a wide range of domains. However, a question remains: Do TSFMs succeed solely by memorizing training patterns, or do they possess the ability to reason? While reasoning is a topic of great interest in the study of Large Language Models (LLMs), [...]
Learning from Animal and Human Videos
Abstract: Animals and humans can learn from the billions of years of life on Earth and the evolution that has shaped it. If robots can borrow from that wealth of experience, they too could learn from experience instead of relying on brute-force trial and error. Learning from internet-scale videos, such as the [...]
Learning Efficient 3D Generation
Abstract: Recent advances in 3D generation have enabled the synthesis of multi-view images using large-scale pre-trained 2D diffusion models. However, these methods typically require dozens of forward passes, resulting in significant computational overhead. In this talk, we introduce Turbo3D, an ultra-fast text-to-3D system that generates high-quality Gaussian Splatting assets in under one second. Turbo3D features a [...]
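To make the computational overhead concrete: standard diffusion sampling runs one full network forward pass per denoising step, so dozens of steps mean dozens of passes. The sketch below shows that loop with a hypothetical denoise_step stand-in; it is not Turbo3D’s actual pipeline.

    # Minimal sketch: each denoising step is a full forward pass, so sampling
    # cost scales with num_steps. `denoise_step` is a hypothetical stand-in.
    import numpy as np

    def sample(x_init, denoise_step, num_steps=50):
        x = x_init
        for t in reversed(range(num_steps)):  # dozens of forward passes
            x = denoise_step(x, t)
        return x

    def dummy_denoise(x, t):
        return 0.98 * x  # placeholder; a real model predicts a less-noisy sample

    out = sample(np.random.randn(64), dummy_denoise)

A distilled few-step generator collapses this loop to a handful of calls, which is the kind of reduction a sub-second system requires.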
Reconstructing Tree Skeletons in Agricultural Robotics: A Comparative Study of Single-View and Volumetric Methods
Abstract: This thesis investigates the problem of reconstructing tree skeletons for agricultural robotics, comparing single-view image-based (Image to 3D) and volumetric (3D to 3D) methods. Accurate 3D modeling is essential for robotic tasks like pruning and harvesting, where understanding the underlying branch structure is critical. Using a custom-generated dataset of synthetic trees, we train encoder-decoder [...]
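A generic single-view encoder-decoder of the kind the abstract alludes to might look like the sketch below: an image encoder compresses an RGB view to a latent code, and a decoder regresses a fixed-size set of 3D skeleton points. All layer sizes and the point-set output format are assumptions for illustration, not the thesis’s actual architecture.

    # Minimal sketch of a single-view (Image to 3D) encoder-decoder; sizes and
    # output format are assumptions, not the thesis's actual model.
    import torch
    import torch.nn as nn

    class SkeletonNet(nn.Module):
        def __init__(self, latent_dim=256, num_points=128):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, latent_dim),
            )
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 512), nn.ReLU(),
                nn.Linear(512, num_points * 3),  # xyz per skeleton point
            )

        def forward(self, img):
            z = self.encoder(img)
            return self.decoder(z).view(img.shape[0], -1, 3)

    net = SkeletonNet()
    pts = net(torch.randn(2, 3, 128, 128))  # -> (2, 128, 3) predicted points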