Computational Heat and Light Transport for Scene Understanding
Abstract: Thermal cameras don’t just capture heat maps—they see a mix of emitted and reflected infrared radiation. In this talk, I’ll show how we can computationally disentangle these signals to enable better interpretation of scenes from thermal data. I’ll begin with a dual-band imaging system that leverages differences in spectral emissivity to separate emitted radiation [...]
Unified Vision-Language Modeling
Abstract: Recent advances in large-scale language modeling have demonstrated significant success across various tasks, prompting efforts to extend these capabilities to other modalities, including 2D and 3D vision. However, this effort has been met with a variety of challenges due to fundamental differences in data representations, task-specific requirements, and the relative scarcity of large, high-quality [...]
SmokeSeer: 3D Gaussian Splatting for Smoke Removal and Scene Reconstruction
Abstract: In safety-critical environments such as firefighting, search and rescue, and industrial inspection, the presence of dense smoke severely hampers visual perception and degrades the performance of vision-based systems. Traditional dehazing and reconstruction methods are limited by their reliance on data-driven priors or assumptions of static, low-density smoke. We present SmokeSeer, a method that performs [...]
Advancing 3D Semantic and Geometric Reasoning
Abstract: Recent advances in foundation models have dramatically improved reasoning over language, vision, and decision-making for autonomous systems. However, extending this intelligence to embodied agents requires bridging the gap between abstract 2D understanding and grounded 3D interaction—a challenge driven by limited 3D data and the inherent complexity of spatial reasoning. This work addresses the problem [...]
Towards Scalable Layout Optimization for Large-Scale Multi-Robot Coordination Systems
Abstract: With the rapid progress in Multi-Agent Path Finding (MAPF), researchers have studied how MAPF algorithms can be deployed to coordinate hundreds of robots in large automated warehouses. While most works try to improve the throughput of such warehouses by developing better MAPF algorithms, we focus on improving the throughput by optimizing the warehouse layout. [...]
Learning Universal Humanoid Control
Abstract: Since infancy, humans acquire motor skills, behavioral priors, and objectives by learning from their caregivers. Similarly, as we create humanoids in our own image, we aspire for them to learn from us and develop universal physical and cognitive capabilities that are comparable to, or even surpass, our own. In this thesis, we explore how [...]
Enhancing the Physical Capabilities of Aerial Robots: From Inspection to Manipulation
Abstract: Uncrewed Aerial Vehicles (UAVs) are increasingly used for high-altitude tasks, many of which require not only perception but also active interaction with the environment. This has led to growing interest in aerial manipulation—combining aerial mobility with manipulation capabilities. In this talk, we explore how to move toward general aerial manipulation: enabling a single system [...]
Flexible Perception for High-Performance Robot Navigation
Abstract: Real-world autonomy requires perception systems that deliver rich, accurate information suited to the task and environment. However, as robots scale to diverse and rapidly evolving settings, maintaining this level of performance becomes increasingly brittle and labor-intensive, requiring significant human engineering and retraining for even small changes in environment and problem definition. To overcome this bottleneck, [...]
Generating a Physical World
Abstract: Generating an interactive, enlivened, and physical world enables a wide range of applications in entertainment, embodied AI, education, and creative design. Recent image/video models have shown promise in producing realistic visuals, yet they operate purely at the pixel level and lack underlying physical grounding, leading to failures in physical fidelity and user interactivity. In [...]
Learning Bayesian Experimental Design Policies Efficiently and Robustly
Abstract: Bayesian Experimental Design (BED) provides a principled framework for sequential data collection under uncertainty, and is used in a wide range of domains such as clinical trials, ecological monitoring, and hyperparameter optimization. Despite its wide applicability, BED methods remain challenging to deploy in practice due to their significant computational demands. This thesis addresses these computational [...]
Unlocking Robust Spatial Perception: Resilient State Estimation and Mapping for Long-term Autonomy
Abstract: How can we enable robots to perceive, adapt, and understand their surroundings like humans—in real-time and under uncertainty? Just as humans rely on vision to navigate complex environments, robots need robust and intelligent perception systems—“eyes” that can endure sensor degradation, adapt to changing conditions, and recover from failure. However, today’s visual systems are fragile—easily [...]
When Spatial Computing meets Accelerated Computing
Abstract: NVIDIA has been pioneering Accelerated Computing for the past three decades, driving innovations that have transformed society. Among all personal computing media, Spatial Computing and Extended Reality (XR) stand out as some of the most promising beneficiaries of accelerated computing. In this talk, we will explore the latest developments and trends in the XR ecosystem, [...]
From Pixels to Physical Intelligence: Semantic 3D Data Generation at Internet Scale
Abstract: Modern AI won’t achieve physical intelligence until it can extract rich, semantic spatial knowledge from the wild ocean of internet video—not just curated motion-capture datasets or expensive 3D scans. This thesis proposes a self-bootstrapping pipeline for converting raw pixels into large-scale 3D and 4D spatial understanding. It begins with multi-view bootstrapping: using just two [...]
Self-Supervised Perception for Tactile Dexterity
Abstract: Humans are incredibly dexterous. We interact with and manipulate tools effortlessly, leveraging touch without giving it a second thought. Yet, replicating this level of dexterity in robots is a major challenge. While the robotics community, recognizing the importance of touch in fine manipulation, has developed a wide variety of tactile sensors, how best to [...]
Differentiable Probabilistic Inference and Rendering for Multimodal Robotic Perception
Abstract: Robots are increasingly deployed to automate tasks that are dangerous or mundane for humans such as search and rescue, mapping, and inspection in difficult environments. They rely on their perception stack, typically composed of complementary sensing modalities, to estimate their own state and the state of the environment to enable informed decision-making. This thesis [...]
Video Intelligence in the Era of Multimodal Models
Abstract: The past few years have witnessed great success in video intelligence, supercharged by multimodal models. In this talk, I will start with a brief overview of our efforts in building video-language models for understanding and diffusion models for video generation. Yet, video understanding and generation have always been two separate research pillars, despite [...]
Robotics Institute Picnic
Please mark your calendars and plan to join us for the 2025 Robotics Institute Picnic! More information and RSVP e-vite to follow as we get closer to the event.