Seminar
Snakes & Spiders, Robots & Geometry
Abstract: Locomotion and perception are a common thread between robotics and biology. Understanding these phenomena at a mechanical level involves nonlinear dynamics and the coordination of many degrees of freedom. In this talk, I will discuss geometric approaches to organizing this information in two problem domains: undulatory locomotion of snakes and swimmers, and vibration propagation [...]
Multimodal Modeling: Learning Beyond Visual Knowledge
Abstract: The computer vision community has embraced the success of learning specialist models by training with a fixed set of predetermined object categories, such as ImageNet or COCO. However, learning only from visual knowledge might hinder the flexibility and generality of visual models, which requires additional labeled data to specify any other visual concept and [...]
Robotic Cave Exploration for Search, Science, and Survey
Abstract: Robotic cave exploration has the potential to create significant societal impact through facilitating search and rescue, in the fight against antibiotic resistance (science), and via mapping (survey). But many state-of-the-art approaches for active perception and autonomy in subterranean environments rely on disparate perceptual pipelines (e.g., pose estimation, occupancy modeling, hazard detection) that process the same underlying sensor data in different [...]
Audio-Visual Learning for Social Telepresence
Abstract: Relationships between people are strongly influenced by distance. Even with today’s technology, remote communication is limited to a two-dimensional audio-visual experience and lacks a shared, three-dimensional space in which people can interact with each other across distance. Our mission at Reality Labs Research (RLR) in Pittsburgh is to develop such [...]
Representations in Robot Manipulation: Learning to Manipulate Ropes, Fabrics, Bags, and Liquids
Abstract: The robotics community has seen significant progress in applying machine learning for robot manipulation. However, much manipulation research focuses on rigid objects instead of highly deformable objects such as ropes, fabrics, bags, and liquids, which pose challenges due to their complex configuration spaces, dynamics, and self-occlusions. To achieve greater progress in robot manipulation of [...]
Safe and Stable Learning for Agile Robots without Reinforcement Learning
Abstract: My research group (https://aerospacerobotics.caltech.edu/) is working to systematically leverage AI and Machine Learning techniques towards achieving safe and stable autonomy of safety-critical robotic systems, such as robot swarms and autonomous flying cars. Another example is LEONARDO, the world's first bipedal robot that can walk, fly, slackline, and skateboard. Stability and safety are often research problems [...]
Towards editable indoor lighting estimation
Abstract: Combining virtual and real visual elements into a single, realistic image requires the accurate estimation of the lighting conditions of the real scene. In recent years, several approaches of increasing complexity---ranging from simple encoder-decoder architectures to more sophisticated volumetric neural rendering---have been proposed. While the quality of automatic estimates has increased, they have the unfortunate downside [...]
Computational imaging with multiply scattered photons
Abstract: Computational imaging has advanced to a point where the next significant milestone is to image in the presence of multiply-scattered light. Though traditionally treated as noise, multiply-scattered light carries information that can enable previously impossible imaging capabilities, such as imaging around corners and deep inside tissue. The combinatorial complexity of multiply-scattered light transport makes [...]
Towards $1 robots
Abstract: Robots are pretty great -- they can make some hard tasks easy, some dangerous tasks safe, or some unthinkable tasks possible. And they're just plain fun to boot. But how many robots have you interacted with recently? And where do you think that puts you compared to the rest of the world's people? In [...]
Mental models for 3D modeling and generation
Abstract: Humans have extraordinary capabilities of comprehending and reasoning about our 3D visual world. One particular reason is that when looking at an object or a scene, not only can we see the visible surface, but we can also hallucinate the invisible parts - the amodal structure, appearance, affordance, etc. We have accumulated thousands of [...]
What (else) can you do with a robotics degree?
Abstract: In 2004, half-way through my robotics Ph.D., I had a panic-inducing thought: What if I don’t want to build robots for the rest of my life? What can I do with this degree?! Nearly twenty years later, I have some answers: tackle climate change in Latin America, educate Congress about autonomous vehicles, improve how [...]
Complete Codec Telepresence
Abstract: Imagine two people, each of them within their own home, being able to communicate and interact virtually with each other as if they are both present in the same shared physical space. Enabling such an experience, i.e., building a telepresence system that is indistinguishable from reality, is one of the goals of Reality Labs [...]
R.I.P. ohyay: experiences building online virtual experiences during the pandemic: what worked, what hasn’t, and what we need in the future
Abstract: During the pandemic I helped design ohyay (https://ohyay.co), a creative tool for making and hosting highly customized video-based virtual events. Since Fall 2020 I have personally designed many online events: ranging from classroom activities (lectures, small group work, poster sessions, technical papers PC meetings), to conferences, to virtual offices, to holiday parties involving 100's [...]
Physics-informed image translation
Abstract: Generative Adversarial Networks (GANs) have shown remarkable performance in image translation, being able to map source input images to target domains (e.g. from male to female, day to night, etc.). However, their performance may be limited by insufficient supervision, which may be challenging to obtain. In this talk, I will present our recent works [...]
Robots Should Reduce, Reuse, and Recycle
Abstract: Despite numerous successes in deep robotic learning over the past decade, the generalization and versatility of robots across environments and tasks has remained a major challenge. This is because much of reinforcement and imitation learning research trains agents from scratch in a single or a few environments, training special-purpose policies from special-purpose datasets. In [...]
Weak Multi-modal Supervision for Object Detection and Persuasive Media
Abstract: The diversity of visual content available on the web presents new challenges and opportunities for computer vision models. In this talk, I present our work on learning object detection models from potentially noisy multi-modal data, retrieving complementary content across modalities, transferring reasoning models across dataset boundaries, and recognizing objects in non-photorealistic media. While the [...]
Machine Learning and Model Predictive Control for Adaptive Robotic Systems
Abstract: In this talk I will discuss several different ways in which ideas from machine learning and model predictive control (MPC) can be combined to build intelligent, adaptive robotic systems. I’ll begin by showing how to learn models for MPC that perform well on a given control task. Next, I’ll introduce an online learning perspective on [...]
Towards more effective remote execution of exploration operations using multimodal interfaces
Abstract: Remote robots enable humans to explore and interact with environments while keeping them safe from existing harsh conditions (e.g., in search and rescue, deep sea or planetary exploration scenarios). However, the gap between the control station and the remote robot presents several challenges (e.g., situation awareness, cognitive load, perception, latency) for effective teleoperation. Multimodal [...]
Learning Visual, Audio, and Cross-Modal Correspondences
Abstract: Today's machine perception systems rely heavily on supervision provided by humans, such as labels and natural language. I will talk about our efforts to make systems that, instead, learn from two ubiquitous sources of unlabeled data: visual motion and cross-modal sensory associations. I will begin by discussing our work on creating unified models for [...]
Multi-Sensor Robot Navigation and Subterranean Exploration
Towards a formal theory of deep optimisation
Abstract: Precise understanding of the training of deep neural networks is largely restricted to architectures such as MLPs and cost functions such as the square cost, which is insufficient to cover many practical settings. In this talk, I will argue for the necessity of a formal theory of deep optimisation. I will describe such a [...]
Towards Interactive Radiance Fields
Abstract: Over the last years, the fields of computer vision and computer graphics have increasingly converged. Using the exact same processes to model appearance during 3D reconstruction and rendering has shown tremendous benefits, especially when combined with machine learning techniques to model otherwise hard-to-capture or -simulate optical effects. In this talk, I will give an [...]
Learning Representations for Interactive Robotics
In this talk, I will be discussing the role of learning representations for robots that interact with humans and robots that interactively learn from humans through a few different vignettes. I will first discuss how bounded rationality of humans guided us towards developing learned latent action spaces for shared autonomy. It turns out this “bounded rationality” is not a [...]
Motion Planning Around Obstacles with Graphs of Convex Sets
Abstract: In this talk, I'll describe a new approach to planning that strongly leverages both continuous and discrete/combinatorial optimization. The framework is fairly general, but I will focus on a particular application of the framework to planning continuous curves around obstacles. Traditionally, these sort of motion planning problems have either been solved by trajectory optimization [...]
RE2 Robotics: from RI spinout to Acquisition
Abstract: It was July 2001. Jorgen Pedersen founded RE2 Robotics. It was supposed to be a temporary venture while he figured out his next career move. But the journey took an unexpected course: RE2 became a leading developer of mobile manipulation systems. Fast forward to 2022, when RE2 Robotics exited via an acquisition by Sarcos Technology and [...]
Enabling Self-sufficient Robot Learning
Abstract: Autonomous exploration and data-efficient learning are important ingredients for helping machine learning handle the complexity and variety of real-world interactions. In this talk, I will describe methods that provide these ingredients and serve as building blocks for enabling self-sufficient robot learning. First, I will outline a family of methods that facilitate active global exploration. [...]
Understanding the Physical World from Images
If I show you a photo of a place you have never been to, you can easily imagine what you could do in that picture. Your understanding goes from the surfaces you see to the ones you know are there but cannot see, and can even include reasoning about how interaction would change the scene. [...]
How Computer Vision Helps – from Research to Scale
Abstract: Vasudevan (Vasu) Sundarababu, SVP and Head of Digital Engineering, will cover the topic: ‘How Computer Vision Helps – from Research to Scale’. During his talk, Vasu will explore how Computer Vision technology can be leveraged in-market today, the key projects he is currently leading that leverage CV, and the end-to-end lifecycle of a CV initiative - [...]
Motion Matters in the Metaverse
Abstract: In the early 1970s, psychologists investigated biological motion perception by attaching point-lights to the joints of the human body, known as ‘point-light walkers’. These early experiments showed biological motion perception to be an extreme example of sophisticated pattern analysis in the brain, capable of easily differentiating human motions with reduced motion cues. Further [...]
What do generative models know about geometry and illumination?
Abstract: Generative models can produce compelling pictures of realistic scenes. Objects are in sensible places, surfaces have rich textures, illumination effects appear accurate, and the models are controllable. These models, such as StyleGAN, can also generate semantically meaningful edits of scenes by modifying internal parameters. But do these models manipulate a purely abstract representation of the [...]
Life as a Professor Seminar
Have you ever wondered what life is like as a professor? What do professors do on a daily basis? What makes the faculty career challenging and rewarding? Maybe you have even thought about becoming a faculty member yourself? Join us on March 22nd from 2:00 - 3:30 PM, where a panel of CMU faculty will [...]
A Constructivist’s Guide to Robot Learning
Over the last decade, a variety of paradigms have sought to teach robots complex and dexterous behaviors in real-world environments. On one end of the spectrum, we have nativist approaches that bake in fundamental human knowledge through physics models, simulators, and knowledge graphs; on the other end of the spectrum, we have tabula-rasa approaches [...]
Robot Learning by Understanding Egocentric Videos
Abstract: The true gains of machine learning in AI sub-fields such as computer vision and natural language processing have come from the use of large-scale diverse datasets for learning. In this talk, I will discuss if and how we can leverage large-scale diverse data in the form of egocentric videos (first-person videos of humans conducting [...]
Next-Generation Robot Perception: Hierarchical Representations, Certifiable Algorithms, and Self-Supervised Learning
Spatial perception —the robot’s ability to sense and understand the surrounding environment— is a key enabler for robot navigation, manipulation, and human-robot interaction. Recent advances in perception algorithms and systems have enabled robots to create large-scale geometric maps of unknown environments and detect objects of interest. Despite these advances, a large gap still separates robot [...]
Autonomous mobility in Mars exploration: recent achievements and future prospects
Abstract: This talk will summarize key recent advances in autonomous surface and aerial mobility for Mars exploration, then discuss potential future missions and technology needs for Mars and other planetary bodies. Among recent advances, the Perseverance rover that is now operating on Mars includes new autonomous navigation capability that dramatically increases its traverse speed over [...]
Structures and Environments for Generalist Agents
Abstract: We are entering an era of highly general AI, enabled by supervised models of the Internet. However, it remains an open question how intelligence emerged in the first place, before there was an Internet to imitate. Understanding the emergence of skillful behavior, without expert data to imitate, has been a longstanding goal of reinforcement [...]
From Videos to 4D Worlds and Beyond
Abstract: The world underlying images and videos is 3-dimensional and dynamic, i.e., 4D, with people interacting with each other, objects, and the underlying scene. Even in videos of a static scene, there is always the camera moving about in the 4D world. Accurately recovering this information is essential for building systems that can reason [...]
Mars Robots and Robotics at NASA JPL
Abstract: In this seminar I’ll discuss Mars robots, the unprecedented results we’re seeing with the latest Mars mission, and how we got here. Perseverance’s manipulation and sampling systems have collected samples from unique locations at twice the rate of any prior mission. 88% of all driving has been autonomous. This has enabled the mission to [...]
Generative and Animatable Radiance Fields
Abstract: Generating and transforming content requires both creativity and skill. Creativity defines what is being created and why, while skill answers the question of how. While creativity is believed to be abundant, skill can often be a barrier to creativity. In our team, we aim to substantially reduce this barrier. Recent Generative AI methods have simplified the problem for 2D [...]
Generative modeling: from 3D scenes to fields and manifold
Abstract: In this keynote talk, we delve into some of our progress on generative models that are able to capture the distribution of intricate and realistic 3D scenes and fields. We explore a formulation of generative modeling that optimizes latent representations for disentangling radiance fields and camera poses, enabling both unconditional and conditional generation of 3D [...]
Estimating Robustness using Proxies
Abstract: This talk covers some of our recent explorations on estimating the robustness of black-box machine learning models across data subpopulations: in other words, whether a trained model is uniformly accurate across different types of inputs, or whether there are significant performance disparities affecting the different subpopulations. Measuring such a characteristic is fairly straightforward if [...]
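The labeled-data case the abstract alludes to reduces to a direct computation: given predictions, ground-truth labels, and a subpopulation tag for each input, per-group accuracy and the worst-case disparity follow immediately. The sketch below is illustrative only and is not code from the talk; the function name and inputs are hypothetical.

```python
from collections import defaultdict

def subpopulation_accuracy(preds, labels, groups):
    """Compute accuracy per subpopulation and the worst-case gap.

    preds, labels, groups are parallel sequences; groups[i] tags
    example i with its subpopulation (e.g. lighting condition, dialect).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    acc = {g: correct[g] / total[g] for g in total}
    gap = max(acc.values()) - min(acc.values())  # disparity across groups
    return acc, gap

# Toy check: a model that is perfect on group "a" but only 50% on group "b".
acc, gap = subpopulation_accuracy(
    preds=[1, 1, 0, 1], labels=[1, 1, 0, 0], groups=["a", "a", "b", "b"])
```

The hard part, as the abstract notes, is the black-box setting where per-input labels (or even the group tags) are unavailable, which is where proxy estimates come in.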
Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures
Abstract: In this talk, I will focus on presenting my recent work which will be presented at CVPR in less than two months. Text-guided image generation has progressed rapidly in recent years, inspiring major breakthroughs in text-guided shape generation. Recently, it has been shown that using score distillation, one can successfully text-guide a NeRF model to [...]
Navigating to Objects in the Real World
Abstract: Semantic navigation is necessary to deploy mobile robots in uncontrolled environments like our homes, schools, and hospitals. Many learning-based approaches have been proposed in response to the lack of semantic understanding of the classical pipeline for spatial navigation, which builds a geometric map using depth sensors and plans to reach point goals. Broadly, end-to-end [...]
Going Beyond Continual Learning: Towards Organic Lifelong Learning
Abstract: Supervised learning, the harbinger of machine learning over the last decade, has had tremendous impact across application domains in recent years. However, the notion of a static trained machine learning model is becoming increasingly limiting, as these models are deployed in changing and evolving environments. Among a few related settings, continual learning has gained significant [...]
Predictive Scene Representations for Embodied Visual Search
Abstract: My research advances embodied AI by developing large-scale datasets and state-of-the-art algorithms. In my talk, I will specifically focus on the embodied visual search problem, which aims to enable intelligent search for robots and augmented reality (AR) assistants. Embodied visual search manifests as the visual navigation problem in robotics, where a mobile agent must efficiently navigate [...]
Special RI Seminar
Testing, Analysis, and Specification for Robust and Reliable Robot Software
Abstract: Building robust and reliable robotic software is an inherently challenging feat that requires substantial expertise across a variety of disciplines. Despite that, writing robot software has never been easier thanks to software frameworks such as ROS: at its best, ROS allows newcomers to assemble simple, [...]
Generating Beautiful Pixels
Abstract: In this talk, I will present three experiments that use low-level image statistics to generate high-resolution detailed outputs. In the first experiment, I will use 2D pixels to efficiently mine hard examples for better learning. Simply biasing ray sampling towards hard ray examples enables learning of neural fields with more accurate high-frequency detail in less [...]
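The "biasing ray sampling towards hard examples" idea in the first experiment can be illustrated with loss-proportional sampling: draw training examples with probability increasing in their current per-example loss. This is a hedged, generic sketch under that interpretation, not the talk's actual method; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hard_examples(losses, n, temperature=1.0):
    """Sample n example indices with probability increasing in per-example loss.

    Lower temperature sharpens the bias towards the hardest examples.
    """
    w = np.asarray(losses, dtype=float) ** (1.0 / temperature)
    p = w / w.sum()  # normalize to a probability distribution
    return rng.choice(len(losses), size=n, p=p)

# Toy check: example 2 has 10x the loss of the others, so it dominates samples.
idx = sample_hard_examples([0.1, 0.1, 1.0, 0.1], n=1000)
```

In a neural-field setting the "examples" would be individual rays, with the loss buffer refreshed periodically as training progresses.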
Towards Reliable Computer Vision Systems
Abstract: The real world has infinite visual variation – across viewpoints, time, space, and curation. As deep visual models become ubiquitous in high-stakes applications, their ability to generalize across such variation becomes increasingly important. In this talk, I will present opportunities to improve such generalization at different stages of the ML lifecycle: first, I will [...]
Transforming Hollywood Visual Effects with Graphics and Vision
Abstract: Paul will describe his path to developing visual effects technology used in hundreds of movies, including The Matrix, Spider-Man 2, Benjamin Button, Avatar, Maleficent, Furious 7, and Blade Runner 2049. These techniques include image-based modeling and rendering, high dynamic range imaging, image-based lighting, and high-resolution facial scanning for photoreal digital actors. Paul will also [...]
Vision without labels
Abstract: Deep learning has revolutionized all aspects of computer vision, but its successes have come from supervised learning at scale: large models trained on ever larger labeled datasets. However, this reliance on labels makes these systems fragile when it comes to new scenarios or new tasks where labels are unavailable. This is in stark contrast to [...]
Learning Meets Gravity: Robots that Learn to Embrace Dynamics from Data
Abstract: Despite the incredible capabilities (speed and repeatability) of our hardware today, many robot manipulators are deliberately programmed to avoid dynamics – moving slow enough so they can adhere to quasi-static assumptions of the world. In contrast, people frequently (and subconsciously) make use of dynamic phenomena to manipulate everyday objects – from unfurling blankets, to [...]
Large Multimodal (Vision-Language) Models for Image Generation and Understanding
Abstract: Large Language Models and Large Vision Models, also known as Foundation Models, have led to unprecedented advances in language understanding, visual understanding, and AI. In particular, many computer vision problems including image classification, object detection, and image generation have benefited from the capabilities of such models trained on internet-scale text and visual data. In [...]
Learning and Control for Safety, Efficiency, and Resiliency of Embodied AI
Abstract: The rapid evolution of ubiquitous sensing, communication, and computation technologies has revolutionized cyber-physical systems (CPS) across various domains like robotics, smart grids, aerospace, and smart cities. Integrating learning into dynamic systems control presents significant Embodied AI opportunities. However, current decision-making frameworks lack a comprehensive understanding of the tridirectional relationship among communication, learning, and control, [...]
Imaginative Vision Language Models: Towards human-level imaginative AI skills transforming species discovery, content creation, self-driving cars, and emotional health
Abstract: Most existing AI learning methods can be categorized into supervised, semi-supervised, and unsupervised methods. These approaches rely on defining empirical risks or losses on the provided labeled and/or unlabeled data. Beyond extracting learning signals from labeled/unlabeled training data, we will reflect in this talk on a class of methods that can learn beyond the vocabulary [...]
World Knowledge in the Time of Large Models
Abstract: This talk will discuss the massive shift that has come about in the vision and ML community as a result of large pre-trained language and vision-language models such as Flamingo, GPT-4, and others. We begin by looking at the work on knowledge-based systems in CV and robotics before the large model [...]
Data-Efficient Learning for Robotics and Reinforcement Learning
Abstract: Data efficiency, i.e., learning from small datasets, is of practical importance in many real-world applications and decision-making systems. Data efficiency can be achieved in multiple ways, such as probabilistic modeling, where models and predictions are equipped with meaningful uncertainty estimates, transfer learning, or the incorporation of valuable prior knowledge. In this talk, I will [...]
Digital Human Modeling with Light
Abstract: Leveraging light in various ways, we can observe and model physical phenomena or states which may not be possible to observe otherwise. In this talk, I will introduce our recent exploration on digital human modeling with different types of light. First, I will present our recent work on the modeling of relightable human heads, [...]
Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis
Abstract: We present a method that simultaneously addresses the tasks of dynamic scene novel-view synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements. We follow an analysis-by-synthesis framework, inspired by recent work that models scenes as a collection of 3D Gaussians which are optimized to reconstruct input images via differentiable rendering. To model [...]
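The analysis-by-synthesis loop the abstract describes can be shown in miniature: render a scene from parameters, compare to the observation, and update the parameters by gradient descent on the photometric error. The sketch below fits a single 1D Gaussian's center to a target signal with NumPy; it stands in for the full differentiable rasterization of dense 3D Gaussians, and every name and constant in it is illustrative rather than taken from the paper.

```python
import numpy as np

def render(mu, sigma, xs):
    """'Render' one 1D Gaussian blob onto pixel coordinates xs."""
    return np.exp(-0.5 * ((xs - mu) / sigma) ** 2)

xs = np.linspace(-5.0, 5.0, 200)
target = render(2.0, 1.0, xs)  # the observed "image"
mu = 0.0                       # initial guess for the blob center

# Analysis-by-synthesis: descend the analytic gradient of the MSE w.r.t. mu.
for _ in range(500):
    img = render(mu, 1.0, xs)
    resid = img - target
    dimg_dmu = img * (xs - mu)            # d/dmu of exp(-0.5 * (x - mu)^2)
    grad = 2.0 * np.mean(resid * dimg_dmu)
    mu -= 2.0 * grad                      # gradient step; mu converges to 2.0
```

In the paper's setting the optimized parameters are the positions, covariances, and appearance of many 3D Gaussians per frame, and tracking falls out of persisting those Gaussians across time.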
Robots at the Johnson Space Center and Future Plans
Abstract: The seminar will review a series of robotic systems built at the Johnson Space Center over the last 20 years. These will include wearable robots (exoskeletons, powered gloves and jetpacks), manipulation systems (ISS cranes down to human scale) and lunar mobility systems (human surface mobility and robotic rovers). As all robotics presentations should, this [...]