Seminar
Predictive Scene Representations for Embodied Visual Search
Abstract: My research advances embodied AI by developing large-scale datasets and state-of-the-art algorithms. In my talk, I will specifically focus on the embodied visual search problem, which aims to enable intelligent search for robots and augmented reality (AR) assistants. Embodied visual search manifests as the visual navigation problem in robotics, where a mobile agent must efficiently navigate [...]
Special RI Seminar
Title: Testing, Analysis, and Specification for Robust and Reliable Robot Software Abstract: Building robust and reliable robotic software is an inherently challenging feat that requires substantial expertise across a variety of disciplines. Despite that, writing robot software has never been easier thanks to software frameworks such as ROS: At its best, ROS allows newcomers to assemble simple, [...]
Generating Beautiful Pixels
Abstract: In this talk, I will present three experiments that use low-level image statistics to generate high-resolution detailed outputs. In the first experiment, I will use 2D pixels to efficiently mine hard examples for better learning. Simply biasing ray sampling towards hard ray examples enables learning of neural fields with more accurate high-frequency detail in less [...]
Towards Reliable Computer Vision Systems
Abstract: The real world has infinite visual variation – across viewpoints, time, space, and curation. As deep visual models become ubiquitous in high-stakes applications, their ability to generalize across such variation becomes increasingly important. In this talk, I will present opportunities to improve such generalization at different stages of the ML lifecycle: first, I will [...]
Transforming Hollywood Visual Effects with Graphics and Vision
Abstract: Paul will describe his path to developing visual effects technology used in hundreds of movies, including The Matrix, Spider-Man 2, Benjamin Button, Avatar, Maleficent, Furious 7, and Blade Runner: 2049. These techniques include image-based modeling and rendering, high dynamic range imaging, image-based lighting, and high-resolution facial scanning for photoreal digital actors. Paul will also [...]
Vision without labels
Abstract: Deep learning has revolutionized all aspects of computer vision, but its successes have come from supervised learning at scale: large models trained on ever larger labeled datasets. However this reliance on labels makes these systems fragile when it comes to new scenarios or new tasks where labels are unavailable. This is in stark contrast to [...]
Learning Meets Gravity: Robots that Learn to Embrace Dynamics from Data
Abstract: Despite the incredible capabilities (speed and repeatability) of our hardware today, many robot manipulators are deliberately programmed to avoid dynamics – moving slow enough so they can adhere to quasi-static assumptions of the world. In contrast, people frequently (and subconsciously) make use of dynamic phenomena to manipulate everyday objects – from unfurling blankets, to [...]
Large Multimodal (Vision-Language) Models for Image Generation and Understanding
Abstract: Large Language Models and Large Vision Models, also known as Foundation Models, have led to unprecedented advances in language understanding, visual understanding, and AI. In particular, many computer vision problems including image classification, object detection, and image generation have benefited from the capabilities of such models trained on internet-scale text and visual data. In [...]
Learning and Control for Safety, Efficiency, and Resiliency of Embodied AI
Abstract: The rapid evolution of ubiquitous sensing, communication, and computation technologies has revolutionized of cyber-physical systems (CPS) across virous domains like robotics, smart grids, aerospace, and smart cities. Integrating learning into dynamic systems control presents significant Embodied AI opportunities. However, current decision-making frameworks lack comprehensive understanding of the tridirectional relationship among communication, learning and control, [...]
Imaginative Vision Language Models: Towards human-level imaginative AI skills transforming species discovery, content creation, self-driving cars, and emotional health
Abstract: Most existing AI learning methods can be categorized into supervised, semi-supervised, and unsupervised methods. These approaches rely on defining empirical risks or losses on the provided labeled and/or unlabeled data. Beyond extracting learning signals from labeled/unlabeled training data, we will reflect in this talk on a class of methods that can learn beyond the vocabulary [...]