Seminar
Point Cloud Registration with or without Learning
Abstract: I will present two of our recent works on 3D point cloud registration. The first is a scene flow method for non-rigid registration: I will discuss our current method to recover scene flow from point clouds. Scene flow is the three-dimensional (3D) motion field of a scene, and it provides information about the spatial arrangement [...]
Dynamical Robots via Origami-Inspired Design
Abstract: Origami-inspired engineering produces structures with high strength-to-weight ratios and simultaneously lower manufacturing complexity. This reliable, customizable, cheap fabrication and component assembly technology is ideal for robotics applications in remote, rapid deployment scenarios that require platforms to be quickly produced, reconfigured, and deployed. Unfortunately, most examples of folded robots are appropriate only for small-scale, low-load [...]
Propelling Robot Manipulation of Unknown Objects using Learned Object-Centric Models
Abstract: There is a growing interest in using data-driven methods to scale up manipulation capabilities of robots for handling a large variety of objects. Many of these methods are oblivious to the notion of objects and they learn monolithic policies from the whole scene in image space. As a result, they don’t generalize well to [...]
When and Why Does Contrastive Learning Work?
Abstract: Contrastive learning organizes data by pulling together related items and pushing apart everything else. These methods have become very popular but it's still not entirely clear when and why they work. I will share two ideas from our recent work. First, I will argue that contrastive learning is really about learning to forget. Different [...]
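The "pull together related items, push apart everything else" idea this abstract describes is commonly formalized as the InfoNCE loss. Below is a minimal numpy sketch (an illustration of the standard loss, not the speaker's code); the batch size, dimensionality, and temperature are arbitrary choices.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE: row i of z1 is pulled toward the matching row i of z2
    (a related item) and pushed away from every other row."""
    # L2-normalize embeddings so dot products are cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature               # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    # softmax cross-entropy with the diagonal (matching pairs) as targets
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
positives = anchors + 0.05 * rng.normal(size=(8, 16))  # "related" views
print(info_nce_loss(anchors, positives))
```

Pairs that are genuinely related (here, noisy copies) yield a much lower loss than unrelated random pairs, which is exactly the organizing pressure the abstract refers to.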
Anticipating the Future: forecasting the dynamics in multiple levels of abstraction
Abstract: A key navigational capability for autonomous agents is to predict the future locations, actions, and behaviors of other agents in the environment. This is particularly crucial for safety in the realm of autonomous vehicles and robots. However, many current approaches to navigation and control assume perfect perception and knowledge of the environment, even though [...]
Learning to Perceive Videos for Embodiment
Abstract: Video understanding has achieved tremendous success in computer vision tasks, such as action recognition, visual tracking, and visual representation learning. Recently, this success has gradually been converted into facilitating robots and embodied agents to interact with the environments. In this talk, I am going to introduce our recent efforts on extracting self-supervisory signals and [...]
Open Challenges in Sign Language Translation & Production
Abstract: Machine translation and computer vision have greatly benefited from advances in deep learning. Large and diverse amounts of textual and visual data have been used to train neural networks, whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses [...]
The Search for Ancient Life on Mars Began with a Safe Landing
Abstract: Prior Mars rover missions have all landed in flat and smooth regions, but for the Mars 2020 mission, which is seeking signs of ancient life, this was no longer acceptable. To maximize the variety of rock samples that will eventually be returned to Earth for analysis, the Perseverance rover needed to land in a [...]
3D Recognition with self-supervised learning and generic architectures
Abstract: Supervised learning relies on manual labeling which scales poorly with the number of tasks and data. Manual labeling is especially cumbersome for 3D recognition tasks such as detection and segmentation and thus most 3D datasets are surprisingly small compared to image or video datasets. 3D recognition methods are also fragmented based on the type [...]
Rapid Adaptation for Robot Learning
Abstract: How can we train a robot to generalize to diverse environments? This question underscores the holy grail of robot learning research because it is difficult to supervise an agent for all possible situations it can encounter in the future. We posit that the only way to guarantee such a generalization is to continually learn and [...]
Robotic Cave Exploration for Search, Science, and Survey
Abstract: Robotic cave exploration has the potential to create significant societal impact through facilitating search and rescue, in the fight against antibiotic resistance (science), and via mapping (survey). But many state-of-the-art approaches for active perception and autonomy in subterranean environments rely on disparate perceptual pipelines (e.g., pose estimation, occupancy modeling, hazard detection) that process the same underlying sensor data in [...]
Humans, hands, and horses: 3D reconstruction of articulated object categories using strong, weak, and self-supervision
Abstract: Reconstructing 3D objects from a single 2D image is a task that humans perform effortlessly, yet computer vision so far has only robustly solved 3D face reconstruction. In this talk we will see how we can extend the scope of monocular 3D reconstruction to more challenging, articulated categories such as human bodies, hands and [...]
Enabling Grounded Language Communication for Human-Robot Teaming
Abstract: The ability for robots to effectively understand natural language instructions and convey information about their observations and interactions with the physical world is highly dependent on the sophistication and fidelity of the robot’s representations of language, environment, and actions. As we progress towards more intelligent systems that perform a wider range of tasks in a [...]
Looking behind the Seen in Order to Anticipate
Abstract: Despite significant recent progress in computer vision and machine learning, personalized autonomous agents often still don’t participate robustly and safely across tasks in our environment. We think this is largely because they lack an ability to anticipate, which in turn is due to a missing understanding about what is happening behind the seen, i.e., [...]
Robots that Learn through Language
Abstract: Advances in perception have been integral to transitioning robots from machines restricted to factory automation to autonomous agents that operate robustly in unstructured environments. As our surrogates, robots enable people to explore the deepest depths of the ocean and distant regions of space, making discoveries that would otherwise be impossible. The age of robots [...]
Towards Reconstructing Any Object in 3D
Abstract: The world we live in is incredibly diverse, comprising over 10k natural and man-made object categories. While the computer vision community has made impressive progress in classifying images from such diverse categories, the state-of-the-art 3D prediction systems are still limited to merely tens of object classes. A key reason for this stark difference [...]
The Clinician’s AI Partner: Augmenting Clinician Capabilities Across the Spectrum of Healthcare
Abstract: Clinicians often work under highly demanding conditions to deliver complex care to patients. As our aging population grows and care becomes increasingly complex, physicians and nurses are now also experiencing feelings of burnout at unprecedented levels. In this talk, I will discuss possibilities for computer vision to function as a partner to clinicians, and to augment their capabilities, across [...]
The Unusual Effectiveness of Abstractions for Assistive AI
Abstract: Can we balance efficiency and reliability while designing assistive AI systems? What would such AI systems need to provide? In this talk I will present some of our recent work addressing these questions. In particular, I will show that a few fundamental principles of abstraction are surprisingly effective in designing efficient and reliable AI [...]
Reliable and Accessible Visual Recognition
Abstract: As visual recognition models are developed across diverse applications, we need the ability to reliably deploy our systems in a variety of environments. At the same time, visual models tend to be trained and evaluated on a static set of curated and annotated data which only represents a subset of the world. In this [...]
Fake It Till You Make It: Face analysis in the wild using synthetic data alone
Abstract: In this seminar I will demonstrate how synthetic data alone can be used to perform face-related computer vision in the wild. The community has long enjoyed the benefits of synthesizing training data with graphics, but the domain gap between real and synthetic data has remained a problem, especially for human faces. Researchers have tried [...]
Robotics and Warehouse Automation at Berkshire Grey
Abstract: This talk tells the Berkshire Grey story, from its founding in 2013 to its IPO earlier this year — the first robotics IPO since iRobot over 15 years ago. Berkshire Grey produces automated systems for e-commerce order fulfillment, parcel sortation, store replenishment, and related operations in warehouses, distribution centers, and in the back ends of [...]
Leveraging StyleGAN for Image Editing and Manipulation
Abstract: StyleGAN has recently been established as the state-of-the-art unconditional generator, synthesizing images of phenomenal realism and fidelity, particularly for human faces. With its rich semantic space, many works have attempted to understand and control StyleGAN’s latent representations with the goal of performing image manipulations. To perform manipulations on real images, however, one must learn to [...]
Resilient Exploration in SubT Environments: Team Explorer’s Approach and Lessons Learned in the Final Event
Abstract: Subterranean robot exploration is difficult, posing many mobility, communications, and navigation challenges that require a diverse set of systems and reliable autonomy. While prior work has demonstrated partial successes in addressing the problem, here we convey a comprehensive approach to address the problem of subterranean exploration in a wide range of [...]
Next-Gen Video Communication
Abstract: Video communication connects our world. It is necessary in conducting business, educational and personal activities across different geographical locations. However, the quality of an average user’s video communication is dramatically worse than that of professionally created videos in news broadcasts, talk shows, and on YouTube. This is because professionally created videos are often captured with [...]
Activity Understanding of Scripted Performances
Abstract: The PSU Taichi for Smart Health project has been doing a deep-dive into vision-based analysis of 24-form Yang-style Taichi (TaijiQuan). A key property of Taichi, shared by martial arts katas and prearranged form exercises in other sports, is practice of a scripted routine to build both mental and physical competence. The scripted nature of routines [...]
Domain adaptive object detection
Abstract: Recent advances in deep learning have led to the development of accurate and efficient models for object detection. However, learning highly accurate models relies on the availability of large-scale annotated datasets. Due to this, model performance drops drastically when evaluated on label-scarce datasets having visually distinct images. Domain adaptation tries to mitigate this degradation. In [...]
Visual Understanding across Semantic Groups, Domains and Devices
Abstract: Deep neural networks often lack generalization capabilities to accommodate changes in the input/output domain distributions and, therefore, are inherently limited by the restricted visual and semantic information contained in the original training set. In this talk, we argue the importance of the versatility of deep neural architectures and we explore it from various perspectives. [...]
Towards Robust Human-Robot Interaction: A Quality Diversity Approach
Abstract: The growth of scale and complexity of interactions between humans and robots highlights the need for new computational methods to automatically evaluate novel algorithms and applications. Exploring the diverse scenarios of interaction between humans and robots in simulation can improve understanding of complex human-robot interaction systems and avoid potentially costly failures in real-world settings. [...]
Topology-Driven Learning for Biomedical Imaging Informatics
Abstract: Thanks to decades of technology development, we are now able to visualize complex biomedical structures such as neurons, vessels, trabeculae, and breast tissues in high quality. We need innovative methods to fully exploit these structures, which encode important information about underlying biological mechanisms. In this talk, we explain how topology, i.e., connected components, handles, loops, [...]
Lessons from the Field: Deep Learning and Machine Perception for field robots
Abstract: Mobile robots now deliver vast amounts of sensor data from large unstructured environments. In attempting to process and interpret this data there are many unique challenges in bridging the gap between prerecorded data sets and the field. This talk will present recent work addressing the application of machine learning techniques to mobile robotic perception. [...]
Learning generative representations for image distributions
Abstract: Autoencoder neural networks are an unsupervised technique for learning representations, which have been used effectively in many data domains. While capable of generating data, autoencoders have been inferior to other models like Generative Adversarial Networks (GANs) in their ability to generate image data. We will describe a general autoencoder architecture that addresses this limitation, and [...]
Building Intelligent and Visceral Machines: From Sensing to Application
Abstract: Humans have evolved to have highly adaptive behaviors that help us survive and thrive. As AI prompts a move from computing interfaces that are explicit and procedural to those that are implicit and intelligent, we are presented with extraordinary opportunities. In this talk, I will argue that understanding affective and behavioral signals presents many opportunities [...]
GANcraft – an unsupervised 3D neural method for world-to-world translation
Abstract: Advances in 2D image-to-image translation methods, such as SPADE/GauGAN, have enabled users to paint photorealistic images by drawing simple sketches similar to those created in Microsoft Paint. Despite these innovations, creating a realistic 3D scene remains a painstaking task, out of the reach of most people. It requires years of expertise, professional software, a library [...]
Learning Optical Flow: Model, Data, and Applications
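As context for the abstract below: optical flow assigns each pixel a 2D displacement, and a common way to use it is to warp one frame toward another. This toy nearest-neighbor backward-warping sketch (my illustration, not material from the talk) shows what a flow field encodes; the 4x4 image and unit rightward shift are made-up test data.

```python
import numpy as np

def backward_warp(img2, flow):
    """Warp frame 2 back toward frame 1 (nearest-neighbor sampling).
    flow[y, x] = (dx, dy): pixel (x, y) in frame 1 moved to (x+dx, y+dy)."""
    h, w = img2.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img2[src_y, src_x]

# Toy example: frame 2 is frame 1 shifted right by one pixel,
# so the ground-truth flow is (+1, 0) everywhere.
img1 = np.arange(16.0).reshape(4, 4)
img2 = np.roll(img1, 1, axis=1)
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0
warped = backward_warp(img2, flow)  # recovers frame 1 (except the wrapped column)
```

With the correct flow, the warped frame matches frame 1 away from the image border, which is the photometric-consistency principle classical and learned flow methods alike build on.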
Abstract: Optical flow provides important information about the dynamic world and is of fundamental importance to many tasks. In this talk, I will present my work on different aspects of learning optical flow. I will start with the background and talk about PWC-Net, a compact and effective model built using classical principles for optical flow. Next, [...]
Distributed Dissipativity: Applying Foundational Stability Theory to Modern Networked Control
Abstract: Despite its diverse areas of application, the desire to optimize performance and guarantee acceptable behaviour in the face of inevitable uncertainty is pervasive throughout control theory. This creates a fundamental challenge since the necessity of robustly stable control schemes often favors conservative designs, while the desire to optimize performance typically demands the opposite. While [...]
Haptic Perspective-taking from Vision and Force
Abstract: Physically collaborative robots present an opportunity to positively impact society across many domains. However, robots currently lack the ability to infer how their actions physically affect people. This is especially true for robotic caregiving tasks that involve manipulating deformable cloth around the human body, such as dressing and bathing assistance. In this talk, I [...]
Do Vision-Language Pretrained Models Learn Spatiotemporal Primitive Concepts?
Abstract: Vision-language models pretrained on web-scale data have revolutionized deep learning in the last few years. They have demonstrated strong transfer learning performance on a wide range of tasks, even under the "zero-shot" setup, where text "prompts" serve as a natural interface for humans to specify a task, as opposed to collecting labeled data. These models are [...]
Perception-Action Synergy in Uncertain Environments
Abstract: Many robotic applications require a robot to operate in an environment with unknowns or uncertainty, at least initially, before it gathers enough information about the environment. In such a case, a robot must rely on sensing and perception to feel its way around. Moreover, it has to couple sensing/perception and motion synergistically in real [...]
Max-Affine Spline Insights into Deep Learning
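The abstract below views a DN layer as a max-affine spline operator: each output is the maximum of several affine functions of the input. The simplest case, which this tiny numpy sketch checks numerically (my illustration of the standard identity, not the authors' code), is that a ReLU unit relu(w·x + b) = max(w·x + b, 0) is a two-piece max-affine spline whose second piece is the zero function.

```python
import numpy as np

def relu_unit(w, b, x):
    # A single ReLU unit: relu(w·x + b)
    return np.maximum(w @ x + b, 0.0)

def max_affine_spline(A, c, x):
    """Max-affine spline: max over R affine pieces a_r·x + c_r,
    where A stacks the slopes a_r and c stacks the offsets c_r."""
    return np.max(A @ x + c)

rng = np.random.default_rng(1)
w, b = rng.normal(size=3), 0.2
x = rng.normal(size=3)

# The ReLU unit as a 2-piece spline: pieces (w, b) and (0, 0)
A = np.stack([w, np.zeros(3)])
c = np.array([b, 0.0])
print(relu_unit(w, b, x), max_affine_spline(A, c, x))  # identical values
```

Stacking such units and composing layers yields the composition of MASOs the abstract refers to, making the whole network a continuous piecewise-affine map.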
Abstract: We build a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators. Our key result is that a large class of DNs can be written as a composition of max-affine spline operators (MASOs) that provide a powerful portal through which we view and analyze their inner workings. For instance, [...]
Teruko Yata Memorial Lecture
Leveraging Language and Video Demonstrations for Learning Robot Manipulation Skills and Enabling Closed-Loop Task Planning
Abstract: Humans have gradually developed language, mastered complex motor skills, and created and utilized sophisticated tools. The act of conceptualization is fundamental to these abilities because it allows humans to mentally represent, summarize and abstract diverse knowledge and skills. By means of [...]
Designing Robotic Systems with Collective Embodied Intelligence
Abstract: Natural swarms exhibit sophisticated colony-level behaviors with remarkable scalability and error tolerance. Their evolutionary success stems from more than just intelligent individuals; it hinges on their morphology, their physical interactions, and the way they shape and leverage their environment. Mound-building termites, for instance, are believed to use their own body as a template for [...]