Seminar
Deep Learning for Understanding Dynamic Visual Data
Abstract: Perceiving dynamic environments from visual inputs allows autonomous agents to understand and interact with the world and is a core topic in Artificial Intelligence. The success of deep learning motivates us to apply deep learning techniques to the perception of dynamic visual data. However, how to design and apply deep neural networks to effectively [...]
Optimizing for coordination with people
https://youtu.be/AQ-w5o2oGI8 Abstract: From autonomous cars to quadrotors to mobile manipulators, robots need to co-exist and even collaborate with humans. In this talk, we will explore how our formalism for decision making needs to change to account for this interaction, and dig our heels into the subtleties of modeling human behavior -- sometimes strategic, often irrational, [...]
Analyzing Grasp Contact via Thermal Imaging
Abstract: Grasping and manipulating objects is an important human skill. Because contact between hand and object is fundamental to grasping, measuring it can lead to important insights. However, observing contact through external sensors is challenging because of occlusion and the complexity of the human hand. I will discuss the use of thermal cameras to capture [...]
Fast Foveation for LIDARs, Projectors and Cameras
Abstract: Most cameras today capture images without considering scene content. In contrast, animal eyes have fast mechanical movements that control how the scene is imaged in detail by the fovea, where visual acuity is highest. This concentrates computational (i.e. neuronal) resources in places where they are most needed. The prevalence of foveation, and the wide [...]
Learning to See Through Occlusions and Obstructions
Virtual VASC: https://cmu.zoom.us/j/249106600 Abstract: Photography allows us to capture and share memorable moments of our lives. However, 2D images appear flat due to the lack of depth perception and may suffer from poor imaging conditions such as taking photos through reflecting or occluding elements. In this talk, I will present our recent efforts to [...]
Detectron2 in Object Detection Research
Virtual VASC: https://cmu.zoom.us/j/249106600 Abstract: Detectron2 is Facebook's library for object detection and segmentation. It has been used widely in FAIR's research and Facebook's products. This talk will introduce detectron2 with a focus on its use in object detection research, including the lessons we learned from building it, as well as the new research enabled [...]
Fairness in visual recognition
Virtual VASC Seminar: https://cmu.zoom.us/j/249106600 Abstract: Computer vision models trained on unparalleled amounts of data hold promise for making impartial, well-informed decisions in a variety of applications. However, more and more historical societal biases are making their way into these seemingly innocuous systems. Visual recognition models have exhibited bias by inappropriately correlating age, gender, sexual [...]
Bio-inspired depth sensing using computational optics
Virtual Seminar: https://cmu.zoom.us/j/249106600 Abstract: Jumping spiders rely on accurate depth perception for predation and navigation. They accomplish depth perception, despite their tiny brains, by using specialized optics. Each principal eye includes a multitiered retina that simultaneously receives multiple images with different amounts of defocus, and distance is decoded from these images with seemingly little [...]
Task-specific Vision DNN Models and Their Relation for Explaining Different Areas of the Visual Cortex
Virtual VASC Seminar: https://cmu.zoom.us/j/249106600 Abstract: Deep Neural Networks (DNNs) are state-of-the-art models for many vision tasks. We propose an approach to assess the relationship between visual tasks and their task-specific models. Our method uses Representation Similarity Analysis (RSA), which is commonly used to find a correlation between neuronal responses from brain data and models. [...]
End-to-end Generative 3D Human Shape and Pose Models and Active Human Sensing
Virtual VASC Seminar: https://cmu.zoom.us/j/249106600 Abstract: I will review some of our recent work in 3D human modeling, synthesis, and active vision. I will present our new, end-to-end trainable nonlinear statistical 3D human shape and pose models of different resolutions (GHUM and GHUMLite) as [...]
Telling Left from Right: Learning Spatial Correspondence Between Sight and Sound
Virtual VASC Seminar: https://cmu.zoom.us/j/92741882813?pwd=R1R0eGRaeXFHTEF2VWNwY2VIZmU5Zz09 Abstract: Self-supervised audio-visual learning aims to capture useful representations of video by leveraging correspondences between visual and audio inputs. Existing approaches have focused primarily on matching semantic information between the sensory streams. In my talk, I’ll describe a novel self-supervised task to leverage an orthogonal principle: matching spatial information in the [...]
The Topology of Learning
Zoom Virtual Meeting: https://cmu.zoom.us/j/92178295543?pwd=L2dwZU5SbDY5NzZZNzZ4ZmFUclRqQT09 Abstract: Deep Neural Networks (DNNs) have revolutionized computer vision. We now have DNNs that achieve top results in many computer vision problems, including object recognition, facial expression analysis, and semantic segmentation, to name but a few. Unfortunately, the rise in performance has come with a cost. DNNs have become so [...]
Implicit Neural Scene Representations
Virtual Zoom Seminar: https://cmu.zoom.us/j/92178295543?pwd=L2dwZU5SbDY5NzZZNzZ4ZmFUclRqQT09 Abstract: How we represent signals has major implications for the algorithms we build to analyze them. Today, most signals are represented discretely: Images as grids of pixels, shapes as point clouds, audio as grids of amplitudes, etc. If images weren't pixel grids, would we be using convolutional neural networks [...]
Computational Imaging: Beyond the Limits Imposed by Lenses
Virtual VASC Seminar: https://cmu.zoom.us/j/92587238250?pwd=S0paYUVBUXozQkFTclMwRUg0MzBNZz09 Abstract: The lens has long been a central element of cameras, since its early use in the mid-nineteenth century by Niepce, Talbot, and Daguerre. The role of the lens, from the Daguerreotype to modern digital cameras, is to refract light to achieve a one-to-one mapping between a point in the scene and a point on the sensor. This effect enables the sensor to compute a particular two-dimensional (2D) [...]
Beyond ROS: Using a Data Connectivity Framework to build and run Autonomous Systems
Virtual FRC Seminar: Seminar recording: https://cmu.zoom.us/rec/share/x84qF7_q8TlIcpHoyG_DRa58O6i8aaa8hCAW_fEPxEkBGjBVPyzW_lK0YW30RfJ3?startTime=1598551489000 Passcode: qu6)ePH9 Abstract: Next-generation robotics will need more than the current ROS code in order to comply with the interoperability, security and scalability requirements for commercial deployments. This session will provide a technical overview of ROS, ROS2 and the Data Distribution Service™ (DDS) protocol for data connectivity in safety-critical cyber-physical [...]
Learning 3D Reconstruction in Function Space
Virtual VASC Seminar: https://cmu.zoom.us/j/96635002737?pwd=RkxGVlJaUTlhcDdGeVBPcnpTS015dz09 Abstract: In this talk, I will show several recent results of my group on learning neural implicit 3D representations, departing from the traditional paradigm of representing 3D shapes explicitly using voxels, point clouds or meshes. Implicit representations have a small memory footprint and allow for modeling arbitrary 3D topologies at [...]
Scaling Probabilistically Safe Learning to Robotics
Abstract: Before learning robots can be deployed in the real world, it is critical that probabilistic guarantees can be made about the safety and performance of such systems. In recent years, safe reinforcement learning algorithms have enjoyed success in application areas with high-quality models and plentiful data, but robotics remains a challenging domain for [...]
Compositional Representations for Visual Recognition
Virtual VASC - https://cmu.zoom.us/j/99437689110?pwd=cWxuQkIwWlFFZEk0QkVDUVFiN0lTdz09 Abstract: Compositionality is the ability for a model to recognize a concept based on its parts or constituents. This ability is essential to use language effectively as there exists a very large combination of plausible objects, attributes, and actions in the world. We posit that visual recognition models should be [...]
From kinematic to energetic design and control of wearable robots for agile human locomotion
Abstract: Even with the help of modern prosthetic and orthotic (P&O) devices, lower-limb amputees and stroke survivors often struggle to walk in the home and community. Emerging powered P&O devices could actively assist patients to enable greater mobility, but these devices are currently designed to produce a small set of pre-defined motions. Finite state machines [...]
Making 3D Predictions with 2D Supervision
Abstract: Building computer vision systems that understand 3D shape is important for applications including autonomous vehicles, graphics, and VR / AR. If we assume 3D shape supervision, we can now build systems that do a reasonable job at predicting 3D shapes from images. However, 3D supervision is difficult to obtain at scale; therefore we should [...]
The World’s Tiniest Space Program
Abstract: The aerospace industry has experienced a dramatic shift over the last decade: Flying a spacecraft has gone from something only national governments and large defense contractors could afford to something a small startup can accomplish on a shoestring budget. A virtuous cycle has developed where lower costs have led to more launches and the [...]
Perceiving 3D Human-Object Spatial Arrangements from a Single Image In-the-wild
Abstract: We live in a 3D world that is dynamic—it is full of life, with inhabitants like people and animals who interact with their environment through moving their bodies. Capturing this complex world in 3D from images has a huge potential for many applications such as compelling mixed reality applications that can interact with people [...]
A future with affordable Self-driving vehicles
(Video to appear once approved) Abstract: We are on the verge of a new era in which robotics and artificial intelligence will play an important role in our daily lives. Self-driving vehicles have the potential to redefine transportation as we understand it today. Our roads will become safer and less congested, while parking spots will be repurposed as leisure [...]
Detection of Photo Manipulation with Media Forensics
Abstract: Rapid progress in machine learning, computer vision and graphics leads to successive democratization of media manipulation capabilities. While convincing photo and video manipulation used to require substantial time and skill, modern editors bring (semi-) automated tools that can be used by everyone. Some of the most recent examples include manipulation of human faces, e.g., [...]
Robotics and Biosystems
Abstract: Research at the Center for Robotics and Biosystems at Northwestern University encompasses bio-inspiration, neuromechanics, human-machine systems, and swarm robotics, among other topics. In this talk I will give an overview of some of our recent work on in-hand manipulation, robot locomotion on yielding ground, and human-robot systems. Biography: Kevin Lynch received the B.S.E. degree [...]
Advancing the State of the Art of Computer Vision for Billions of Users
Abstract: At Google, advancing the state of the art of computer vision is very impactful as there are billions of users of Google products, many of which require high-quality, artifact-free images. I will share what we learned from successfully launching core computer vision techniques for various Google products, including PhotoScan (Photos), seamless Google Street View [...]
Learning-based 6D Object Pose Estimation in Real-world Conditions
Abstract: Estimating the 6D pose, i.e., 3D rotation and 3D translation, of objects relative to the camera from a single input image has attracted great interest in the computer vision community. Recent works typically address this task by training a deep network to predict the 6D pose given an image as input. While effective on [...]
SubT Fall Update Webinar Led by CMU’s Robotics Institute faculty members Sebastian Scherer and Matt Travers, as well as OSU’s Geoff Hollinger
We invite you to meet members of the award-winning Team Explorer, the CMU DARPA Subterranean Challenge team, and learn more about this groundbreaking competition. Some of the world's top universities have entered the DARPA Subterranean Challenge, developing technologies to map, navigate, and search underground environments. Led by CMU's Robotics Institute faculty members Sebastian Scherer and Matt [...]
Deep Learning: (still) Not Robust
Abstract: One of the key limitations of deep learning is its inability to generalize to new domains. This talk studies recent attempts at increasing neural network robustness to both natural and adversarial distribution shifts. Adversarial examples, inputs crafted specifically to fool machine learning models, are arguably the most difficult type of domain shift. [...]
Drones in Public: distancing and communication with all users
Abstract: This talk will focus on the role of human-robot interaction with drones in public spaces, concentrating on two individual research areas: proximal interactions in shared spaces and improved communication with both end-users and bystanders. Prior work on human interaction with aerial robots has focused on communication from the users or about the intended direction [...]
End-to-End ‘One Networks’: Learning Regularizers for Least Squares via Deep Neural Networks
Abstract: Linear Restoration Problems (or Linear Inverse Problems) involve reconstructing images or videos from noisy measurement vectors. Notable examples include denoising, inpainting, super-resolution, compressive sensing, deblurring and frame prediction. Often, multiple such tasks should be solved simultaneously, e.g., through Regularized Least Squares, where each individual problem is underdetermined (overcomplete) with infinitely many solutions from which [...]
Data Scalability for Robot Learning
Abstract: Recent progress in robot learning has demonstrated how robots can acquire complex manipulation skills from perceptual inputs through trial and error, particularly with the use of deep neural networks. Despite these successes, the generalization and versatility of robots across environment conditions, tasks, and objects remains a major challenge. And, unfortunately, our existing algorithms and [...]
Carnegie Mellon University
Learning to Generalize beyond Training
Abstract: Generalization, i.e., the ability to adapt to novel scenarios, is the hallmark of human intelligence. While we have systems that excel at cleaning floors, playing complex games, and occasionally beating humans, they are incredibly specific in that they only perform the tasks they are trained for and are miserable at generalization. One of the [...]
Detecting Image Synthesis — Shallow and Deep
Abstract: Synthetic media are proliferating and are subject to malicious uses such as disinformation campaigns, posing potential threats to media integrity and democracy. A way to combat this is to develop forensics algorithms to identify manipulated media. In the beginning of the talk, I will discuss how one can train a model to detect photos manipulated [...]
Deep Learning to Distinguish Recalled but Benign Mammography Images in Breast Cancer Screening
Abstract: Breast cancer screening using the standard mammography exam currently exhibits a high false recall rate (11.6% for women in the U.S.). Only a low proportion (0.5%) of women who were recalled for additional workup were actually found to have breast cancer. As a result of the unnecessary stress and follow-up work from these false [...]
The Plenoptic Camera
Abstract: Imagine a futuristic version of Google Street View that could dial up any possible place in the world, at any possible time. Effectively, such a service would be a recording of the plenoptic function—the hypothetical function described by Adelson and Bergen that captures all light rays passing through space at all times. While the plenoptic function [...]
Photorealistic Reconstruction of Landmarks and People using Implicit Scene Representation
Abstract: Reconstructing scenes to synthesize novel views is a long-standing problem in Computer Vision and Graphics. Recently, implicit scene representations have shown novel view synthesis results of unprecedented quality, like the ones of Neural Radiance Fields (NeRF), which use the weights of a multi-layer perceptron to model the volumetric density and color of a [...]
Towards Discriminative and Domain-Invariant Feature Learning
Abstract: Deep neural networks have achieved great success in various visual applications when trained with large amounts of labeled in-domain data. However, the networks usually suffer a heavy performance drop on data whose distribution is quite different from the training one. Domain adaptation methods aim to deal with such a performance gap caused by [...]
Learning Efficient Visual Representation on Model, Data, Label and Beyond
Abstract: Efficient deep learning is a broad concept in which we aim to learn compressed deep models and develop training algorithms to improve the efficiency of model representations, data and label utilization, etc. In recent years, deep neural networks have been recognized as one of the most effective techniques for many learning tasks, also, in the [...]
Self-supervised Learning and Generalization
Abstract: Contrastive self-supervised learning is a highly effective way of learning representations that are useful for, i.e., generalise to, a wide range of downstream vision tasks and datasets. In the first part of the talk, I will present MoCHi, our recently published contrastive self-supervised learning approach (NeurIPS 2020) that is able to learn transferable representations [...]
Enabling Robots to Cooperate & Compete: Distributed Optimization & Game Theoretic Methods for Multiple Interacting Robots
Abstract: For robots to effectively operate in our world, they must master the skills of dynamic interaction. Autonomous cars must safely negotiate their trajectories with other vehicles and pedestrians as they drive to their destinations. UAVs must avoid collisions with other aircraft, as well as dynamic obstacles on the ground. Disaster response robots must coordinate [...]
Learning to see from few labels
Abstract: Computer vision systems today exhibit a rich and accurate understanding of the visual world, but increasingly rely on learning on large labeled datasets to do so. This reliance on large labeled datasets is a problem especially when one considers difficult perception tasks, or novel domains where annotations might require effort or expertise. We thus [...]
The Role of Manipulation Primitives in Building Dexterous Robotic Systems
Abstract: I will start this talk by illustrating four different perspectives that we as a community have embraced to study robotic manipulation: 1) controlling a simplified model of the mechanics of interaction with an object; 2) using haptic feedback such as force or tactile to control the interaction with an environment; 3) planning sequences or [...]
Seeing the unseen: inferring unobserved information from multi-modal data
Abstract: As humans we can never fully observe the world around us and yet we are able to build remarkably useful models of it from our limited sensory data. Machine learning problems are often required to operate in a similar setup, that is, inferring unobserved information from what is observed. Partial observations [...]
Design and Analysis of Open-Source Educational Haptic Devices
Abstract: The sense of touch (haptics) is an active perceptual system used from our earliest days to discover the world around us. However, formal education is not designed to take advantage of this sensory modality. As a result, very little is known about the effects of using haptics in K-12 and higher education or the [...]
Towards AI for 3D Content Creation
Abstract: 3D content is key in several domains such as architecture, film, gaming, and robotics. However, creating 3D content can be very time consuming -- the artists need to sculpt high quality 3D assets, compose them into large worlds, and bring these worlds to life by writing behaviour models that "drive" the characters around in [...]
Move over, MSE! – New probabilistic models of motion
Abstract: Data-driven character animation holds great promise for games, film, virtual avatars and social robots. A "virtual AI actor" that moves in response to intuitive, high-level input could turn 3D animators into directors, instead of requiring them to laboriously pose the character for each frame of animation, as is the case today. However, the high [...]
Understanding the Placenta: Towards an Objective Pregnancy Screening
Abstract: My research focusses on the development of a pregnancy screening tool that will: (i) be system- and user-independent; and (ii) provide a quantifiable measure of placental health. To this end, I am working towards the design of a multiparametric quantitative ultrasound (QUS) based placental tissue characterization method. The method would potentially identify the [...]
Human-Robot Interactive Collaboration & Communication
Abstract: Autonomous and anthropomorphic robots are poised to play a critical role in manufacturing, healthcare and the services industry in the near future. However, for this vision to become a reality, robots need to efficiently communicate and interact with their human partners. Rather than traditional remote controls and programming languages, adaptive and transparent techniques for [...]
Carnegie Mellon University
Robots “R” Us: 25 years of Robotics Technology Development and Commercialization at NREC
Abstract: Since its founding in 1979, the Robotics Institute (RI) at Carnegie Mellon University has been leading the world in robotics research and education. In the mid 1990s, RI created NREC as the applied R&D center within the Institute with a specific mission to apply robotics technology in an impactful way on real-world applications. In this talk, I will go over [...]
Relational Reasoning for Multi-Agent Systems
Abstract: Multi-agent interacting systems are prevalent in the world, from purely physical systems to complicated social dynamics systems. The interactions between entities / components can give rise to very complex behavior patterns at the level of both individuals and the whole system. In many real-world multi-agent interacting systems (e.g., traffic participants, mobile robots, sports players), [...]
Towards an Intelligence Architecture for Human-Robot Teaming
Abstract: Advances in autonomy are enabling intelligent robotic systems to enter human-centric environments like factories, homes and workplaces. To be effective as a teammate, we expect robots to accomplish more than performing simplistic repetitive tasks; they must perceive, reason, and perform semantic tasks in a human-like way. A robot's ability to act intelligently is fundamentally tied [...]
Self-supervised learning for visual recognition
Abstract: We are interested in learning visual representations that are discriminative for semantic image understanding tasks such as object classification, detection, and segmentation in images/videos. A common approach to obtain such features is to use supervised learning. However, this requires manual annotation of images, which is costly, ambiguous, and prone to errors. In contrast, self-supervised [...]
GANs for Everyone
Abstract: The power and promise of deep generative models such as StyleGAN, CycleGAN, and GauGAN lie in their ability to synthesize endless realistic, diverse, and novel content with user controls. Unfortunately, the creation and deployment of these large-scale models demand high-performance computing platforms, large-scale annotated datasets, and sophisticated knowledge of deep learning methods. This makes [...]
Reasoning over Text in Images for VQA and Captioning
Abstract: Text in images carries essential information for multimodal reasoning, such as VQA or image captioning. To enable machines to perceive and understand scene text and reason jointly with other modalities, 1) we collect the TextCaps dataset, which requires models to read and reason over text and visual content in the image to generate image [...]
Design and control of insect-scale bees and dog-scale quadrupeds
Abstract: Enhanced robot autonomy---whether it be in the context of extended tether-free flight of a 100mg insect-scale flapping-wing micro aerial vehicle (FWMAV), or long inspection routes for a quadrupedal robot---is hindered by fundamental constraints in power and computation. With this motivation, I will discuss a few projects I have worked on to circumvent these issues in [...]
Point Cloud Registration with or without Learning
Abstract: I will be presenting two of our recent works on 3D point cloud registration: A scene flow method for non-rigid registration: I will discuss our current method to recover scene flow from point clouds. Scene flow is the three-dimensional (3D) motion field of a scene, and it provides information about the spatial arrangement [...]
Dynamical Robots via Origami-Inspired Design
Abstract: Origami-inspired engineering produces structures with high strength-to-weight ratios and simultaneously lower manufacturing complexity. This reliable, customizable, cheap fabrication and component assembly technology is ideal for robotics applications in remote, rapid deployment scenarios that require platforms to be quickly produced, reconfigured, and deployed. Unfortunately, most examples of folded robots are appropriate only for small-scale, low-load [...]
Propelling Robot Manipulation of Unknown Objects using Learned Object Centric Models
Abstract: There is a growing interest in using data-driven methods to scale up manipulation capabilities of robots for handling a large variety of objects. Many of these methods are oblivious to the notion of objects and they learn monolithic policies from the whole scene in image space. As a result, they don’t generalize well to [...]
When and Why Does Contrastive Learning Work?
Abstract: Contrastive learning organizes data by pulling together related items and pushing apart everything else. These methods have become very popular but it's still not entirely clear when and why they work. I will share two ideas from our recent work. First, I will argue that contrastive learning is really about learning to forget. Different [...]
Anticipating the Future: forecasting the dynamics in multiple levels of abstraction
Abstract: A key navigational capability for autonomous agents is to predict the future locations, actions, and behaviors of other agents in the environment. This is particularly crucial for safety in the realm of autonomous vehicles and robots. However, many current approaches to navigation and control assume perfect perception and knowledge of the environment, even though [...]
Learning to Perceive Videos for Embodiment
Abstract: Video understanding has achieved tremendous success in computer vision tasks, such as action recognition, visual tracking, and visual representation learning. Recently, this success has gradually been carried over to helping robots and embodied agents interact with their environments. In this talk, I am going to introduce our recent efforts on extracting self-supervisory signals and [...]
Open Challenges in Sign Language Translation & Production
Abstract: Machine translation and computer vision have greatly benefited from the advances in deep learning. Large and diverse amounts of textual and visual data have been used to train neural networks, whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses [...]
The Search for Ancient Life on Mars Began with a Safe Landing
Abstract: Prior Mars rover missions have all landed in flat and smooth regions, but for the Mars 2020 mission, which is seeking signs of ancient life, this was no longer acceptable. To maximize the variety of rock samples that will eventually be returned to Earth for analysis, the Perseverance rover needed to land in a [...]
3D Recognition with self-supervised learning and generic architectures
Abstract: Supervised learning relies on manual labeling which scales poorly with the number of tasks and data. Manual labeling is especially cumbersome for 3D recognition tasks such as detection and segmentation and thus most 3D datasets are surprisingly small compared to image or video datasets. 3D recognition methods are also fragmented based on the type [...]
Rapid Adaptation for Robot Learning
Abstract: How can we train a robot to generalize to diverse environments? This question underscores the holy grail of robot learning research because it is difficult to supervise an agent for all possible situations it can encounter in the future. We posit that the only way to guarantee such a generalization is to continually learn and [...]
Robotic Cave Exploration for Search, Science, and Survey
Abstract: Robotic cave exploration has the potential to create significant societal impact through facilitating search and rescue, in the fight against antibiotic resistance (science), and via mapping (survey). But many state-of-the-art approaches for active perception and autonomy in subterranean environments rely on disparate perceptual pipelines (e.g., pose estimation, occupancy modeling, hazard detection) that process the same underlying sensor data in [...]
Humans, hands, and horses: 3D reconstruction of articulated object categories using strong, weak, and self-supervision
Abstract: Reconstructing 3D objects from a single 2D image is a task that humans perform effortlessly, yet computer vision so far has only robustly solved 3D face reconstruction. In this talk we will see how we can extend the scope of monocular 3D reconstruction to more challenging, articulated categories such as human bodies, hands and [...]
Enabling Grounded Language Communication for Human-Robot Teaming
Abstract: The ability for robots to effectively understand natural language instructions and convey information about their observations and interactions with the physical world is highly dependent on the sophistication and fidelity of the robot’s representations of language, environment, and actions. As we progress towards more intelligent systems that perform a wider range of tasks in a [...]
Looking behind the Seen in Order to Anticipate
Abstract: Despite significant recent progress in computer vision and machine learning, personalized autonomous agents often still don’t participate robustly and safely across tasks in our environment. We think this is largely because they lack an ability to anticipate, which in turn is due to a missing understanding about what is happening behind the seen, i.e., [...]
Robots that Learn through Language
Abstract: Advances in perception have been integral to transitioning robots from machines restricted to factory automation to autonomous agents that operate robustly in unstructured environments. As our surrogates, robots enable people to explore the deepest depths of the ocean and distant regions of space, making discoveries that would otherwise be impossible. The age of robots [...]
Towards Reconstructing Any Object in 3D
Abstract: The world we live in is incredibly diverse, comprising over 10k natural and man-made object categories. While the computer vision community has made impressive progress in classifying images from such diverse categories, the state-of-the-art 3D prediction systems are still limited to merely tens of object classes. A key reason for this stark difference [...]
The Clinician’s AI Partner: Augmenting Clinician Capabilities Across the Spectrum of Healthcare
Abstract: Clinicians often work under highly demanding conditions to deliver complex care to patients. As our aging population grows and care becomes increasingly complex, physicians and nurses are now also experiencing feelings of burnout at unprecedented levels. In this talk, I will discuss possibilities for computer vision to function as a partner to clinicians, and to augment their capabilities, across [...]
The Unusual Effectiveness of Abstractions for Assistive AI
Abstract: Can we balance efficiency and reliability while designing assistive AI systems? What would such AI systems need to provide? In this talk I will present some of our recent work addressing these questions. In particular, I will show that a few fundamental principles of abstraction are surprisingly effective in designing efficient and reliable AI [...]
Reliable and Accessible Visual Recognition
Abstract: As visual recognition models are developed across diverse applications, we need the ability to reliably deploy our systems in a variety of environments. At the same time, visual models tend to be trained and evaluated on a static set of curated and annotated data which only represents a subset of the world. In this [...]
Fake It Till You Make It: Face analysis in the wild using synthetic data alone
Abstract: In this seminar I will demonstrate how synthetic data alone can be used to perform face-related computer vision in the wild. The community has long enjoyed the benefits of synthesizing training data with graphics, but the domain gap between real and synthetic data has remained a problem, especially for human faces. Researchers have tried [...]
Robotics and Warehouse Automation at Berkshire Grey
Abstract: This talk tells the Berkshire Grey story, from its founding in 2013 to its IPO earlier this year, the first robotics IPO since iRobot over 15 years ago. Berkshire Grey produces automated systems for e-commerce order fulfillment, parcel sortation, store replenishment, and related operations in warehouses, distribution centers, and in the back ends of [...]
Leveraging StyleGAN for Image Editing and Manipulation
Abstract: StyleGAN has recently been established as the state-of-the-art unconditional generator, synthesizing images of phenomenal realism and fidelity, particularly for human faces. With its rich semantic space, many works have attempted to understand and control StyleGAN’s latent representations with the goal of performing image manipulations. To perform manipulations on real images, however, one must learn to [...]
Resilient Exploration in SubT Environments: Team Explorer’s Approach and Lessons Learned in the Final Event
Abstract: Subterranean robot exploration is difficult, with many mobility, communications, and navigation challenges that require an approach with a diverse set of systems and reliable autonomy. While prior work has demonstrated partial successes in addressing the problem, here we convey a comprehensive approach to address the problem of subterranean exploration in a wide range of [...]
Next-Gen Video Communication
Abstract: Video communication connects our world. It is necessary in conducting business, educational and personal activities across different geographical locations. However, the quality of an average user’s video communication is dramatically worse than that of professionally created videos in news broadcasts, talk shows, and on YouTube. This is because professionally created videos are often captured with [...]
Activity Understanding of Scripted Performances
Abstract: The PSU Taichi for Smart Health project has been doing a deep-dive into vision-based analysis of 24-form Yang-style Taichi (TaijiQuan). A key property of Taichi, shared by martial arts katas and prearranged form exercises in other sports, is practice of a scripted routine to build both mental and physical competence. The scripted nature of routines [...]
Domain adaptive object detection
Abstract: Recent advances in deep learning have led to the development of accurate and efficient models for object detection. However, learning highly accurate models relies on the availability of large-scale annotated datasets. Due to this, model performance drops drastically when evaluated on label-scarce datasets having visually distinct images. Domain adaptation tries to mitigate this degradation. In [...]