VASC Seminar
Learning 3D Reconstruction in Function Space
Virtual VASC Seminar: https://cmu.zoom.us/j/96635002737?pwd=RkxGVlJaUTlhcDdGeVBPcnpTS015dz09 Abstract: In this talk, I will show several recent results of my group on learning neural implicit 3D representations, departing from the traditional paradigm of representing 3D shapes explicitly using voxels, point clouds or meshes. Implicit representations have a small memory footprint and allow for modeling arbitrary 3D topologies at [...]
Compositional Representations for Visual Recognition
Virtual VASC - https://cmu.zoom.us/j/99437689110?pwd=cWxuQkIwWlFFZEk0QkVDUVFiN0lTdz09 Abstract: Compositionality is the ability of a model to recognize a concept based on its parts or constituents. This ability is essential to use language effectively, as there exists a very large number of plausible combinations of objects, attributes, and actions in the world. We posit that visual recognition models should be [...]
Making 3D Predictions with 2D Supervision
Abstract: Building computer vision systems that understand 3D shape is important for applications including autonomous vehicles, graphics, and VR / AR. If we assume 3D shape supervision, we can now build systems that do a reasonable job at predicting 3D shapes from images. However, 3D supervision is difficult to obtain at scale; therefore we should [...]
Perceiving 3D Human-Object Spatial Arrangements from a Single Image In-the-wild
Abstract: We live in a 3D world that is dynamic—it is full of life, with inhabitants like people and animals who interact with their environment through moving their bodies. Capturing this complex world in 3D from images has a huge potential for many applications such as compelling mixed reality applications that can interact with people [...]
Detection of Photo Manipulation with Media Forensics
Abstract: Rapid progress in machine learning, computer vision and graphics leads to successive democratization of media manipulation capabilities. While convincing photo and video manipulation used to require substantial time and skill, modern editors bring (semi-)automated tools that can be used by everyone. Some of the most recent examples include manipulation of human faces, e.g., [...]
Advancing the State of the Art of Computer Vision for Billions of Users
Abstract: At Google, advancing the state of the art of computer vision is very impactful as there are billions of users of Google products, many of which require high-quality, artifact-free images. I will share what we learned from successfully launching core computer vision techniques for various Google products, including PhotoScan (Photos), seamless Google Street View [...]
Learning-based 6D Object Pose Estimation in Real-world Conditions
Abstract: Estimating the 6D pose, i.e., 3D rotation and 3D translation, of objects relative to the camera from a single input image has attracted great interest in the computer vision community. Recent works typically address this task by training a deep network to predict the 6D pose given an image as input. While effective on [...]
Deep Learning: (still) Not Robust
Abstract: One of the key limitations of deep learning is its inability to generalize to new domains. This talk studies recent attempts at increasing neural network robustness to both natural and adversarial distribution shifts. Adversarial examples, inputs crafted specifically to fool machine learning models, are arguably the most difficult type of domain shift. [...]
End-to-End ‘One Networks’: Learning Regularizers for Least Squares via Deep Neural Networks
Abstract: Linear Restoration Problems (or Linear Inverse Problems) involve reconstructing images or videos from noisy measurement vectors. Notable examples include denoising, inpainting, super-resolution, compressive sensing, deblurring and frame prediction. Often, multiple such tasks should be solved simultaneously, e.g., through Regularized Least Squares, where each individual problem is underdetermined (overcomplete) with infinitely many solutions from which [...]
Detecting Image Synthesis — Shallow and Deep
Abstract: Proliferating synthetic media are subject to malicious uses such as disinformation campaigns, posing potential threats to media integrity and democracy. One way to combat this is to develop forensic algorithms that identify manipulated media. At the beginning of the talk, I will discuss how one can train a model to detect photos manipulated [...]
Deep Learning to Distinguish Recalled but Benign Mammography Images in Breast Cancer Screening
Abstract: Breast cancer screening using the standard mammography exam currently exhibits a high false recall rate (11.6% for women in the U.S.). Only a low proportion (0.5%) of women who were recalled for additional workup were actually found to have breast cancer. As a result of the unnecessary stress and follow-up work from these false [...]
The Plenoptic Camera
Abstract: Imagine a futuristic version of Google Street View that could dial up any possible place in the world, at any possible time. Effectively, such a service would be a recording of the plenoptic function—the hypothetical function described by Adelson and Bergen that captures all light rays passing through space at all times. While the plenoptic function [...]
Photorealistic Reconstruction of Landmarks and People using Implicit Scene Representation
Abstract: Reconstructing scenes to synthesize novel views is a long-standing problem in Computer Vision and Graphics. Recently, implicit scene representations have shown novel view synthesis results of unprecedented quality, like the ones of Neural Radiance Fields (NeRF), which use the weights of a multi-layer perceptron to model the volumetric density and color of a [...]
Towards Discriminative and Domain-Invariant Feature Learning
Abstract: Deep neural networks have achieved great success in various visual applications when trained with large amounts of labeled in-domain data. However, the networks usually suffer a heavy performance drop on data whose distribution is quite different from the training one. Domain adaptation methods aim to deal with the performance gap caused by [...]
Learning Efficient Visual Representation on Model, Data, Label and Beyond
Abstract: Efficient deep learning is a broad concept in which we aim to learn compressed deep models and develop training algorithms that improve the efficiency of model representations, data and label utilization, etc. In recent years, deep neural networks have been recognized as one of the most effective techniques for many learning tasks, also, in the [...]
Self-supervised Learning and Generalization
Abstract: Contrastive self-supervised learning is a highly effective way of learning representations that are useful for, i.e., generalise to, a wide range of downstream vision tasks and datasets. In the first part of the talk, I will present MoCHi, our recently published contrastive self-supervised learning approach (NeurIPS 2020) that is able to learn transferable representations [...]
Learning to see from few labels
Abstract: Computer vision systems today exhibit a rich and accurate understanding of the visual world, but increasingly rely on learning on large labeled datasets to do so. This reliance on large labeled datasets is a problem especially when one considers difficult perception tasks, or novel domains where annotations might require effort or expertise. We thus [...]
Seeing the unseen: inferring unobserved information from multi-modal data
Abstract: As humans we can never fully observe the world around us, and yet we are able to build remarkably useful models of it from our limited sensory data. Machine learning models are often required to operate in a similar setup, that is, inferring unobserved information from what is observed. Partial observations [...]
Towards AI for 3D Content Creation
Abstract: 3D content is key in several domains such as architecture, film, gaming, and robotics. However, creating 3D content can be very time-consuming -- the artists need to sculpt high quality 3D assets, compose them into large worlds, and bring these worlds to life by writing behaviour models that "drive" the characters around in [...]
Understanding the Placenta: Towards an Objective Pregnancy Screening
Abstract: My research focusses on the development of a pregnancy screening tool that will: (i) be system- and user-independent; and (ii) provide a quantifiable measure of placental health. To this end, I am working towards the design of a multiparametric quantitative ultrasound (QUS) based placental tissue characterization method. The method would potentially identify the [...]
Relational Reasoning for Multi-Agent Systems
Abstract: Multi-agent interacting systems are prevalent in the world, from purely physical systems to complicated social dynamics systems. The interactions between entities / components can give rise to very complex behavior patterns at the level of both individuals and the whole system. In many real-world multi-agent interacting systems (e.g., traffic participants, mobile robots, sports players), [...]
Self-supervised learning for visual recognition
Abstract: We are interested in learning visual representations that are discriminative for semantic image understanding tasks such as object classification, detection, and segmentation in images/videos. A common approach to obtain such features is to use supervised learning. However, this requires manual annotation of images, which is costly, ambiguous, and prone to errors. In contrast, self-supervised [...]
Reasoning over Text in Images for VQA and Captioning
Abstract: Text in images carries essential information for multimodal reasoning, such as VQA or image captioning. To enable machines to perceive and understand scene text and reason jointly with other modalities, 1) we collect the TextCaps dataset, which requires models to read and reason over text and visual content in the image to generate image [...]
Point Cloud Registration with or without Learning
Abstract: I will be presenting two of our recent works on 3D point cloud registration: A scene flow method for non-rigid registration: I will discuss our current method to recover scene flow from point clouds. Scene flow is the three-dimensional (3D) motion field of a scene, and it provides information about the spatial arrangement [...]
Propelling Robot Manipulation of Unknown Objects using Learned Object Centric Models
Abstract: There is a growing interest in using data-driven methods to scale up manipulation capabilities of robots for handling a large variety of objects. Many of these methods are oblivious to the notion of objects and they learn monolithic policies from the whole scene in image space. As a result, they don’t generalize well to [...]
When and Why Does Contrastive Learning Work?
Abstract: Contrastive learning organizes data by pulling together related items and pushing apart everything else. These methods have become very popular but it's still not entirely clear when and why they work. I will share two ideas from our recent work. First, I will argue that contrastive learning is really about learning to forget. Different [...]
Anticipating the Future: forecasting the dynamics in multiple levels of abstraction
Abstract: A key navigational capability for autonomous agents is to predict the future locations, actions, and behaviors of other agents in the environment. This is particularly crucial for safety in the realm of autonomous vehicles and robots. However, many current approaches to navigation and control assume perfect perception and knowledge of the environment, even though [...]
Learning to Perceive Videos for Embodiment
Abstract: Video understanding has achieved tremendous success in computer vision tasks, such as action recognition, visual tracking, and visual representation learning. Recently, this success has gradually been applied to help robots and embodied agents interact with their environments. In this talk, I am going to introduce our recent efforts on extracting self-supervisory signals and [...]
Open Challenges in Sign Language Translation & Production
Abstract: Machine translation and computer vision have greatly benefited from the advances in deep learning. Large and diverse amounts of textual and visual data have been used to train neural networks, whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses [...]
3D Recognition with self-supervised learning and generic architectures
Abstract: Supervised learning relies on manual labeling which scales poorly with the number of tasks and data. Manual labeling is especially cumbersome for 3D recognition tasks such as detection and segmentation and thus most 3D datasets are surprisingly small compared to image or video datasets. 3D recognition methods are also fragmented based on the type [...]
Rapid Adaptation for Robot Learning
Abstract: How can we train a robot to generalize to diverse environments? This question underscores the holy grail of robot learning research because it is difficult to supervise an agent for all possible situations it can encounter in the future. We posit that the only way to guarantee such a generalization is to continually learn and [...]
Humans, hands, and horses: 3D reconstruction of articulated object categories using strong, weak, and self-supervision
Abstract: Reconstructing 3D objects from a single 2D image is a task that humans perform effortlessly, yet computer vision so far has only robustly solved 3D face reconstruction. In this talk we will see how we can extend the scope of monocular 3D reconstruction to more challenging, articulated categories such as human bodies, hands and [...]
Looking behind the Seen in Order to Anticipate
Abstract: Despite significant recent progress in computer vision and machine learning, personalized autonomous agents often still don’t participate robustly and safely across tasks in our environment. We think this is largely because they lack an ability to anticipate, which in turn is due to a missing understanding about what is happening behind the seen, i.e., [...]
The Clinician’s AI Partner: Augmenting Clinician Capabilities Across the Spectrum of Healthcare
Abstract: Clinicians often work under highly demanding conditions to deliver complex care to patients. As our aging population grows and care becomes increasingly complex, physicians and nurses are now also experiencing feelings of burnout at unprecedented levels. In this talk, I will discuss possibilities for computer vision to function as a partner to clinicians, and to augment their capabilities, across [...]
Reliable and Accessible Visual Recognition
Abstract: As visual recognition models are developed across diverse applications, we need the ability to reliably deploy our systems in a variety of environments. At the same time, visual models tend to be trained and evaluated on a static set of curated and annotated data which only represents a subset of the world. In this [...]