Ryan Schmidt
Senior Principal Research Scientist, Autodesk

Design in context: bringing the physical world into CAD tools

NSH 1507

Ryan Schmidt is a Research Scientist and head of the Design & Fabrication Group at Autodesk Research in Toronto, Canada. He is the creator of several novel 3D design tools, including Meshmixer, which was acquired by Autodesk in 2011. At Autodesk he has evolved Meshmixer into one of the standard tools

C. Lawrence Zitnick
Principal Researcher
Microsoft Research

CANCELED: The Depth of Our Understanding: Vision, Language, and Humor

NSH 1507

C. Lawrence Zitnick is a principal researcher in the Interactive Visual Media group at Microsoft Research, and is an affiliate associate professor at the University of Washington. He is interested in a broad range of topics related to visual object recognition. His current interests include object detection, semantically interpreting visual scenes,

Greg Shakhnarovich
Assistant Professor
Toyota Technological Institute at Chicago

Rich Representations for Parsing Visual Scenes

NSH 1507

Greg is an Assistant Professor at TTI-Chicago, a philanthropically endowed academic computer science institute located on the University of Chicago campus, where he works on computer vision and machine learning. He also holds a part-time faculty appointment at the University of Chicago Department of Computer Science. Prior to coming to TTI-Chicago,

Calvin Murdock
PhD Student
Machine Learning

Semantic Component Analysis

NSH 1507

Abstract: We propose a novel formulation for component analysis that allows for rich instance-level constraints that encourage semantic interpretability of the learned components. Even with simple features and intuitive spatial consistency priors, our method produces accurate, semantically-meaningful image segmentations both with and without supervision.

Xinlei Chen
PhD Student

Webly supervised learning of convolutional networks

NSH 1507

I have been a PhD student at the Language Technologies Institute, Carnegie Mellon University, since fall 2012. I am working with Prof. Abhinav Gupta on joint learning with language and vision and on life-long learning. I am also working with Prof. Tom Mitchell at CMU. I recently finished my internship at MSR with

Carl Doersch
PhD Student
Machine Learning

Unsupervised Visual Representation Learning by Context Prediction

NSH 1507

I'm a sixth year PhD Student in the Machine Learning Department at CMU, working with Alyosha Efros and Abhinav Gupta. I graduated from CMU in 2010 with a B.S. in computer science and cognitive science, with a minor in neural computation, completing an undergraduate thesis with Tai Sing Lee. I'm interested

Hanbyul Joo
PhD Student RI
Robotics Institute

Panoptic Studio: A Massively Multiview System for Social Motion Capture

NSH 1507

I am a Ph.D. student in the Robotics Institute at Carnegie Mellon University, under the supervision of Yaser Sheikh. During summer 2015, I interned at Disney Research Zurich where I worked with Thabo Beeler and Derek Bradley. Before joining CMU, I spent three years as a researcher at ETRI, a government-funded

Robert F. Murphy, Ph.D.
Lane Professor of Computational Biology & Professor of Biological Sciences
Carnegie Mellon

Building models of cell organization, differentiation and perturbation directly from microscope images

NSH 1507

Dr. Robert F. Murphy is the Ray and Stephanie Lane Professor of Computational Biology and Head of the Computational Biology Department in the School of Computer Science at Carnegie Mellon University. He is also Professor of Biological Sciences, Biomedical Engineering, and Machine Learning at Carnegie Mellon, Honorary Professor of Biology at

Michael Ryoo
Assistant Professor
Indiana University Bloomington

Human Activity Recognition from a Robot’s Viewpoint

NSH 1507

Michael S. Ryoo is an Assistant Professor of the School of Informatics and Computing at Indiana University. His research interest is within the areas of Computer Vision and Human-Robot Interaction, with a particular emphasis on human activity recognition, first-person vision, and wearable/ubiquitous cameras. Before joining IU, Dr. Ryoo was a staff

Olga Russakovsky
Postdoctoral Fellow, RI
Carnegie Mellon

The human side of computer vision

NSH 1507

Olga Russakovsky is a postdoctoral research fellow at Carnegie Mellon University. She recently completed a PhD in computer science at Stanford advised by Prof. Fei-Fei Li. Her research is in computer vision, closely integrated with machine learning and human-computer interaction. She led the ImageNet Large Scale Visual Recognition Challenge effort

Robert Pless
Washington University

Brighter, Faster, Cheaper: Finding or Creating Light Fields for Visual Computing

NSH 1507

Robert Pless is a Professor of Computer Science and Engineering at Washington University in St. Louis, where he founded and directs the Media and Machines Lab. His research focuses on big-data and geometric approaches to Visual Computing, with applications to social justice and environmental measurement. Dr. Pless has a Bachelor's degree

Genevieve Patterson
Computer Vision PhD Student
Brown University

Collective Insight: Crowd-driven Image Understanding

NSH 1507

Genevieve is a PhD Candidate in Computer Vision at Brown University. Her work on crowd-driven visual classification was recently awarded runner-up for Best Paper at the AAAI Conference on Human Computation (HCOMP). She built and maintains the SUN Attribute dataset, a widely used resource for scene understanding. Genevieve received her master's

David Fouhey
Ph.D. Student at the Robotics Institute
Carnegie Mellon University

Towards A Physical and Human-Centric Understanding of Images

Newell Simon Hall 1507

David Fouhey is a Ph.D. student at the Robotics Institute of Carnegie Mellon University, where he is advised by Abhinav Gupta and Martial Hebert. His research interests include computer vision and machine learning with a particular focus on scene understanding. David's work has been supported by both NSF and

Yu Xiang
Postdoctoral Researcher
Stanford University

3D Object Representations for Recognition

NSH 1507

Yu Xiang is a Postdoctoral Researcher in the Computer Science Department at Stanford University. His research focuses on understanding objects and scenes from images and videos, with emphasis on recognizing both semantic and 3D geometric properties of objects and scenes. His current work attempts to develop 3D object representation and recognition

Changxi Zheng
Assistant Professor
Columbia University

Computational Acoustic Design: From the Virtual to the Real

Newell Simon Hall 1507

Changxi Zheng is an Assistant Professor of Computer Science at Columbia University. Prior to joining Columbia, he received his M.S. and Ph.D. from Cornell University, and his B.S. from Shanghai Jiaotong University. His research spans computer graphics, physically-based simulation, computational design, computational acoustics, scientific computing and robotics, with a

Saurabh Gupta
Graduate Student
University of California, Berkeley

Scene Understanding from RGB-D Images

1507 Newell Simon Hall

Saurabh Gupta is a Ph.D. student at UC Berkeley, where he is advised by Jitendra Malik. His research interests include computer vision and machine learning. During his PhD he has studied the problem of scene understanding from RGB-D images. His work has been supported by the Berkeley Fellowship and

Jianbo Shi
University of Pennsylvania

Inside-out: First Person Vision for Personalized Intelligence

Gates 2109

Jianbo Shi studied Computer Science and Mathematics as an undergraduate at Cornell University where he received his B.A. in 1994. He received his Ph.D. degree in Computer Science from University of California at Berkeley in 1998. He joined The Robotics Institute at Carnegie Mellon University in 1999 as a research faculty,

Matthias Niessner
Visiting Assistant Professor

Reconstruction and Understanding of Indoor Environments

Gates 8102

Matthias Niessner is a visiting assistant professor at Stanford University. Previous to his appointment at Stanford, he earned his PhD from the University of Erlangen-Nuremberg, Germany under the supervision of Günther Greiner. His research focuses on different fields of computer graphics and computer vision, including the reconstruction and semantic understanding of

Sanja Fidler
Assistant Professor
University of Toronto

Towards Understanding Stories from Videos

Newell Simon Hall 1507

Sanja Fidler is an Assistant Professor at the Department of Computer Science, University of Toronto. Previously she was a Research Assistant Professor at TTI-Chicago, a philanthropically endowed academic institute located on the campus of the University of Chicago. She was a postdoctoral fellow at the University of Toronto during 2011-2012.

Andrew Owens
Ph.D. Student at MIT CSAIL
MIT - Massachusetts Institute of Technology

Sound provides supervision for visual learning

Newell Simon Hall 1507

Andrew Owens is a graduate student at the MIT Computer Science and Artificial Intelligence Laboratory, working under the supervision of Bill Freeman and Antonio Torralba. Before that, he obtained his B.A. in Computer Science at Cornell University in 2010. He is a recipient of a Microsoft Research PhD Fellowship,

Yezhou Yang
Postdoctoral Research Associate
University of Maryland, Institute for Advanced Computer Studies

Human Manipulation Action Understanding for Cognitive Robots

Newell Simon Hall 1507

Dr. Yezhou Yang is a Postdoctoral Research Associate at the Computer Vision Lab and the Automation, Robotics and Cognition (ARC) Lab, with the University of Maryland Institute for Advanced Computer Studies, working with his PhD advisors: Prof. Yiannis Aloimonos and Dr. Cornelia Fermuller. His main interests lie in Cognitive

Ed Johns
Dyson Fellow
Imperial College London

Deep Learning for Robot Manipulation via Simulation

Newell Simon Hall 1507

Ed Johns is a Dyson Fellow at Imperial College London, working on computer vision, robotics and machine learning. He received a BA and MEng from Cambridge University, followed by a PhD in visual recognition and localisation from Imperial College London. After post-doctoral work at University College London, he then

Xavier Alameda
Postdoctoral, Multimodal Human Understanding Group
University of Trento

Matrix Completion: A vision-oriented perspective

Newell Simon Hall 1507

Xavier Alameda-Pineda received the M.Sc. degree in mathematics and telecommunications engineering from the Universitat Politècnica de Catalunya – BarcelonaTech in 2008 and 2009 respectively, the M.Sc. degree in computer science from the Université Joseph Fourier and Grenoble INP in 2010, and the Ph.D. degree in mathematics/computer science from the

Karteek Alahari
Grenoble - Rhône-Alpes Center

What can we do with motion cues?

Newell Simon Hall 1507

Karteek Alahari has been an Inria permanent researcher (chargé de recherche) since October 2015. He has been at Inria since 2010, initially as a postdoctoral fellow in the WILLOW team in Paris, and then in a starting research position in Grenoble since September 2013. Dr. Alahari's PhD from Oxford Brookes

Hyun Soo Park
Assistant Professor
University of Minnesota

Understanding Social and Physical Interactions from First Person Cameras

Newell Simon Hall 1507

Hyun Soo Park is an Assistant Professor in the Department of Computer Science and Engineering at the University of Minnesota. He is interested in understanding human visual sensorimotor behaviors from first person cameras. Prior to joining UMN, he was a Postdoctoral Fellow working with Jianbo Shi at the University of Pennsylvania.

Tovi Grossman
Distinguished Research Scientist

Instrumented and Connected: Designing Next-Generation Learning Experiences

Event Location: Newell Simon Hall 1507

Bio: Tovi Grossman is a Distinguished Research Scientist at Autodesk Research, located in downtown Toronto. Dr. Grossman’s research is in HCI, focusing on input and interaction with new technologies. In particular, he has been exploring how emerging technologies, such as wearables, the Internet of Things, and gamification can be leveraged [...]

Nobuyuki Umetani
Research Scientist
Autodesk Research

Simulation-guided Interactive Exploration of Functional Design

Event Location: Newell Simon Hall 1507

Bio: Nobuyuki Umetani is a research scientist at Autodesk Research. Previously, he was a postdoctoral researcher at Autodesk Research and Disney Research Zurich. He received his Ph.D. degree in 2012 from The University of Tokyo under the supervision of Takeo Igarashi. He also spent one year at Columbia University and in [...]

Min Xu
Assistant Research Professor
Carnegie Mellon University

Molecular resolution structural pattern mining inside single cells

Event Location: Newell Simon Hall 1507

Bio: Dr. Min Xu is an Assistant Research Professor of Computational Biology at the Computational Biology Department in the School of Computer Science at Carnegie Mellon University. He received degrees in Computational Biology, Computer Science, and Applied Mathematics. He has more than 16 years of research experience in various Computational [...]

Jovan Popovic
Senior Principal Scientist
Adobe Research

Character Animator

Bio: Jovan Popovic is a Senior Principal Scientist at Adobe Systems. After receiving bachelor's degrees in mathematics and computer science in 1995, he attended the University of Washington and Carnegie Mellon University, where he earned a doctoral degree for his work in computer animation and geometric modeling. He was on the faculty at the Massachusetts [...]

Tali Dekel
Research Scientist
Google Inc.

Exploring and Modifying Spatial Variations in a Single Image

Event Location: Gates Hillman 5222

Bio: Tali Dekel is currently a Research Scientist at Google, working on developing computer vision and computer graphics algorithms. Before Google, she was a Postdoctoral Associate at the Computer Science and Artificial Intelligence Lab (CSAIL) at MIT, working with Prof. William T. Freeman. Tali completed her Ph.D. studies at the school [...]

Larry Zitnick
Research Manager
Facebook AI Research

Reasoning About Our Visual World

Event Location: Newell Simon Hall 1507

Bio: C. Lawrence Zitnick is a research manager at Facebook AI Research, and an affiliate associate professor at the University of Washington. He is interested in a broad range of topics related to artificial intelligence including object recognition, the relation of language and imagery, and methods for gathering common sense [...]

Julian Panetta
PhD Student
New York University

Fine-Scale Structure Design for 3D Printing

Event Location: Newell Simon Hall 1507

Bio: Julian Panetta is a PhD candidate at NYU's Courant Institute, where he is advised by Denis Zorin. Julian is interested in simulation and optimal design problems, specifically focusing on applications for 3D printing. Before joining NYU, he received his BS in computer science from Caltech and did research at NASA's [...]

Qixing Huang
Assistant Professor
University of Texas at Austin

Visual Correspondences in the Big Data Era

Event Location: Newell Simon Hall 1507

Bio: Qixing Huang is an assistant professor at the University of Texas at Austin. He obtained his PhD in Computer Science from Stanford University and his MS and BS in Computer Science from Tsinghua University. He was a research assistant professor at Toyota Technological Institute at Chicago before joining UT Austin. [...]

Jiajun Wu
Graduate Student

Computational Perception of Geometric and Physical Object Properties

Event Location: Newell Simon Hall 1507

Bio: Jiajun Wu is a third-year Ph.D. student at Massachusetts Institute of Technology, advised by Professor Bill Freeman and Professor Josh Tenenbaum. His research interests lie at the intersection of computer vision, machine learning, and computational cognitive science. Before coming to MIT, he received his B.Eng. from Tsinghua University, China, [...]

Me Car, You Human: Understanding Human Activity for Intelligent Collaborative Robotic Vehicles

Newell Simon Hall 1507

Eshed Ohn-Bar
Postdoctoral Researcher, University of California, San Diego

Abstract: The goal of my research is to develop human-centered algorithms for intelligent and autonomous systems. The research emphasizes modeling the perception, intent, and behavior of humans inside and around a vehicle. Over a decade has passed since the DARPA Grand Challenges, and the way in [...]

Yin Li
PhD Candidate
Georgia Institute of Technology

Attention and Activities in First Person Vision

Event Location: Newell Simon Hall 1507

Bio: Yin Li is currently a doctoral candidate in the School of Interactive Computing at the Georgia Institute of Technology. His research interests lie at the intersection of computer vision and mobile health. Specifically, he creates methods and systems to automatically analyze first person videos, known as First Person Vision [...]

TBA: Yin Li

Newell Simon Hall 1507

Dinesh Jayaraman
PhD Candidate
University of Texas at Austin

Embodied learning for visual recognition

Event Location: Gates 7101

Bio: Dinesh Jayaraman is a PhD candidate in Kristen Grauman's group at UT Austin. His research interests are broadly in visual recognition and machine learning. In the last few years, Dinesh has worked on visual learning and active recognition in embodied agents, unsupervised representation learning from unlabeled video, visual attribute prediction, and [...]

Prof. Roberto Manduchi
Professor of Computer Engineering
University of California, Santa Cruz

Assistive technology for wayfinding, information access, and public transit

Event Location: Newell Simon Hall 1507

Bio: Roberto Manduchi is a Professor of Computer Engineering at the University of California, Santa Cruz, where he conducts research in the areas of computer vision and sensor processing with applications to assistive technology. Prior to joining UCSC in 2001, he worked at the NASA Jet Propulsion Laboratory and at [...]

Pulkit Agrawal
PhD Student 
University of California Berkeley

Intuitive Physics & Intuitive Behavior 

Event Location: Newell Simon Hall 1507

Bio: Pulkit is a PhD Student in the Department of Computer Science at UC Berkeley. His research focuses on computer vision, robotics and computational neuroscience. He is advised by Dr. Jitendra Malik. Pulkit completed his bachelor's degree in Electrical Engineering at IIT Kanpur and was awarded the Director’s Gold Medal. He is a recipient of Fulbright Science [...]

Dima Damen
Assistant Professor
University of Bristol, United Kingdom

The lifetime of an object – an object’s perspective onto interactions

Event Location: Newell Simon Hall 1507

Bio: Dima Damen is a Lecturer (Assistant Professor) in Computer Vision at the University of Bristol. She received her PhD from the University of Leeds (2009). Dima's research interests are in the automatic understanding of object interactions, actions and activities using static and wearable visual (and depth) sensors. Dima co-chaired BMVC 2013, is area chair [...]

The lifetime of an object – an object’s perspective onto interactions

Newell Simon Hall 1507

Dima Damen
Assistant Professor, University of Bristol, United Kingdom
April 10, 2017, 3:00-4:00 p.m., Newell Simon Hall 1507

Abstract: As opposed to the traditional notion of actions and activities in computer vision, where the motion (e.g. jumping) or the goal (e.g. cooking) is the focus, I will argue for an object-centred perspective onto actions and [...]

Computer Vision @ Scale

Gates 6115

Manohar Paluri
Research Lead, Facebook

Abstract: Over the past 5 years the community has made significant strides in the field of Computer Vision. Thanks to large-scale datasets, specialized computing in the form of GPUs, and many breakthroughs in modeling better convnet architectures, Computer Vision systems in the wild at scale are becoming a reality. At [...]

Towards scaling video understanding

Newell Simon Hall 1507

Serena Yeung
Ph.D. Student, Stanford University

Abstract: The quantity of video data is vast, yet our capabilities for visual recognition and understanding in videos lag significantly behind those for images. In this talk, I will discuss the challenges of scale in labeling, modeling, and inference behind this gap. I will then present three works addressing [...]

Haroon Idrees
Post Doc Associate
Center for Research in Computer Vision, University of Central Florida (UCF)

Visual Analysis of Dense Crowds

Event Location: Newell Simon Hall 1507

Bio: Haroon Idrees is a postdoctoral researcher in the Center for Research in Computer Vision (CRCV) at the University of Central Florida (UCF). He is interested in machine vision and learning, with a focus on crowd analysis, action recognition, multi-camera and airborne surveillance, as well as deep learning and multimedia content [...]

Haroon Idrees: Visual Analysis of Dense Crowds

Newell Simon Hall 1507

Haroon Idrees
Post Doc Associate, Center for Research in Computer Vision, University of Central Florida (UCF)

Abstract: Automated analysis of dense crowds is a challenging problem with far-reaching applications in crowd safety and management, as well as gauging political significance of protests and demonstrations. In this talk, I will first describe a counting approach which [...]

Prof. Jia Deng
Assistant Professor
University of Michigan

Toward Deep Geometric Image Understanding

Event Location: Newell Simon Hall 1507

Bio: Jia Deng is an Assistant Professor of Computer Science and Engineering at the University of Michigan. His research focus is on computer vision and machine learning, in particular, achieving human-level visual understanding by integrating perception, cognition, and learning. He received his Ph.D. from Princeton University and his B.Eng. from [...]

Roozbeh Mottaghi
Research Scientist
Allen Institute for Artificial Intelligence (AI2)

Interactive Scene Understanding

Newell Simon Hall 1507

Abstract: Despite recent progress, AI is still far from understanding the physics of the world, and there is a large gap between the abilities of humans and the state-of-the-art AI methods. In this talk, I will focus on physics-based scene understanding and interactive visual reasoning, which are crucial next steps in computer vision and AI. [...]

Jose Miguel Buenaposada
Associate Professor
Rey Juan Carlos University, Spain

Multi-class Boosting in Computer Vision

GHC 6501

Abstract: Boosting classifiers have been extensively used for learning multi-view single objects detectors (e.g. faces, cars or pedestrians) or in multiple object categories detectors. Object detection has been evolving from being specific for a given object category to multi-view or even being able to detect multiple categories at the same time. The usual framework for [...]

Yoshinori Dobashi
Associate Professor
Hokkaido University, Japan

Fun with Fluids

GHC 6501

Abstract: Visual simulation of fluids has become an indispensable tool for computer graphics. Many fluid phenomena can be simulated by solving the Navier-Stokes equations. In computer graphics, the NS equations are mostly used for simulating smoke, water and fire. However, they are useful for other purposes as well. In this talk, we show our usage of [...]

Assistant Research Professor
Robotics Institute
Carnegie Mellon University

Challenges Facing Computational Face

GHC 6501

Abstract: Recent advances in computational face research make possible a growing range of scientific, behavioral, and commercial applications. Many companies are focusing on the future of computational face products and services, but a number of critical research questions remain to be solved. These include 3D face alignment from 2D images, face analysis under extreme pose variation [...]

Ben Burchfiel
PhD Candidate

Bayesian Eigenobjects: A Unified Framework for 3D Robot Perception

GHC 6501

Abstract: Robot-object interaction requires several key perceptual building blocks including object pose estimation, object classification, and partial-object completion. These tasks form the perceptual foundation for many higher level operations including object manipulation and world-state estimation. Most existing approaches to these problems in the context of 3D robot perception assume an existing database of objects [...]

Laurens van der Maaten
Research Scientist
Facebook AI Research

Two Tales about Image Classification

GHC 6501

Abstract: This talk tells two tales about image-classification systems, both of which are motivated by the real-world deployment of such systems. The first tale introduces a new convolutional neural network architecture, called multi-scale DenseNets, with the ability to adapt dynamically to computational resource limits at inference time. The network uses progressively growing multi-scale convolutions, dense [...]

Hongdong Li
Reader/Associate Professor
Australian National University

Dense 3D Shape Reconstruction of Complex Dynamic Scene with a Single Monocular Camera 

GHC 6501

Abstract: In this talk, I will describe our recent work (presented at ICCV 2017) on monocular camera based 3D geometry reconstruction of a non-rigid dynamic scene. We aim to answer an open question in multi-view geometry, namely, "Is it possible to recover the 3D structure of a complex dynamic environment from two image frames captured by [...]

Larry Zitnick
Research Lead
Facebook AI Research

Learning to Visually Reason

GHC 6115

Abstract: Visual reasoning is a core capability of artificial intelligence. It is a necessity for effective communication, planning, and question-answering tasks. In this talk, I discuss some recent explorations into visual reasoning for question answering, game playing and dialog. I also describe our new reinforcement learning platform ELF, an Extensive, Lightweight and Flexible research platform [...]

James Davidson
Software Engineer
Google Brain Robotics

Towards Lifelong Robot Learning

GHC 6501

Abstract: Google Brain Robotics' vision is to leverage learning to push the field of robotics forward. As such, we have engaged in research ranging in application from navigation to grasping, and in approach from deep RL to learning from demonstration. Fundamentally, our research is built around the core idea of lifelong learning. Our long term goal [...]

Zach Pezzementi
Lead Robotics Engineer
Carnegie Mellon University / NREC

Comparing apples and oranges: Off-road pedestrian detection on the NREC agricultural person-detection dataset

GHC 6501

Abstract: Person detection from vehicles has made rapid progress recently with the advent of multiple high-quality datasets of urban and highway driving, yet no large-scale benchmark has been available for the same problem in off-road or agricultural environments. In this talk, we present the NREC Agricultural Person-Detection Dataset to spur research in these environments. It [...]

Debadeepta Dey
Microsoft Research AI (MSR AI)

Adaptive Information Gathering via Imitation Learning

GHC 6501

Abstract: In the adaptive information gathering problem, a robot is required to select an informative sensing location using the history of measurements acquired thus far. While there is an extensive amount of prior work investigating effective practical approximations using variants of Shannon’s entropy, the efficacy of such policies heavily depends on the geometric distribution of [...]

Shubham Tulsiani
PhD Candidate
UC Berkeley

Learning Single-view 3D Reconstruction of Objects and Scenes

GHC 6501

Abstract: In this talk, I will discuss the task of inferring the 3D structure underlying an image, focusing in particular on two questions: a) how we can plausibly obtain a supervisory signal for this task, and b) what forms of representation we should pursue. I will first show that we can leverage image-based supervision to learn [...]

Ryad Benosman
University Pierre and Marie Curie, Paris

Neuromorphic Event-based time oriented vision and Computation

GHC 6501

Abstract: There has been significant research over the past two decades in developing new systems for spiking neural computation. The impact of neuromorphic concepts on recent developments in optical sensing, display and artificial vision is presented. State-of-the-art image sensors suffer from severe limitations imposed by their very principle of operation. These sensors acquire the visual [...]

Stefan Lee
Research Scientist
School of Interactive Computing at Georgia Tech

Towards Goal-Driven Visually Grounded Dialog Agents

Newell-Simon Hall 3305

Abstract: Communication between human users and artificial intelligences is essential for human-AI cooperative tasks. For these collaborations to extend into real environments, artificial agents must be able to perceive their environment (visually, aurally, tactilely, etc.) and to communicate with humans about it in order to accomplish mutual goals. For example, a user might talk with [...]

Lihi Zelnik-Manor
Associate Professor in the Faculty of Electrical Engineering
Technion, Israel

On challenges in image generation

Newell-Simon Hall 3305

Abstract: Recent work has shown impressive success in automatically synthesizing new images with desired properties such as transferring painterly style, modifying facial expressions, increasing image resolution or manipulating the center of attention of the image. In this talk I will discuss two of the standing challenges in image synthesis and how we tackle them: - [...]

Oren Etzioni
Allen Institute for Artificial Intelligence

Learning Common Sense: a Grand Challenge for Academic AI Research

GHC 6115

Abstract: In a world where Google, Facebook, and others possess massive proprietary data sets, and unprecedented computational power---how is a graduate student to make a dent in the universe? I’ll address this conundrum by re-visiting one of the holy grails of AI: acquiring, representing, and utilizing common-sense knowledge. Can we leverage modern methods including deep [...]

Albert Ali Salah
Associate Professor
Boğaziçi University, Turkey

Multimodal, multilevel analysis of human behavior

Newell-Simon Hall 3305

Abstract: Computer analysis of human behavior is an interdisciplinary endeavor combining sensing technology, theoretical and empirical models of human behavior, pattern recognition and machine learning algorithms, and interaction sciences. The applications in this area range widely, from robotics to healthcare, from smart environments to multimedia, from security to humanitarian response. While human behaviors span different [...]

Burak Uzkent
Computer Vision Engineer
Planet Labs

Object Detection and Tracking on Low Resolution Aerial Images

Newell-Simon Hall 3305

Abstract: Object tracking from an aerial platform poses a number of unique challenges including the small number of pixels representing the objects, large camera motion, and low temporal resolution. Because of these challenges, low resolution aerial image analysis needs to be tackled differently from traditional image analysis both in terms of the sensors, [...]

Stella Yu
Director, ICSI Vision & Senior Fellow, Berkeley Institute for Data Science
University of California, Berkeley

Data-Driven Learning Towards Perceptual Organization

GHC 6501

Abstract: Computer vision has advanced rapidly with deep learning, achieving above human performance on some classification benchmarks. At the core of the state-of-the-art approaches for image classification, object detection, and semantic/instance segmentation is sliding window classification, engineered for computational efficiency. Such piecemeal analysis of visual perception often has trouble getting details right and fails miserably [...]

Saining Xie
Ph.D. Candidate
Computer Science, UC San Diego

Deep Representation Learning with Induced Structural Priors

Gates 6115

Abstract: With the support of big-data and big-compute, deep learning has reshaped the landscape of research and applications in artificial intelligence. Whilst traditional hand-guided feature engineering in many cases is simplified, the deep network architectures become increasingly more complex. A central question is if we can distill the minimal set of structural priors that can [...]

Deepak Pathak
Ph.D. Candidate
Computer Science at UC Berkeley

Lifelong Learning via Curiosity and Self-supervision

GHC 6501

Abstract: Humans demonstrate remarkable ability to generalize their knowledge and skills to new unseen scenarios. One of the primary reasons is that they are continually learning by acting in the environment and adapting to novel circumstances. This is in sharp contrast to our current machine learning algorithms which are incredibly narrow in only performing the [...]

Gerard Pons-Moll
Research Group Leader
Max Planck Institute for Informatics, Saarland Informatics Campus

Capturing and Learning Digital Humans

GHC 6501

Abstract: The world is shifting towards a digitization of everything -- music, books, movies and news in digital form are common in our everyday lives. Digitizing human beings would redefine the way we think and communicate (with other humans and with machines), and it is necessary for many applications; for example, to transport people into virtual and augmented reality, [...]

Iasonas Kokkinos
Research Scientist
Facebook AI Research

Deformable models meet deep learning: supervised and unsupervised approaches

GHC 6501

Abstract: In this talk I will be presenting recent work on combining ideas from deformable models with deep learning. I will start by describing DenseReg and DensePose, two recently introduced systems for establishing dense correspondences between 2D images and 3D surface models "in the wild", namely in the presence of background, occlusions, and multiple objects. [...]

Yuandong Tian
Research Scientist & Manager
Facebook AI Research

Building Scalable Framework and Environment of Reinforcement Learning

GHC 6501

Abstract: Deep Reinforcement Learning (DRL) has made strong progress in many tasks that are traditionally considered to be difficult, such as complete information games, navigation, architecture search, etc. Although the basic principle of DRL is quite simple and straightforward, making it work often requires substantially more samples and more computational resources, compared to traditional [...]

Byeong Keun Kang
Ph.D. Candidate
UC San Diego

Scene Understanding

GHC 6501

Abstract: Accurate and efficient scene understanding is a fundamental task in a variety of computer vision applications including autonomous driving, human-machine interaction, and robot navigation. Reducing computational complexity and memory use is important to minimize response time and power consumption for portable devices such as robots and virtual/augmented devices. Also, it is beneficial for vehicles [...]

Shervin Ardeshir
Ph.D. Candidate
University of Central Florida

Relating First-person and Third-person Videos

GHC 6501

Abstract: Thanks to the availability and increasing popularity of wearable devices such as GoPro cameras, smart phones and glasses, we have access to a plethora of videos captured from the first person perspective. Capturing the world from the perspective of one's self, egocentric videos bear characteristics distinct from the more traditional third-person (exocentric) videos. In [...]

Emily Denton
Ph.D. Student
Courant Institute at New York University

Towards better methods of video generation

Gates-Hillman 6115

Abstract: Learning to generate future frames of a video sequence is a challenging research problem with great relevance to reinforcement learning, planning and robotics. Existing approaches either fail to capture the full distribution of outcomes, or yield blurry generations, or both. In this talk I will address two important aspects of video generation: (i) what [...]

Fereshteh Sadeghi
PhD Candidate
Computer Science, University of Washington

Acquiring and Transferring Generalizable Vision-based Robot Skills

GHC 6501

Abstract:  In recent years, there have been great advances in policy learning for goal-oriented agents. However, there are still major challenges brought by real-world constraints for teaching highly generalizable and versatile robot policies in a cost efficient and safe manner. In this talk, I will argue that instead of aiming to teach large motion repertoires [...]

Yong Jae Lee
Assistant Professor
Computer Science Department, University of California, Davis

Learning to localize and anonymize objects with indirect supervision

GHC 6501

Abstract: Computer vision has made great strides for problems that can be learned with direct supervision, in which the goal can be precisely defined (e.g., drawing a box that tightly fits an object). However, direct supervision is often not only costly, but also challenging to obtain when the goal is more ambiguous. In this talk, I [...]

Philipp Krähenbühl
Computer Science Department, University of Texas at Austin

Video Compression for Recognition & Video Recognition for Compression

GHC 6501

Abstract: Training robust deep video representations has proven to be much more challenging than learning deep image representations. One reason is: videos are huge and highly redundant. The 'true' and interesting signal often drowns in too much irrelevant data. In the first part of the talk, I will show how to train a deep network [...]

Aljosa Osep
M.Sc. Computer Science
RWTH Aachen University, Computer Vision Group

Tracking Beyond Detection

GHC 6501

Abstract:  The majority of existing vision-based methods perform multi-object tracking in the image domain. Yet, in mobile robotics and autonomous driving scenarios, pixel-precise object localization and trajectory estimation in 3D space are of fundamental importance. Furthermore, the leading paradigms for vision-based multi-object tracking and trajectory prediction heavily rely on object detectors and effectively limit tracking [...]

Yuval Bahat
Technion - Israel Institute of Technology

Exploiting Deviations from Ideal Visual Recurrence

1305 Newell Simon Hall

Abstract: Visual repetitions are abundant in our surrounding physical world: small image patches tend to reoccur within a natural image, and across different rescaled versions thereof. Similarly, semantic repetitions appear naturally inside an object class within image datasets, as a result of different views and scales of the same object. We studied deviations from these [...]

Shu Kong
PhD Candidate
University of California at Irvine

Attending to Pixels, Embedding Pixels, Predicting Pixels

1305 Newell Simon Hall

Abstract: Nowadays splashy applications heavily depend on meticulously annotated datasets, data-driven and learning-based methods, among which pixel labeling plays an important role yet often lacks interpretability. In this talk, I will discuss how we deal with pixels with better interpretability. Firstly, I'll introduce the pixel embedding framework that allows for clustering pixels into discrete groups [...]

Erik Learned-Miller
University of Massachusetts, Amherst

Automatically Supervised Learning: Two more steps on a long journey

1305 Newell Simon Hall

Abstract: I will talk about two recent pieces of work that attempt to move towards learning with less reliance on labeled data. In the first part, I will talk about how the surrogate task of predicting the motion of objects can induce complex representations in neural networks without any labeled data. In the second part of [...]

Francesc Moreno Noguer
Associate Researcher
Institut de Robotica i Informatica Industrial (Barcelona, Spain)

Geometric Deep Learning for Perceiving and Modeling Humans

GHC 6501

Abstract: Perceiving and modeling shape and appearance of the human body from single images is a severely under-constrained problem that not only requires large volumes of data, but also prior knowledge. In this talk I will present recent solutions on how deep learning can leverage geometric reasoning to address tasks like 3D estimation of [...]

Wenshuo Wang
Postdoctoral Research Associate
Safe AI Lab, Carnegie Mellon University

Human-Level Learning of Driving Primitives through Bayesian Nonparametric Statistics

Gates-Hillman Center 8102

Abstract: Understanding and imitating human driver behavior has benefited autonomous driving in terms of perception, control, and decision-making. However, multi-vehicle interaction behavior is far more complex than human beings can cope with because of their limited prior knowledge and limited capability for dealing with high-dimensional and large-scale sequential data. In this talk, I [...]

Hironobu Fujiyoshi
Chubu University (Japan)

Knowledge Transfer Graph for Deep Collaborative Learning

3305 Newell-Simon Hall

Abstract:  In this talk I will present our latest research about knowledge transfer graph for Deep Collaborative Learning (DCL), which is a method that incorporates Knowledge Distillation and Deep Mutual Learning. DCL is represented by a directional graph where each model is represented by a node, and the propagation of knowledge from the source node to the [...]

Fuxin Li
Assistant Professor
Oregon State University

Some New Designs of Convolutional and Recurrent Networks

GHC 6501

Abstract: Convolutional networks (CNNs) and recurrent networks have driven the great engineering success of deep learning in recent years. However, as academics, we still wonder whether they are indeed the ultimate models of choice. In particular, CNNs seem unable to characterize predictive uncertainty, and they are highly dependent on small filters on small, rectangular neighborhoods. On [...]

Arthur Szlam
Research Scientist
Facebook AI Research

Language and Interaction in Minecraft

GHC 6501

Abstract:  I will discuss a research program aimed at building a Minecraft assistant, in order to facilitate the study of agents that can complete tasks specified by dialogue, and eventually, to learn from dialogue interactions.  I will describe the tools and platform we have built allowing players to interact with the agents and to record those interactions, and [...]

Minh Hoai Nguyen
Assistant Professor
Stony Brook University

Attentive Human Action Recognition

Gates-Hillman Center 8102

Abstract:  Enabling computers to recognize human actions in video has the potential to revolutionize many areas that benefit society such as clinical diagnosis, human-computer interaction, and social robotics. Human action recognition, however, is tremendously challenging for computers due to the subtlety of human actions and the complexity of video data. Critical to the success of [...]

Xiaodong Yang
Principal Scientist

Temporal Modeling and Data Synthesis for Visual Understanding

GHC 6501

Abstract: In this talk, I will present two recent pieces of work on leveraging temporal information and synthetic data to enhance video and image understanding. In the first part, I will introduce a progressive learning framework, Spatio-TEmporal Progressive (STEP), for action detection in videos. STEP is able to more effectively make use of longer temporal information, [...]

Shih-En Wei
Research Scientist
Facebook Reality Labs

VR facial animation via multiview image translation

GHC 6501

Abstract:  A key promise of Virtual Reality (VR) is the possibility of remote social interaction that is more immersive than any prior telecommunication media. However, existing social VR experiences are mediated by inauthentic digital representations of the user (i.e., stylized avatars). These stylized representations have limited the adoption of social VR applications in precisely those [...]

Stephen Lombardi
Research Scientist
Facebook Reality Labs

Neural Volumes: Learning Dynamic Renderable Volumes from Images

GHC 6501

Abstract:   Modeling and rendering of dynamic scenes is challenging, as natural scenes often contain complex phenomena such as thin structures, evolving topology, translucency, scattering, occlusion, and biological motion. Mesh-based reconstruction and tracking often fail in these cases, and other approaches (e.g., light field video) typically rely on constrained viewing conditions, which limit interactivity. We [...]

Franziska Mueller
M.Sc. (Doctoral Candidate)
Max Planck Institute for Informatics

Towards Lightweight Real-time Hand Reconstruction in Challenging

GHC 6501

Abstract: Humans naturally use their hands to interact and communicate with their surroundings. Reconstructing these complex and dexterous hand interactions enables sign-language recognition and translation, better assistive robots, and more immersive human-computer interaction (e.g. for AR and VR). To make hand reconstruction usable for the aforementioned applications and to a wide set of users, the [...]

Madalina Fiterau
Assistant Professor
UMass Amherst, College of Information and Computer Sciences

Hybrid Methods for the Integration of Heterogeneous Multimodal Biomedical Data

GHC 6501

Abstract:  The prevalence of smartphones and wearable devices for health monitoring and widespread use of electronic health records have led to a surge in heterogeneous multimodal healthcare data, collected at an unprecedented scale. My research focuses on developing machine learning techniques that learn salient representations of multimodal, heterogeneous data for biomedical predictive models. The first [...]

Carlos Vallespi
Staff Engineer and Technical Lead Manager
Uber ATG

Self-Driving Cars & AI: Transforming our Cities and our Lives

GHC 6501

Abstract:  Recent algorithmic and hardware improvements resulted in several success stories in the field of Artificial Intelligence (AI) which impact our daily lives. However, despite its ubiquity, AI is only just starting to make advances in what may arguably have the largest societal impact thus far, the nascent field of autonomous driving. At Uber ATG, [...]

Larry Zitnick
Research Scientist
Facebook AI Research

Go, fastMRI, and Minecraft: Exploring the limits of AI

GHC 6501

Abstract: The application of AI across various domains demonstrates both the promise of existing techniques and their limitations. In this talk, I explore three recent projects and how they shed light on the progress of AI and the challenges to come. These projects include ELF OpenGo, a reimplementation of AlphaZero, fastMRI for reducing the time [...]

Zhiding Yu
Research Scientist
NVIDIA Research

Towards Weakly-Supervised Visual Understanding

GHC 6501

Abstract:  Learning with weak and self-supervisions recently emerged as compelling tools towards leveraging vast amounts of unlabeled or partially-labeled data. In this talk, I will present some of the latest advances in weakly-supervised visual scene understanding from NVIDIA. Specifically, I will summarize and discuss some challenges and potential solutions in weakly-supervised learning, and introduce our [...]

Vivek Boominathan
Postdoctoral Researcher
Rice University

Imaging without focusing: A computational approach to miniaturizing cameras

3305 Newell-Simon Hall

Abstract:  Miniaturization of cameras is key to enabling new applications in areas such as connected devices, wearables, implantable medical devices, in vivo microscopy, and micro-robotics. Recently, lenses were identified as the main bottleneck in miniaturization of cameras. Standard smaller lens-system camera modules have a thickness of about 10 mm or higher, and reducing the size [...]

Pablo Garrido
Research Scientist
Epic Games

Towards photo-realistic face digitization from monocular videos

GHC 6501

Abstract:  Recent advances in face capture now enable digitizing high-quality 3D faces for the entertainment industry. Standardized digitization solutions, however, require tailor-made capture systems and extensive manual work, making them expensive and hard to deploy. With the advent of commodity sensors, new lightweight approaches that push the boundaries of human digitization have been introduced, slowly [...]

Thiemo Alldieck
PhD Candidate
Facebook Reality Labs

Reconstructing 3D Human Avatars from Monocular Images

GHC 6501

Abstract:  Statistical 3D human body models have helped us to better understand human shape and motion and already enabled exciting new applications. However, if we want to learn detailed, personalized, and clothed models of human shape, motion, and dynamics, we require new approaches that learn from ubiquitous data such as plain RGB-images and video. I [...]

Adriana Kovashka
Assistant Professor
University of Pittsburgh

Reasoning about complex media from weak multi-modal supervision

GHC 6501

Abstract:  In a world of abundant information targeting multiple senses, and increasingly powerful media, we need new mechanisms to model content. Techniques for representing individual channels, such as visual data or textual data, have greatly improved, and some techniques exist to model the relationship between channels that are “mirror images” of each other and contain [...]

Benjamin Schmidt
President and Co-Founder

Building Trust in Real World Applications of Vision Based Machine Learning

GHC 6501

Abstract:  In all machine learning problems, there is an explicit trade off between cost and benefit. In real world vision problems, this optimization becomes increasingly difficult since those trade offs directly impact technology and product development as well as business strategy. For any successful business case, it is critical that the cost/benefit trade offs in [...]

Partha Pratim Talukdar
Associate Professor
IISc Bangalore / Founder, KENOME

Knowledge Infused Deep Learning

Newell-Simon Hall 4305

Abstract:  This talk is motivated by the following thesis: Background knowledge is key to intelligent decision making. While deep learning methods have made significant strides over the last few years, they often lack the context in which they operate. Knowledge Graphs (and more generally multi-relational graphs) provide a flexible framework to capture and represent knowledge [...]

Georgios Pavlakos
PhD Student
University of Pennsylvania

Learning to Reconstruct 3D Humans

GHC 6501

Abstract:  Recent advances in 2D perception have led to very successful systems, able to estimate the 2D pose of humans with impressive robustness. However, our interactions with the world are fundamentally 3D, so to be able to understand, explain and predict these interactions, it is crucial to reconstruct people in 3D. In this talk, I [...]

Xingyu Liu
PhD Student
Stanford University

Deep Learning for Understanding Dynamic Visual Data

GHC 6501

Abstract:  Perceiving dynamic environments from visual inputs allows autonomous agents to understand and interact with the world and is a core topic in Artificial Intelligence. The success of deep learning motivates us to apply deep learning techniques to the perception of dynamic visual data. However, how to design and apply deep neural networks to effectively [...]

James Hays
Associate Professor
Georgia Institute of Technology

Analyzing Grasp Contact via Thermal Imaging

GHC 6501

Abstract:  Grasping and manipulating objects is an important human skill. Because contact between hand and object is fundamental to grasping, measuring it can lead to important insights. However, observing contact through external sensors is challenging because of occlusion and the complexity of the human hand. I will discuss the use of thermal cameras to capture [...]

Sanjeev J. Koppal
Assistant Professor
University of Florida

Fast Foveation for LIDARs, Projectors and Cameras

GHC 6501

Abstract:  Most cameras today capture images without considering scene content. In contrast, animal eyes have fast mechanical movements that control how the scene is imaged in detail by the fovea, where visual acuity is highest. This concentrates computational (i.e. neuronal) resources in places where they are most needed. The prevalence of foveation, and the wide [...]

Jia-Bin Huang
Assistant Professor
Virginia Tech

Learning to See Through Occlusions and Obstructions

Virtual VASC

Abstract: Photography allows us to capture and share memorable moments of our lives. However, 2D images appear flat due to the lack of depth perception and may suffer from poor imaging conditions such as taking photos through reflecting or occluding elements. In this talk, I will present our recent efforts to [...]

Yuxin Wu
Research Engineer
Facebook AI Research

Detectron2 in Object Detection Research

Virtual VASC

Abstract: Detectron2 is Facebook's library for object detection and segmentation. It has been used widely in FAIR's research and Facebook's products. This talk will introduce detectron2 with a focus on its use in object detection research, including the lessons we learned from building it, as well as the new research enabled [...]

Olga Russakovsky
Assistant Professor
Department of Computer Science, Princeton University

Fairness in visual recognition

Virtual VASC Seminar

Abstract: Computer vision models trained on unparalleled amounts of data hold promise for making impartial, well-informed decisions in a variety of applications. However, more and more historical societal biases are making their way into these seemingly innocuous systems. Visual recognition models have exhibited bias by inappropriately correlating age, gender, sexual [...]

Qi Guo
PhD Student
Harvard University

Bio-inspired depth sensing using computational optics

Virtual Seminar

Abstract: Jumping spiders rely on accurate depth perception for predation and navigation. They accomplish depth perception, despite their tiny brains, by using specialized optics. Each principal eye includes a multitiered retina that simultaneously receives multiple images with different amounts of defocus, and distance is decoded from these images with seemingly little [...]

Gemma Roig
Assistant Professor
Department of Computer Science, Goethe University Frankfurt

Task-specific Vision DNN Models and Their Relation for Explaining Different Areas of the Visual Cortex

Virtual VASC Seminar

Abstract: Deep Neural Networks (DNNs) are state-of-the-art models for many vision tasks. We propose an approach to assess the relationship between visual tasks and their task-specific models. Our method uses Representation Similarity Analysis (RSA), which is commonly used to find a correlation between neuronal responses from brain data and models. [...]

Cristian Sminchisescu
Research Scientist / Professor
Google / Lund University

End-to-end Generative 3D Human Shape and Pose Models and Active Human Sensing

Virtual VASC Seminar

Abstract: I will review some of our recent work in 3D human modeling, synthesis, and active vision. I will present our new, end-to-end trainable nonlinear statistical 3D human shape and pose models of different resolutions (GHUM and GHUMLite) as [...]

Bryan Russell
Senior Research Scientist
Adobe Research

Telling Left from Right: Learning Spatial Correspondence Between Sight and Sound

Virtual VASC Seminar

Abstract: Self-supervised audio-visual learning aims to capture useful representations of video by leveraging correspondences between visual and audio inputs. Existing approaches have focused primarily on matching semantic information between the sensory streams. In my talk, I’ll describe a novel self-supervised task to leverage an orthogonal principle: matching spatial information in the [...]

Ciprian Corneanu
Research Assistant
Tawny GmbH, University of Barcelona

The Topology of Learning

Zoom Virtual Meeting

Abstract: Deep Neural Networks (DNNs) have revolutionized computer vision. We now have DNNs that achieve top results in many computer vision problems, including object recognition, facial expression analysis, and semantic segmentation, to name but a few. Unfortunately, the rise in performance has come with a cost. DNNs have become so [...]

Vincent Sitzmann

Implicit Neural Scene Representations

Virtual Zoom Seminar

Abstract: How we represent signals has major implications for the algorithms we build to analyze them. Today, most signals are represented discretely: Images as grids of pixels, shapes as point clouds, audio as grids of amplitudes, etc. If images weren't pixel grids, would we be using convolutional neural networks [...]

Ashok Veeraraghavan
Professor of Electrical and Computer Engineering
Rice University, Houston TX

Computational Imaging: Beyond the Limits Imposed by Lenses

Virtual VASC Seminar

Abstract: The lens has long been a central element of cameras, since its early use in the mid-nineteenth century by Niepce, Talbot, and Daguerre. The role of the lens, from the Daguerreotype to modern digital cameras, is to refract light to achieve a one-to-one mapping between a point in the scene and a point on the sensor. This effect enables the sensor to compute a particular two-dimensional (2D) [...]

Andreas Geiger
University of Tübingen

Learning 3D Reconstruction in Function Space

Virtual VASC Seminar

Abstract: In this talk, I will show several recent results of my group on learning neural implicit 3D representations, departing from the traditional paradigm of representing 3D shapes explicitly using voxels, point clouds or meshes. Implicit representations have a small memory footprint and allow for modeling arbitrary 3D topologies at [...]

Vicente Ordónez-Román
Assistant Professor
University of Virginia

Compositional Representations for Visual Recognition

Virtual VASC

Abstract: Compositionality is the ability for a model to recognize a concept based on its parts or constituents. This ability is essential to use language effectively as there exists a very large combination of plausible objects, attributes, and actions in the world. We posit that visual recognition models should be [...]

Making 3D Predictions with 2D Supervision

Abstract: Building computer vision systems that understand 3D shape is important for applications including autonomous vehicles, graphics, and VR / AR. If we assume 3D shape supervision, we can now build systems that do a reasonable job at predicting 3D shapes from images. However, 3D supervision is difficult to obtain at scale; therefore we should [...]

Angjoo Kanazawa
Assistant Professor
University of California

Perceiving 3D Human-Object Spatial Arrangements from a Single Image In-the-wild

Abstract: We live in a 3D world that is dynamic—it is full of life, with inhabitants like people and animals who interact with their environment through moving their bodies. Capturing this complex world in 3D from images has a huge potential for many applications such as compelling mixed reality applications that can interact with people [...]

Pawel Korus
Research Assistant Professor
NYU Center for Cybersecurity

Detection of Photo Manipulation with Media Forensics

Abstract: Rapid progress in machine learning, computer vision and graphics leads to successive democratization of media manipulation capabilities. While convincing photo and video manipulation used to require substantial time and skill, modern editors bring (semi-) automated tools that can be used by everyone. Some of the most recent examples include manipulation of human faces, e.g., [...]

Ce Liu
Staff Research Scientist
Google Research

Advancing the State of the Art of Computer Vision for Billions of Users

Abstract: At Google, advancing the state of the art of computer vision is very impactful as there are billions of users of Google products, many of which require high-quality, artifact-free images. I will share what we learned from successfully launching core computer vision techniques for various Google products, including PhotoScan (Photos), seamless Google Street View [...]

Mathieu Salzmann
Senior Researcher
EPFL & ClearSpace

Learning-based 6D Object Pose Estimation in Real-world Conditions

Abstract: Estimating the 6D pose, i.e., 3D rotation and 3D translation, of objects relative to the camera from a single input image has attracted great interest in the computer vision community. Recent works typically address this task by training a deep network to predict the 6D pose given an image as input. While effective on [...]

Nicholas Carlini
Research Scientist

Deep Learning: (still) Not Robust

Abstract: One of the key limitations of deep learning is its inability to generalize to new domains. This talk studies recent attempts at increasing neural network robustness to both natural and adversarial distribution shifts. Robustness to adversarial examples, inputs crafted specifically to fool machine learning models, is arguably the most difficult type of domain shift. [...]

Zoltán Ádám Milacski
PhD Candidate
ELTE Eötvös Loránd University

End-to-End ‘One Networks’: Learning Regularizers for Least Squares via Deep Neural Networks

Abstract: Linear Restoration Problems (or Linear Inverse Problems) involve reconstructing images or videos from noisy measurement vectors. Notable examples include denoising, inpainting, super-resolution, compressive sensing, deblurring and frame prediction. Often, multiple such tasks should be solved simultaneously, e.g., through Regularized Least Squares, where each individual problem is underdetermined (overcomplete) with infinitely many solutions from which [...]

Sheng-Yu Wang
PhD Student

Detecting Image Synthesis — Shallow and Deep

Abstract: The proliferation of synthetic media enables malicious uses such as disinformation campaigns, posing potential threats to media integrity and democracy. A way to combat this is developing forensics algorithms to identify manipulated media. At the beginning of the talk, I will discuss how one can train a model to detect photos manipulated [...]

Sarah Aboutalib
Former Postdoctoral Scholar
University of Pittsburgh

Deep Learning to Distinguish Recalled but Benign Mammography Images in Breast Cancer Screening

Abstract: Breast cancer screening using the standard mammography exam currently exhibits a high false recall rate (11.6% for women in the U.S.). Only a low proportion (0.5%) of women who were recalled for additional workup were actually found to have breast cancer. As a result of the unnecessary stress and follow-up work from these false [...]

Noah Snavely
Associate Professor
Cornell University and Google Research

The Plenoptic Camera

Abstract: Imagine a futuristic version of Google Street View that could dial up any possible place in the world, at any possible time. Effectively, such a service would be a recording of the plenoptic function—the hypothetical function described by Adelson and Bergen that captures all light rays passing through space at all times. While the plenoptic function [...]

Ricardo Martin-Brualla

Photorealistic Reconstruction of Landmarks and People using Implicit Scene Representation

Abstract: Reconstructing scenes to synthesize novel views is a long standing problem in Computer Vision and Graphics. Recently, implicit scene representations have shown novel view synthesis results of unprecedented quality, like the ones of Neural Radiance Fields (NeRF), which use the weights of a multi-layer perceptron to model the volumetric density and color of a [...]

Guoliang Kang
Postdoctoral Research Associate

Towards Discriminative and Domain-Invariant Feature Learning

Abstract: Deep neural networks have achieved great success in various visual applications when trained with large amounts of labeled in-domain data. However, the networks usually suffer from a heavy performance drop on data whose distribution is quite different from the training one. Domain adaptation methods aim to deal with such a performance gap caused by [...]

Zhiqiang Shen
Postdoctoral Researcher
Department of Electrical & Computer Engineering, CMU

Learning Efficient Visual Representation on Model, Data, Label and Beyond

Abstract: Efficient deep learning is a broad concept in which we aim to learn compressed deep models and develop training algorithms to improve the efficiency of model representations, data and label utilization, etc. In recent years, deep neural networks have been recognized as one of the most effective techniques for many learning tasks, also, in the [...]

Yannis Kalantidis
Research Scientist

Self-supervised Learning and Generalization

Abstract: Contrastive self-supervised learning is a highly effective way of learning representations that are useful for, i.e., generalise to, a wide range of downstream vision tasks and datasets. In the first part of the talk, I will present MoCHi, our recently published contrastive self-supervised learning approach (NeurIPS 2020) that is able to learn transferable representations [...]

Bharath Hariharan
Assistant Professor
Cornell University

Learning to see from few labels

Abstract: Computer vision systems today exhibit a rich and accurate understanding of the visual world, but increasingly rely on learning on large labeled datasets to do so. This reliance on large labeled datasets is a problem especially when one considers difficult perception tasks, or novel domains where annotations might require effort or expertise. We thus [...]

Adriana Romero-Soriano
Research Scientist
Facebook AI Research

Seeing the unseen: inferring unobserved information from multi-modal data

Abstract: As humans we can never fully observe the world around us, and yet we are able to build remarkably useful models of it from our limited sensory data. Machine learning systems are often required to operate in a similar setup: inferring unobserved information from observed information. Partial observations [...]

Sanja Fidler
Associate Professor
Department of Computer Science, University of Toronto

Towards AI for 3D Content Creation

Abstract: 3D content is key in several domains such as architecture, film, gaming, and robotics. However, creating 3D content can be very time-consuming -- the artists need to sculpt high-quality 3D assets, compose them into large worlds, and bring these worlds to life by writing behaviour models that "drive" the characters around in [...]

Farah Deeba
PhD Candidate
Electrical and Computer Engineering Department, University of British Columbia

Understanding the Placenta: Towards an Objective Pregnancy Screening

Abstract: My research focusses on the development of a pregnancy screening tool that will: (i) be system- and user-independent; and (ii) provide a quantifiable measure of placental health. To this end, I am working towards the design of a multiparametric quantitative ultrasound (QUS) based placental tissue characterization method. The method would potentially identify the [...]

Jiachen Li
Ph.D. Candidate
University of California, Berkeley

Relational Reasoning for Multi-Agent Systems

Abstract: Multi-agent interacting systems are prevalent in the world, from purely physical systems to complicated social dynamics systems. The interactions between entities / components can give rise to very complex behavior patterns at the level of both individuals and the whole system. In many real-world multi-agent interacting systems (e.g., traffic participants, mobile robots, sports players), [...]

Hamed Pirsiavash
Assistant Professor
University of Maryland Baltimore County

Self-supervised learning for visual recognition

Abstract: We are interested in learning visual representations that are discriminative for semantic image understanding tasks such as object classification, detection, and segmentation in images/videos. A common approach to obtain such features is to use supervised learning. However, this requires manual annotation of images, which is costly, ambiguous, and prone to errors. In contrast, self-supervised [...]

Ronghang Hu
Research Scientist
Facebook Inc.

Reasoning over Text in Images for VQA and Captioning

Abstract: Text in images carries essential information for multimodal reasoning, such as VQA or image captioning. To enable machines to perceive and understand scene text and reason jointly with other modalities, 1) we collect the TextCaps dataset, which requires models to read and reason over text and visual content in the image to generate image [...]

Jhony Kaesemodel Pontes
Research Scientist
Argo AI

Point Cloud Registration with or without Learning

Abstract: I will be presenting two of our recent works on 3D point cloud registration:   A scene flow method for non-rigid registration: I will discuss our current method to recover scene flow from point clouds. Scene flow is the three-dimensional (3D) motion field of a scene, and it provides information about the spatial arrangement [...]

Arsalan Mousavian
Senior Robotics Research Scientist

Propelling Robot Manipulation of Unknown Objects using Learned Object Centric Models

Abstract: There is a growing interest in using data-driven methods to scale up manipulation capabilities of robots for handling a large variety of objects. Many of these methods are oblivious to the notion of objects and they learn monolithic policies from the whole scene in image space. As a result, they don’t generalize well to [...]

Phillip Isola
Assistant Professor

When and Why Does Contrastive Learning Work?

Abstract: Contrastive learning organizes data by pulling together related items and pushing apart everything else. These methods have become very popular but it's still not entirely clear when and why they work. I will share two ideas from our recent work. First, I will argue that contrastive learning is really about learning to forget. Different [...]

Ehsan Adeli
Clinical Assistant Professor
Stanford University

Anticipating the Future: forecasting the dynamics in multiple levels of abstraction

Abstract: A key navigational capability for autonomous agents is to predict the future locations, actions, and behaviors of other agents in the environment. This is particularly crucial for safety in the realm of autonomous vehicles and robots. However, many current approaches to navigation and control assume perfect perception and knowledge of the environment, even though [...]

Xiaolong Wang
Assistant Professor

Learning to Perceive Videos for Embodiment

Abstract: Video understanding has achieved tremendous success in computer vision tasks, such as action recognition, visual tracking, and visual representation learning. Recently, this success has gradually been extended to helping robots and embodied agents interact with their environments. In this talk, I am going to introduce our recent efforts on extracting self-supervisory signals and [...]

Xavier Giro Nieto
Associate Professor
Universitat Politecnica de Catalunya

Open Challenges in Sign Language Translation & Production

Abstract: Machine translation and computer vision have greatly benefited from the advances in deep learning. Large and diverse amounts of textual and visual data have been used to train neural networks, whether in a supervised or self-supervised manner. Nevertheless, the convergence of the two fields in sign language translation and production still poses [...]

Ishan Misra
Research Scientist
Facebook AI Research

3D Recognition with self-supervised learning and generic architectures

Abstract: Supervised learning relies on manual labeling which scales poorly with the number of tasks and data. Manual labeling is especially cumbersome for 3D recognition tasks such as detection and segmentation and thus most 3D datasets are surprisingly small compared to image or video datasets. 3D recognition methods are also fragmented based on the type [...]

Deepak Pathak
Assistant Professor
Carnegie Mellon University

Rapid Adaptation for Robot Learning

Abstract: How can we train a robot to generalize to diverse environments? This question underscores the holy grail of robot learning research because it is difficult to supervise an agent for all possible situations it can encounter in the future. We posit that the only way to guarantee such a generalization is to continually learn and [...]

Iasonas Kokkinos
Research Manager
Snap Inc, UCL

Humans, hands, and horses: 3D reconstruction of articulated object categories using strong, weak, and self-supervision

Abstract: Reconstructing 3D objects from a single 2D image is a task that humans perform effortlessly,  yet computer vision so far has only robustly solved 3D face reconstruction. In this talk we will see how we can extend the scope of monocular 3D reconstruction to more challenging, articulated categories such as human bodies, hands and [...]

Alex Schwing
Assistant Professor
University of Illinois

Looking behind the Seen in Order to Anticipate

Abstract: Despite significant recent progress in computer vision and machine learning, personalized autonomous agents often still don’t participate robustly and safely across tasks in our environment. We think this is largely because they lack an ability to anticipate, which in turn is due to a missing understanding about what is happening behind the seen, i.e., [...]

Serena Yeung
Assistant Professor
Stanford University

The Clinician’s AI Partner: Augmenting Clinician Capabilities Across the Spectrum of Healthcare

Abstract: Clinicians often work under highly demanding conditions to deliver complex care to patients. As our aging population grows and care becomes increasingly complex, physicians and nurses are now also experiencing feelings of burnout at unprecedented levels. In this talk, I will discuss possibilities for computer vision to function as a partner to clinicians, and to augment their capabilities, across [...]

Judy Hoffman
Assistant Professor
College of Computing, Georgia Tech

Reliable and Accessible Visual Recognition

Abstract: As visual recognition models are developed across diverse applications, we need the ability to reliably deploy our systems in a variety of environments. At the same time, visual models tend to be trained and evaluated on a static set of curated and annotated data which only represents a subset of the world. In this [...]

Tadas Baltrusaitis
Principal Scientist
Microsoft, Mixed Reality Cambridge

Fake It Till You Make It: Face analysis in the wild using synthetic data alone

Abstract: In this seminar I will demonstrate how synthetic data alone can be used to perform face-related computer vision in the wild. The community has long enjoyed the benefits of synthesizing training data with graphics, but the domain gap between real and synthetic data has remained a problem, especially for human faces. Researchers have tried [...]

Or Patashnik
Graduate Student
School of Computer Science at Tel-Aviv University

Leveraging StyleGAN for Image Editing and Manipulation

Abstract: StyleGAN has recently been established as the state-of-the-art unconditional generator, synthesizing images of phenomenal realism and fidelity, particularly for human faces. With its rich semantic space, many works have attempted to understand and control StyleGAN’s latent representations with the goal of performing image manipulations. To perform manipulations on real images, however, one must learn to [...]

Soumyadip Sengupta
Postdoctoral Research Associate
University of Washington

Next-Gen Video Communication

Abstract: Video communication connects our world. It is necessary in conducting business, educational and personal activities across different geographical locations. However, the quality of an average user’s video communication is dramatically worse than that of professionally created videos in news broadcasts, talk shows, and on YouTube. This is because professionally created videos are often captured with [...]

Robert Collins
Associate Professor
Penn State University

Activity Understanding of Scripted Performances

Abstract: The PSU Taichi for Smart Health project has been doing a deep-dive into vision-based analysis of 24-form Yang-style Taichi (TaijiQuan). A key property of Taichi, shared by martial arts katas and prearranged form exercises in other sports, is practice of a scripted routine to build both mental and physical competence.  The scripted nature of routines [...]

Vishal Patel
Associate Professor
Johns Hopkins University

Domain adaptive object detection

Abstract: Recent advances in deep learning have led to the development of accurate and efficient models for object detection. However, learning highly accurate models relies on the availability of large-scale annotated datasets. Due to this, model performance drops drastically when evaluated on label-scarce datasets having visually distinct images.  Domain adaptation tries to mitigate this degradation.  In [...]

Umberto Michieli
Postdoctoral Researcher and Adjunct Professor
University of Padua

Visual Understanding across Semantic Groups, Domains and Devices

Abstract: Deep neural networks often lack generalization capabilities to accommodate changes in the input/output domain distributions and, therefore, are inherently limited by the restricted visual and semantic information contained in the original training set. In this talk, we argue the importance of the versatility of deep neural architectures and we explore it from various perspectives.   [...]

Chao Chen
Assistant Professor
Stony Brook University

Topology-Driven Learning for Biomedical Imaging Informatics

Abstract: Thanks to decades of technology development, we are now able to visualize in high quality complex biomedical structures such as neurons, vessels, trabeculae and breast tissues. We need innovative methods to fully exploit these structures, which encode important information about underlying biological mechanisms. In this talk, we explain how topology, i.e., connected components, handles, loops, [...]

Gianfranco Doretto
Associate Professor
West Virginia University

Learning generative representations for image distributions

Abstract: Autoencoder neural networks are an unsupervised technique for learning representations, which have been used effectively in many data domains. While capable of generating data, autoencoders have been inferior to other models like Generative Adversarial Networks (GANs) in their ability to generate image data. We will describe a general autoencoder architecture that addresses this limitation, and [...]

Daniel McDuff
Principal Researcher
Microsoft Research

Building Intelligent and Visceral Machines: From Sensing to Application

Abstract: Humans have evolved to have highly adaptive behaviors that help us survive and thrive. As AI prompts a move from computing interfaces that are explicit and procedural to those that are implicit and intelligent, we are presented with extraordinary opportunities. In this talk, I will argue that understanding affective and behavioral signals presents many opportunities [...]

Arun Mallya
Senior Research Scientist

GANcraft – an unsupervised 3D neural method for world-to-world translation

Abstract: Advances in 2D image-to-image translation methods, such as SPADE/GauGAN, have enabled users to paint photorealistic images by drawing simple sketches similar to those created in Microsoft Paint. Despite these innovations, creating a realistic 3D scene remains a painstaking task, out of the reach of most people. It requires years of expertise, professional software, a library [...]

Deqing Sun
Senior Research Scientist

Learning Optical Flow: Model, Data, and Applications

Abstract: Optical flow provides important information about the dynamic world and is of fundamental importance to many tasks. In this talk, I will present my work on different aspects of learning optical flow. I will start with the background and talk about PWC-Net, a compact and effective model built using classical principles for optical flow. Next, [...]

Chen Sun
Assistant Professor, Computer Science
Brown University

Do Vision-Language Pretrained Models Learn Spatiotemporal Primitive Concepts?

Abstract:  Vision-language models pretrained on web-scale data have revolutionized deep learning in the last few years. They have demonstrated strong transfer learning performance on a wide range of tasks, even under the "zero-shot" setup, where text "prompts" serve as a natural interface for humans to specify a task, as opposed to collecting labeled data. These models are [...]

Dr. Randall Balestriero
Post-Doctorate Researcher
Meta AI

Max-Affine Spline Insights into Deep Learning

Abstract:  We build a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators. Our key result is that a large class of DNs can be written as a composition of max-affine spline operators (MASOs) that provide a powerful portal through which we view and analyze their inner workings. For instance, [...]

David Fouhey
Assistant Professor
EECS Department, University of Michigan

Understanding 3D Scenes and Interacting Hands

Abstract: The long-term goal of my research is to help computers understand the physical world from images, including both 3D properties and how humans or robots could interact with things. This talk will summarize two recent directions aimed at enabling this goal. I will begin with learning to reconstruct full 3D scenes, including [...]

Boyi Li
Research Scientist
NVIDIA Research and Visiting Scholar at UC Berkeley

Multimodal Modeling: Learning Beyond Visual Knowledge

Newell-Simon Hall 3305

Abstract:  The computer vision community has embraced the success of learning specialist models by training with a fixed set of predetermined object categories, such as ImageNet or COCO. However, learning only from visual knowledge might hinder the flexibility and generality of visual models, which requires additional labeled data to specify any other visual concept and [...]

Alexander Richard
Research Scientist
Reality Labs Research

Audio-Visual Learning for Social Telepresence

Newell-Simon Hall 3305

Abstract: Relationships between people are strongly influenced by distance. Even with today’s technology, remote communication is limited to a two-dimensional audio-visual experience and lacks the availability of a shared, three-dimensional space in which people can interact with each other over the distance. Our mission at Reality Labs Research (RLR) in Pittsburgh is to develop such [...]

Postdoctoral Fellow
Robotics Institute,
Carnegie Mellon University

Representations in Robot Manipulation: Learning to Manipulate Ropes, Fabrics, Bags, and Liquids

3305 Newell-Simon Hall

Abstract: The robotics community has seen significant progress in applying machine learning for robot manipulation. However, much manipulation research focuses on rigid objects instead of highly deformable objects such as ropes, fabrics, bags, and liquids, which pose challenges due to their complex configuration spaces, dynamics, and self-occlusions. To achieve greater progress in robot manipulation of [...]

Jean-François Lalonde
Université Laval

Towards editable indoor lighting estimation

Newell-Simon Hall 3305

Abstract:  Combining virtual and real visual elements into a single, realistic image requires the accurate estimation of the lighting conditions of the real scene. In recent years, several approaches of increasing complexity---ranging from simple encoder-decoder architecture to more sophisticated volumetric neural rendering---have been proposed. While the quality of automatic estimates has increased, they have the unfortunate downside [...]

Project Scientist
Robotics Institute,
Carnegie Mellon University

Computational imaging with multiply scattered photons

Newell-Simon Hall 3305

Abstract:  Computational imaging has advanced to a point where the next significant milestone is to image in the presence of multiply-scattered light. Though traditionally treated as noise, multiply-scattered light carries information that can enable previously impossible imaging capabilities, such as imaging around corners and deep inside tissue. The combinatorial complexity of multiply-scattered light transport makes [...]

Wei-Chiu Ma
PhD Candidate

Mental models for 3D modeling and generation

Newell-Simon Hall 3305

Abstract:  Humans have extraordinary capabilities of comprehending and reasoning about our 3D visual world. One particular reason is that when looking at an object or a scene, not only can we see the visible surface, but we can also hallucinate the invisible parts - the amodal structure, appearance, affordance, etc. We have accumulated thousands of [...]

Michael Zollhoefer
Research Scientist
Reality Labs Research

Complete Codec Telepresence

Newell-Simon Hall 3305

Abstract:  Imagine two people, each of them within their own home, being able to communicate and interact virtually with each other as if they are both present in the same shared physical space. Enabling such an experience, i.e., building a telepresence system that is indistinguishable from reality, is one of the goals of Reality Labs [...]

Kayvon Fatahalian
Associate Professor of Computer Science
Stanford University

R.I.P ohyay: experiences building online virtual experiences during the pandemic: what works, what hasn’t, and what we need in the future

Newell-Simon Hall 3305

Abstract: During the pandemic I helped design ohyay, a creative tool for making and hosting highly customized video-based virtual events. Since Fall 2020 I have personally designed many online events, ranging from classroom activities (lectures, small group work, poster sessions, technical papers PC meetings), to conferences, to virtual offices, to holiday parties involving 100's [...]

Fabio Pizzati
PhD student

Physics-informed image translation

Abstract:  Generative Adversarial Networks (GANs) have shown remarkable performances in image translation, being able to map source input images to target domains (e.g. from male to female, day to night, etc.). However, their performances may be limited by insufficient supervision, which may be challenging to obtain. In this talk, I will present our recent works [...]

Adriana Kovashka
Associate Professor in Computer Science
University of Pittsburgh

Weak Multi-modal Supervision for Object Detection and Persuasive Media

Newell-Simon Hall 3305

Abstract:  The diversity of visual content available on the web presents new challenges and opportunities for computer vision models. In this talk, I present our work on learning object detection models from potentially noisy multi-modal data, retrieving complementary content across modalities, transferring reasoning models across dataset boundaries, and recognizing objects in non-photorealistic media.  While the [...]

Andrew Owens
Assistant Professor
Electrical Engineering & Computer Science, University of Michigan

Learning Visual, Audio, and Cross-Modal Correspondences

Newell-Simon Hall 3305

Abstract:  Today's machine perception systems rely heavily on supervision provided by humans, such as labels and natural language. I will talk about our efforts to make systems that, instead, learn from two ubiquitous sources of unlabeled data: visual motion and cross-modal sensory associations. I will begin by discussing our work on creating unified models for [...]

Lachlan MacDonald
Australian Institute for Machine Learning, University of Adelaide

Towards a formal theory of deep optimisation

Newell-Simon Hall 3305

Abstract:  Precise understanding of the training of deep neural networks is largely restricted to architectures such as MLPs and cost functions such as the square cost, which is insufficient to cover many practical settings.  In this talk, I will argue for the necessity of a formal theory of deep optimisation.  I will describe such a [...]

Christoph Lassner
Senior Research Scientist
Epic Games

Towards Interactive Radiance Fields

Newell-Simon Hall 3305

Abstract:  Over the last years, the fields of computer vision and computer graphics have increasingly converged. Using the exact same processes to model appearance during 3D reconstruction and rendering has shown tremendous benefits, especially when combined with machine learning techniques to model otherwise hard-to-capture or -simulate optical effects. In this talk, I will give an [...]

Rika Antonova
Postdoctoral Scholar
Stanford University

Enabling Self-sufficient Robot Learning

3305 Newell-Simon Hall

Abstract:  Autonomous exploration and data-efficient learning are important ingredients for helping machine learning handle the complexity and variety of real-world interactions. In this talk, I will describe methods that provide these ingredients and serve as building blocks for enabling self-sufficient robot learning. First, I will outline a family of methods that facilitate active global exploration. [...]

Vasudevan (Vasu) Sundarababu
SVP & Head of Digital Engineering

How Computer Vision Helps – from Research to Scale

3305 Newell-Simon Hall

Abstract:  Vasudevan (Vasu) Sundarababu, SVP and Head of Digital Engineering, will cover the topic: ‘How Computer Vision Helps – from Research to Scale’. During his time, Vasu will explore how Computer Vision technology can be leveraged in-market today, the key projects he is currently leading that leverage CV, and the end-to-end lifecycle of a CV initiative - [...]

Rachel McDonnell
Associate Professor
Creative Technologies, Trinity College Dublin, Ireland

Motion Matters in the Metaverse

3305 Newell-Simon Hall

Abstract: In the early 1970s, psychologists investigated biological motion perception by attaching point-lights to the joints of the human body, known as ‘point light walkers’. These early experiments showed biological motion perception to be an extreme example of sophisticated pattern analysis in the brain, capable of easily differentiating human motions with reduced motion cues. Further [...]

Anand Bhattad
PhD candidate
University of Illinois Urbana-Champaign

What do generative models know about geometry and illumination?

3305 Newell-Simon Hall

Abstract: Generative models can produce compelling pictures of realistic scenes. Objects are in sensible places, surfaces have rich textures, illumination effects appear accurate, and the models are controllable. These models, such as StyleGAN, can also generate semantically meaningful edits of scenes by modifying internal parameters. But do these models manipulate a purely abstract representation of the [...]

Saurabh Gupta
Assistant Professor
University of Illinois Urbana-Champaign

Robot Learning by Understanding Egocentric Videos

GHC 8102

Abstract: True gains of machine learning in AI sub-fields such as computer vision and natural language processing have come about from the use of large-scale diverse datasets for learning. In this talk, I will discuss if and how we can leverage large-scale diverse data in the form of egocentric videos (first-person videos of humans conducting [...]

Angjoo Kanazawa
Assistant Professor
Department of Electrical Engineering and Computer Science, University of California at Berkeley

From Videos to 4D Worlds and Beyond

Newell-Simon Hall 3305

Abstract: The world underlying images and videos is 3-dimensional and dynamic, i.e. 4D, with people interacting with each other, objects, and the underlying scene. Even in videos of a static scene, there is always the camera moving about in the 4D world. Accurately recovering this information is essential for building systems that can reason [...]

Sergey Tulyakov
Principal Research Scientist
Snap Inc.

Generative and Animatable Radiance Fields

Newell-Simon Hall 3305

Abstract:  Generating and transforming content requires both creativity and skill. Creativity defines what is being created and why, while skill answers the question of how. While creativity is believed to be abundant, skill can often be a barrier to creativity. In our team, we aim to substantially reduce this barrier. Recent Generative AI methods have simplified the problem for 2D [...]

Miguel Angel Bautista
Staff Research Scientist
Apple Machine Learning Research

Generative modeling: from 3D scenes to fields and manifold

Newell-Simon Hall 3305

Abstract: In this keynote talk, we delve into some of our progress on generative models that are able to capture the distribution of intricate and realistic 3D scenes and fields. We explore a formulation of generative modeling that optimizes latent representations for disentangling radiance fields and camera poses, enabling both unconditional and conditional generation of 3D [...]

Shervin Ardeshir
Senior Research Scientist

Estimating Robustness using Proxies

Newell-Simon Hall 3305

Abstract: This talk covers some of our recent explorations on estimating the robustness of black-box machine learning models across data subpopulations. In other words, we ask whether a trained model is uniformly accurate across different types of inputs, or whether there are significant performance disparities affecting the different subpopulations. Measuring such a characteristic is fairly straightforward if [...]

Or Patashnik
PhD student
Tel-Aviv University

Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures

Newell-Simon Hall 3305

Abstract: In this talk, I will focus on presenting my recent work which will be presented at CVPR in less than two months. Text-guided image generation has progressed rapidly in recent years, inspiring major breakthroughs in text-guided shape generation. Recently, it has been shown that using score distillation, one can successfully text-guide a NeRF model to [...]

Navigating to Objects in the Real World

3305 Newell-Simon Hall

Abstract: Semantic navigation is necessary to deploy mobile robots in uncontrolled environments like our homes, schools, and hospitals. Many learning-based approaches have been proposed in response to the lack of semantic understanding of the classical pipeline for spatial navigation, which builds a geometric map using depth sensors and plans to reach point goals. Broadly, end-to-end [...]

Vineeth N Balasubramanian
Associate Professor
Department of Computer Science and Engineering, Indian Institute of Technology, Hyderabad

Going Beyond Continual Learning: Towards Organic Lifelong Learning

3305 Newell-Simon Hall

Abstract: Supervised learning, the harbinger of machine learning over the last decade, has had tremendous impact across application domains in recent years. However, the notion of a static trained machine learning model is becoming increasingly limiting, as these models are deployed in changing and evolving environments. Among a few related settings, continual learning has gained significant [...]

Santhosh Kumar Ramakrishnan
Ph.D. Candidate
University of Texas at Austin

Predictive Scene Representations for Embodied Visual Search

GHC 6501

Abstract:  My research advances embodied AI by developing large-scale datasets and state-of-the-art algorithms. In my talk, I will specifically focus on the embodied visual search problem, which aims to enable intelligent search for robots and augmented reality (AR) assistants. Embodied visual search manifests as the visual navigation problem in robotics, where a mobile agent must efficiently navigate [...]

Aayush Bansal

Generating Beautiful Pixels

Newell-Simon Hall 3305

Abstract: In this talk, I will present three experiments that use low-level image statistics to generate high-resolution detailed outputs. In the first experiment, I will use 2D pixels to efficiently mine hard examples for better learning. Simply biasing ray sampling towards hard ray examples enables learning of neural fields with more accurate high-frequency detail in less [...]

Viraj Prabhu
CS PhD Student
Georgia Institute of Technology

Towards Reliable Computer Vision Systems

Newell-Simon Hall 3305

Abstract:  The real world has infinite visual variation – across viewpoints, time, space, and curation. As deep visual models become ubiquitous in high-stakes applications, their ability to generalize across such variation becomes increasingly important. In this talk, I will present opportunities to improve such generalization at different stages of the ML lifecycle: first, I will [...]

Bharath Hariharan
Assistant Professor
Cornell University

Vision without labels

3305 Newell-Simon Hall

Abstract: Deep learning has revolutionized all aspects of computer vision, but its successes have come from supervised learning at scale: large models trained on ever larger labeled datasets. However this reliance on labels makes these systems fragile when it comes to new scenarios or new tasks where labels are unavailable. This is in stark contrast to [...]

Yong Jae Lee
Associate Professor
Department of Computer Sciences, University of Wisconsin-Madison

Large Multimodal (Vision-Language) Models for Image Generation and Understanding

Newell-Simon Hall 3305

Abstract: Large Language Models and Large Vision Models, also known as Foundation Models, have led to unprecedented advances in language understanding, visual understanding, and AI. In particular, many computer vision problems including image classification, object detection, and image generation have benefited from the capabilities of such models trained on internet-scale text and visual data. In [...]

Mohamed Elhoseiny
Assistant Professor
Computer Science, KAUST

Imaginative Vision Language Models: Towards human-level imaginative AI skills transforming species discovery, content creation, self-driving cars, and emotional health

3305 Newell-Simon Hall

Abstract: Most existing AI learning methods can be categorized into supervised, semi-supervised, and unsupervised methods. These approaches rely on defining empirical risks or losses on the provided labeled and/or unlabeled data. Beyond extracting learning signals from labeled/unlabeled training data, we will reflect in this talk on a class of methods that can learn beyond the vocabulary [...]

Kenneth Marino
Research Scientist
Google DeepMind

World Knowledge in the Time of Large Models

Newell-Simon Hall 3305

Abstract: This talk will discuss the massive shift that has come about in the vision and ML community as a result of large pre-trained language models and language-and-vision models such as Flamingo, GPT-4, and others. We begin by looking at the work on knowledge-based systems in CV and robotics before the large model [...]

Shunsuke Saito
Research Scientist
Meta Reality Labs Research

Digital Human Modeling with Light

Newell-Simon Hall 3305

Abstract: Leveraging light in various ways, we can observe and model physical phenomena or states which may not be possible to observe otherwise. In this talk, I will introduce our recent exploration on digital human modeling with different types of light. First, I will present our recent work on the modeling of relightable human heads, [...]

Jonathon Luiten
Postdoctoral Fellow
RWTH Aachen and Carnegie Mellon University

Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis

Newell-Simon Hall 3305

Abstract: We present a method that simultaneously addresses the tasks of dynamic scene novel-view synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements. We follow an analysis-by-synthesis framework, inspired by recent work that models scenes as a collection of 3D Gaussians which are optimized to reconstruct input images via differentiable rendering. To model [...]

Arun Ross
Michigan State University

Biometrics in a Deep Learning World

Newell-Simon Hall 3305

Abstract: Biometrics is the science of recognizing individuals based on their physical and behavioral attributes such as fingerprints, face, iris, voice and gait. The past decade has witnessed tremendous progress in this field, including the deployment of biometric solutions in diverse applications such as border security, national ID cards, amusement parks, access control, and smartphones. [...]

Andrea Tagliasacchi
Associate Professor
Simon Fraser University

Neural World Models

Newell-Simon Hall 4305

Abstract: Computer vision researchers have pushed the limits of performance in perception tasks involving natural images to near saturation. With self-supervised inference driven by recent advancements in generative modeling, it can be debated that the era of large image models is coming to a close, ushering in an era focused on video. However, it's worth [...]

Ce Zheng
Ph.D. candidate at Center for Research in Computer Vision
University of Central Florida

Reconstructing 3D Humans from Visual Data

Newell-Simon Hall 3305

Abstract: Understanding humans in visual content is fundamental for numerous computer vision applications. Extensive research has been conducted in the field of human pose estimation (HPE) to accurately locate joints and construct body representations from images and videos. Expanding on HPE, human mesh recovery (HMR) addresses the more complex task of estimating the 3D pose [...]

Zhenglun Kong
Ph.D. in the Department of Electrical and Computer Engineering
Northeastern University

Towards Energy-Efficient Techniques and Applications for Universal AI Implementation

Newell-Simon Hall 3305

Abstract: The rapid advancement of large-scale language and vision models has significantly propelled the AI domain. We now see AI enriching everyday life in numerous ways – from community and shared virtual reality experiences to autonomous vehicles, healthcare innovations, and accessibility technologies, among others. Central to these developments is the real-time implementation of high-quality deep [...]

Shengjie Zhu
Ph.D. Student
Michigan State University

Structure-from-Motion Meets Self-supervised Learning

Newell-Simon Hall 3305

Abstract: How can we teach machines to perceive the 3D world from unlabeled videos? We will present a new solution that incorporates Structure-from-Motion (SfM) into self-supervised model learning. Given RGB inputs, deep models learn to regress depth and correspondence. With the two inputs, we introduce a camera localization algorithm that searches for certified globally optimal poses. However, the [...]

Qi Sun
Assistant Professor
New York University

Toward Human-Centered XR: Bridging Cognition and Computation

Newell-Simon Hall 3305

Abstract: Virtual and Augmented Reality enables unprecedented possibilities for displaying virtual content, sensing physical surroundings, and tracking human behaviors with high fidelity. However, we still haven't created "superhumans" who can outperform what we are in physical reality, nor a "perfect" XR system that delivers infinite battery life or realistic sensation. In this talk, I will discuss some of our [...]

Yanxi Liu
Penn State University

Zeros for Data Science

Newell-Simon Hall 3305

Abstract: The world around us is neither totally regular nor completely random. Our and robots’ reliance on spatiotemporal patterns in daily life cannot be over-stressed, given the fact that most of us can function (perceive, recognize, navigate) effectively in chaotic and previously unseen physical, social and digital worlds. Data science has been promoted and practiced [...]

Agata Lapedriza
Principal Research Scientist/Professor
Northeastern University

Emotion perception: progress, challenges, and use cases

Newell-Simon Hall 3305

Abstract: One of the challenges Human-Centric AI systems face is understanding human behavior and emotions considering the context in which they take place. For example, current computer vision approaches for recognizing human emotions usually focus on facial movements and often ignore the context in which the facial movements take place. In this presentation, I will [...]

Yunzhu Li
Assistant Professor
University of Illinois Urbana-Champaign

Foundation Models for Robotic Manipulation: Opportunities and Challenges

Newell-Simon Hall 3305

Abstract: Foundation models, such as GPT-4 Vision, have marked significant achievements in the fields of natural language and vision, demonstrating exceptional abilities to adapt to new tasks and scenarios. However, physical interaction—such as cooking, cleaning, or caregiving—remains a frontier where foundation models and robotic systems have yet to achieve the desired level of adaptability and [...]

Luca Weihs
Research Manager
Allen Institute for AI

Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World

Newell-Simon Hall 3305

Abstract: We show that imitating shortest-path planners in simulation produces Stretch RE-1 robotic agents that, given language instructions, can proficiently navigate, explore, and manipulate objects both in simulation and in the real world using only RGB sensors (no depth maps or GPS coordinates). This surprising result is enabled by our end-to-end, transformer-based, SPOC architecture, powerful [...]

Vishnu Lokhande
Assistant Professor
University at Buffalo, SUNY

Creating robust deep learning models involves effectively managing nuisance variables

Newell-Simon Hall 3305

Abstract: Over the past decade, we have witnessed significant advances in capabilities of deep neural network models in vision and machine learning. However, issues related to bias, discrimination, and fairness in general, have received a great deal of negative attention (e.g., mistakes in surveillance and animal-human confusion of vision models). But bias in AI models [...]

Mohit Gupta
Associate Professor
University of Wisconsin-Madison

Shedding Light on 3D Cameras

Newell-Simon Hall 3305

Abstract: The advent (and commoditization) of low-cost 3D cameras is revolutionizing many application domains, including robotics, autonomous navigation, human computer interfaces, and recently even consumer devices such as cell-phones. Most modern 3D cameras (e.g., LiDAR) are active; they consist of a light source that emits coded light into the scene, i.e., its intensity is modulated over [...]

Ilya Chugunov
PhD Candidate
Computational Imaging Lab, Princeton University

Neural Field Representations of Mobile Computational Photography

Newell-Simon Hall 3305

Abstract: Burst imaging pipelines allow cellphones to compensate for less-than-ideal optical and sensor hardware by computationally merging multiple lower-quality images into a single high-quality output. The main challenge for these pipelines is compensating for pixel motion, estimating how to align and merge measurements across time while the user's natural hand tremor involuntarily shakes the camera. In [...]

Mian Wei
PhD Candidate
University of Toronto

Passive Ultra-Wideband Single-Photon Imaging

3305 Newell-Simon Hall

Abstract: High-speed light sources, fast cameras, and depth sensors have made it possible to image dynamic phenomena occurring in ever smaller time intervals with the help of actively-controlled light sources and synchronization. Unfortunately, while these techniques do capture ultrafast events, they cannot simultaneously capture slower ones too. I will discuss our recent work on passive ultra-wideband [...]

Angela Dai
Associate Professor
Technical University of Munich

From Understanding to Interacting with the 3D World

1305 Newell-Simon Hall

Abstract: Understanding the 3D structure of real-world environments is a fundamental challenge in machine perception, critical for applications spanning robotic navigation, content creation, and mixed reality scenarios. In recent years, machine learning has undergone rapid advancements; however, in the 3D domain, such data-driven learning is often very challenging under limited 3D/4D data availability. In this talk, [...]

Wolfgang Heidrich
Professor of Computer Science and Electrical and Computer Engineering
KAUST Visual Computing Center

Learned Imaging Systems

Newell-Simon Hall 4305

Abstract: Computational imaging systems are based on the joint design of optics and associated image reconstruction algorithms. Of particular interest in recent years has been the development of end-to-end learned “Deep Optics” systems that use differentiable optical simulation in combination with backpropagation to simultaneously learn optical design and deep network post-processing for applications such as hyperspectral [...]

Nataniel Ruiz
Research Scientist

Unlocking Magic: Personalization of Diffusion Models for Novel Applications

3305 Newell-Simon Hall

Abstract: Since the recent advent of text-to-image diffusion models for high-quality realistic image generation, a plethora of creative applications have suddenly become within reach. I will present my work at Google where I have attempted to unlock magical applications by proposing simple techniques that act on these large text-to-image diffusion models. Particularly, a large class of [...]

Yingsi Qin
PhD Candidate
Carnegie Mellon University

Instant Visual 3D Worlds Through Split-Lohmann Displays

3305 Newell-Simon Hall

Abstract: Split-Lohmann displays provide a novel approach to creating instant visual 3D worlds that support realistic eye accommodation. Unlike commercially available VR headsets that show content at a fixed depth, the proposed display can optically place each pixel region to a different depth, instantly creating eye-tracking-free 3D worlds without using time-multiplexing. This enables real-time streaming [...]

Edward Lu
PhD student
ECE Department at CMU

Remote Rendering and 3D Streaming for Resource-Constrained XR Devices

3305 Newell-Simon Hall

Abstract: An overview of the motivation and challenges for remote rendering and real-time 3D video streaming on XR headsets. Bio: Edward is a third-year PhD student in the ECE department interested in computer systems for VR/AR devices. Sponsored in part by: Meta Reality Labs Pittsburgh

Mosam Dabhi
PhD Student
Carnegie Mellon University

Vectorizing Raster Signals for Spatial Intelligence

3305 Newell-Simon Hall

Abstract: This seminar will focus on how vectorized representations can be generated from raster signals to enhance spatial intelligence. I will discuss the core methodology behind this transformation, with a focus on applications in AR/VR and robotics. The seminar will also briefly cover follow-up work that explores rigging and re-animating objects from casual single videos [...]

Bailey Miller
PhD Candidate
Carnegie Mellon University

Stochastic Graphics Primitives

3305 Newell-Simon Hall

Abstract: For decades computer graphics has successfully leveraged stochasticity to enable both expressive volumetric representations of participating media like clouds and efficient Monte Carlo rendering of large scale, complex scenes. In this talk, we’ll explore how these complementary forms of stochasticity (representational and algorithmic) may be applied more generally across computer graphics and vision. In [...]

Noah Snavely
Professor & Research Scientist
Cornell Tech & Google DeepMind

Reconstructing Everything

3305 Newell-Simon Hall

Abstract: The presentation will be about a long-running, perhaps quixotic effort to reconstruct all of the world's structures in 3D from Internet photos, why this is challenging, and why this effort might be useful in the era of generative AI. Bio: Noah Snavely is a Professor in the Computer Science Department at Cornell University [...]

Christian Richardt
Research Scientist Lead
Meta Reality Labs Research

High-Fidelity Neural Radiance Fields

3305 Newell-Simon Hall

Abstract: I will present three recent projects that focus on high-fidelity neural radiance fields for walkable VR spaces: VR-NeRF (SIGGRAPH Asia 2023) is an end-to-end system for the high-fidelity capture, model reconstruction, and real-time rendering of walkable spaces in virtual reality using neural radiance fields. To this end, we designed and built a custom multi-camera rig to [...]