11:00 am to 12:00 pm
Event Location: NSH 1507
Bio: Xiaogang Wang received his Bachelor's degree in Electrical Engineering and Information Science from the Special Class for the Gifted Young at the University of Science and Technology of China in 2001, his M.Phil. degree in Information Engineering from the Chinese University of Hong Kong in 2004, and his PhD in Computer Science from the Massachusetts Institute of Technology in 2009. He has been an assistant professor in the Department of Electronic Engineering at the Chinese University of Hong Kong since August 2009. He received the Outstanding Young Researcher in Automatic Human Behaviour Analysis Award in 2011, the Hong Kong RGC Early Career Award in 2012, and the Young Researcher Award of the Chinese University of Hong Kong. He is an associate editor of the Image and Vision Computing journal. He was an area chair of ICCV 2011, ECCV 2014, and ACCV 2014. His research interests include computer vision, deep learning, crowd video surveillance, object detection, and face recognition.
Abstract: Deep learning has achieved impressive success in solving grand challenges in many fields, including computer vision. In this seminar, I will introduce our recent work on developing deep models for several computer vision problems, especially pedestrian detection and person re-identification across camera views. Instead of treating a deep model as a black box, we investigate the connection between deep models and existing vision systems, so that valuable insights and experience accumulated from vision research can be used to develop new deep models and effective training strategies. We propose a framework of joint deep learning, in which specific layers are designed to correspond to key components in vision systems. Unlike vision systems whose pipeline components are manually designed or learned sequentially, these layers can be pre-trained and jointly optimized with deep learning, leading to significant improvements. For example, in pedestrian detection, new layers are designed to function as part detectors, model deformations, and infer the visibility of body parts. In person re-identification, filter pairs are automatically learned to encode photometric transforms across camera views, and specific layers are designed to jointly handle misalignment, geometric transforms, occlusions, and background clutter. At the end of the talk, I will also briefly report our recent deep learning results on other vision problems such as face recognition.
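To give a concrete flavor of the "layers corresponding to vision-system components" idea mentioned in the abstract, the toy sketch below imitates a deformation-handling layer for pedestrian detection: a part detector's score map is combined with a quadratic penalty on displacement from an anchor position, and the best placement is chosen by max pooling. The function name, the quadratic penalty form, and the random data are illustrative assumptions and not the speaker's actual implementation.

```python
import numpy as np

def deformation_layer(part_score_map, anchor, penalty_coeffs):
    """Toy deformation layer: combine a part detector's score map with a
    penalty on displacement from an anchor location, then take the maximum
    over all placements (max pooling with a deformation cost).

    part_score_map : 2-D array of detection scores for one body part.
    anchor         : (row, col) expected position of the part.
    penalty_coeffs : (c1, c2) weights on squared vertical/horizontal displacement.
    """
    h, w = part_score_map.shape
    rows, cols = np.mgrid[0:h, 0:w]
    dy, dx = rows - anchor[0], cols - anchor[1]
    c1, c2 = penalty_coeffs
    penalized = part_score_map - c1 * dy**2 - c2 * dx**2
    best = np.unravel_index(np.argmax(penalized), penalized.shape)
    # Return the part's score after deformation and the chosen placement.
    return penalized[best], best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    score_map = rng.normal(size=(8, 8))  # stand-in for a part filter's response map
    score, placement = deformation_layer(score_map, anchor=(4, 4),
                                         penalty_coeffs=(0.5, 0.5))
    print(f"part score {score:.3f} at placement {placement}")
```

In the joint deep learning framework described in the talk, an operation of this kind would be one differentiable layer among many (part detection, deformation, visibility inference) that are pre-trained and then optimized jointly rather than tuned separately.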