3:00 pm to 4:00 pm
Event Location: Newell Simon Hall 1507
Bio: Haroon Idrees is a postdoctoral researcher in the Center for Research in Computer Vision (CRCV) at the University of Central Florida (UCF). He is interested in machine vision and learning, with a focus on crowd analysis, action recognition, multi-camera and airborne surveillance, as well as deep learning and multimedia content analysis. He chaired the THUMOS challenge on Action Recognition (CVPR, 2015) and has served on the program committees of the Workshop on Applications for Aerial Video Exploitation (WACV, 2015), the Multiple Object Tracking Challenge (ECCV, 2016), and the upcoming BMTT-PETS Workshop on Tracking and Surveillance (CVPR, 2017) and Open Domain Action Recognition (CVPR, 2017). He has published several papers in CVPR, ICCV, ECCV, Journal of Image and Vision Computing, and IEEE Transactions on Pattern Analysis and Machine Intelligence. He received the BSc (Honors) degree in computer engineering from the Lahore University of Management Sciences, Pakistan, in 2007, and the PhD degree in computer science from the University of Central Florida in 2014.
Abstract: Automated analysis of dense crowds is a challenging problem with far-reaching applications in crowd safety and management, as well as in gauging the political significance of protests and demonstrations. In this talk, I will first describe a counting approach that uses traditional computer vision techniques and was recently applied to the demonstrations in Catalonia, Spain, in 2015 and 2016. I will then present an extension of this work that uses a convolutional neural network with hundreds of layers, made possible in part by a new counting dataset with over one million humans, all marked with dot annotations. Next, I will discuss how context in the form of local consistency captures the similarity in scale within local neighborhoods of an image and is used to detect partially visible humans in dense crowds. Finally, for the task of re-identification in a multi-camera setup, spatio-temporal context in the form of personal, social, and environmental constraints aids in eliminating incorrect hypotheses and significantly improves performance on correct re-acquisition of people across cameras, especially when appearance and visual features alone are insufficient.