1:30 pm to 3:00 pm
NSH 3305
Title: Leveraging Structure for Generalization and Prediction in Visual System.
Abstract: The world around us is highly structured. Humans have a great capacity for understanding and leveraging these structures to generalize to novel scenarios and to predict the future. This thesis studies how computer vision systems can benefit from a similar process, leveraging the inherent structure in data to improve their capacity for generalization and prediction. It focuses on two specific aspects: zero-shot recognition using categorical structures that are explicitly specified by knowledge graphs, and video prediction that leverages the implicit physical structure among entities. Both methods build on graph neural networks, a scalable machine learning framework, to learn structure directly from large-scale data. In zero-shot recognition, we show that external knowledge from the knowledge graph significantly improves accuracy and makes recognition more robust. In video prediction, we find that long-term predictions are significantly sharper when the structure among entities is taken into account.
Committee:
Abhinav Gupta (advisor)
Deva Ramanan
Xiaolong Wang