Leveraging Structure for Generalization and Prediction in Visual System
Abstract
The world around us is highly structured. Humans have a great capacity for understanding that structure and leveraging it to generalize to novel scenarios and to predict the future. This thesis studies how computer vision systems can benefit from a similar process: leveraging the inherent structure in data to improve generalization and prediction.
It focuses on two specific aspects: zero-shot recognition using categorical structure that is explicitly specified by knowledge graphs, and video prediction that leverages the implicit physical structure among entities. Both methods build on a scalable machine learning framework, graph neural networks, to learn structure directly from large-scale data. In zero-shot recognition, we show that accuracy improves significantly and becomes more robust thanks to the external knowledge in the knowledge graph. In video prediction, we find that long-term predictions are significantly sharper when the structure among entities is factored in.
This work serves as the master's thesis of Yufei Ye.
BibTeX
@mastersthesis{Ye-2019-117140,
author = {Yufei Ye},
title = {Leveraging Structure for Generalization and Prediction in Visual System},
year = {2019},
month = {June},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-19-70},
}