3:00 pm to 4:00 pm
GHC 8102
Abstract:
The fields of Machine Learning and Data Science generally follow the paradigm that “the ends justify the means”: improving the predictive power of an algorithm is considered of paramount value, even at the expense of model intelligibility. While accuracy is an important performance metric, interpretability should be a major consideration in many application domains. This is particularly true for decision support systems, where a human must ultimately take responsibility for a decision informed by machine recommendations. In other cases, the most powerful state-of-the-art learning models may not be necessary to make confident predictions, and interpretability can be preserved by keeping algorithms as simple as possible.
I will present a novel bounding box algorithm that finds easy-to-understand, low-dimensional structure in data. I will then discuss a few use cases where we can leverage these simple structures to provide interpretable answers to potentially complex questions. Finally, I will show that within a given data set, some data are ‘harder’ than others. I will present a staged model framework that provides interpretable predictions for the ‘easy’ data, while allowing the ‘hard’ data to be processed by a more powerful, more complex alternative model. In a survey of publicly available data sets, I will show that a significant amount of data can be confidently handled with a simple model, without incurring a statistically distinguishable loss in accuracy compared to a more powerful black-box model.
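To make the staged idea concrete, the following is a minimal sketch in Python using scikit-learn stand-ins. The specific models, the confidence threshold, and the deferral rule are illustrative assumptions, not the algorithm from the talk: a shallow tree stands in for a simple interpretable structure, and a random forest for the black-box fallback.

# Sketch of a staged predictor: keep the simple model's prediction when it
# is confident, otherwise defer to a more powerful model. All model choices
# and the threshold below are assumptions for illustration only.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stage 1: a simple, interpretable model (shallow tree as a placeholder
# for an easy-to-understand structure such as a bounding box rule).
simple = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)

# Stage 2: a more powerful, less interpretable fallback model.
fallback = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

THRESHOLD = 0.95  # illustrative confidence cutoff for calling a point 'easy'
proba = simple.predict_proba(X_test)
easy = proba.max(axis=1) >= THRESHOLD  # points the simple model handles

pred = np.where(easy, simple.predict(X_test), fallback.predict(X_test))
coverage = easy.mean()             # fraction of data handled interpretably
accuracy = (pred == y_test).mean()
print(f"simple model covered {coverage:.0%} of points; staged accuracy {accuracy:.3f}")

In this toy setup, the fraction of points routed to the simple model plays the role of the ‘easy’ data described in the abstract, and the comparison of staged accuracy against the fallback model alone mirrors the evaluation the talk describes.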
Committee:
Artur Dubrawski
Barnabás Póczos
Deva Ramanan
Matt Barnes