3:00 pm to 12:00 am
Event Location: NSH 3305
Abstract: This thesis proposal considers the problem of training machine learning classifiers in domains where data are very high dimensional and training examples are extremely limited or impossible to collect for all classes of interest. As a case study, we focus on the application of thought recognition, where the objective is to classify a person’s cognitive state from a recorded image of that person’s neural activity. Machine learning and pattern recognition methods have already made a large impact on this field, but most prior work has focused on classification studies with small numbers of classes and moderate amounts of training data. In this thesis, we focus on thought recognition in a limited data setting, where there are few, if any, training examples for the classes we wish to discriminate, and the number of possible classes can be in the thousands.
Despite these constraints, this thesis seeks to demonstrate that it is possible to classify noisy, high dimensional data with extremely few training examples by using spatial and temporal domain knowledge, intelligent feature selection, semantic side information, and large quantities of unlabeled data from related tasks.
In our preliminary work, we showed that it is possible to build a binary classifier that accurately discriminates between cognitive states using more than 80,000 features and only two training examples per class. We also showed how classification can be improved through principled feature selection, and derived a significance test based on order statistics that is appropriate for very high-dimensional problems with small numbers of training examples.
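To make the order-statistics idea concrete, here is a minimal sketch, not the exact test derived in the thesis: it scores each feature with a standardized mean difference and, assuming a standard-normal null, keeps only scores large enough that they are unlikely to be the maximum of p null draws.

```python
import numpy as np
from scipy import stats

def order_statistic_threshold(p, alpha=0.05):
    # Threshold t such that the maximum of p i.i.d. standard-normal
    # null scores exceeds t with probability alpha:
    # P(max > t) = 1 - Phi(t)**p = alpha  =>  t = Phi^-1((1 - alpha)**(1/p)).
    return stats.norm.ppf((1.0 - alpha) ** (1.0 / p))

def select_features(X, y, alpha=0.05):
    # X: (n_examples, n_features), y: binary labels in {0, 1}.
    a, b = X[y == 0], X[y == 1]
    # Standardized mean difference as an illustrative per-feature score.
    scores = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    # Keep only features whose score beats the max-order-statistic bound.
    return np.flatnonzero(scores > order_statistic_threshold(X.shape[1], alpha))
```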
We have also explored the most extreme case of limited data, the zero-shot learning setting, where we have no training examples at all for the classes we wish to discriminate. We showed that by using a knowledge base of semantic side information to create intermediate features, we can build a classifier that identifies the words people are thinking about, even without any training data for those words and even when the classifier must choose among nearly 1,000 candidate words.
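A minimal sketch of this two-stage decoding, with ridge regression and Euclidean nearest-neighbor matching standing in as illustrative choices rather than the thesis's actual components:

```python
import numpy as np
from sklearn.linear_model import Ridge

class SemanticOutputCodeSketch:
    """Zero-shot sketch: map neural images into a semantic feature
    space, then pick the nearest candidate word's semantic vector."""

    def __init__(self, alpha=1.0):
        self.reg = Ridge(alpha=alpha)

    def fit(self, X_train, S_train):
        # X_train: (n, voxels); S_train: (n, semantic dims), the
        # semantic codes of the training words only.
        self.reg.fit(X_train, S_train)
        return self

    def predict(self, X_test, S_candidates, words):
        # S_candidates: semantic codes for words never seen in training,
        # one row per entry of `words`.
        S_hat = self.reg.predict(X_test)
        # Nearest-neighbor decoding in semantic space (Euclidean).
        d = ((S_hat[:, None, :] - S_candidates[None, :, :]) ** 2).sum(-1)
        return [words[i] for i in d.argmin(axis=1)]
```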
Finally, we showed how multi-task learning can be used to learn useful semantic features directly from data. We formulated the semantic feature learning problem as a Multi-task Lasso and presented an extremely fast and highly scalable algorithm for solving the resulting optimization problem.
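The Multi-task Lasso couples the tasks through a sum-of-l2-norms penalty on each feature's coefficients across all tasks, so a voxel is selected or discarded jointly for every semantic feature. The sketch below uses scikit-learn's off-the-shelf MultiTaskLasso on synthetic stand-in data purely for illustration; the thesis contributes its own, much faster solver.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

# X: (n_examples, n_voxels); Y: (n_examples, n_semantic_features).
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 500))
W_true = np.zeros((500, 20))
W_true[:10] = rng.standard_normal((10, 20))     # 10 voxels shared by all tasks
Y = X @ W_true + 0.1 * rng.standard_normal((60, 20))

# The l2/l1 penalty sum_j ||W[:, j]||_2 zeroes out whole voxels
# jointly across all semantic-feature tasks.
model = MultiTaskLasso(alpha=0.5).fit(X, Y)
shared = np.flatnonzero(np.abs(model.coef_).sum(axis=0))  # voxels kept
print(len(shared), "voxels selected jointly across all tasks")
```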
We propose work to extend our zero-shot learning setting by optimizing semantic feature sets and by using an active learning framework to choose the most informative training examples. We also propose to use latent feature models, such as component analysis and sparse coding, in a self-taught learning framework to improve decoding by leveraging data from additional neural imaging experiments.
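As a rough illustration of how such a self-taught learning pipeline could look (the specific choices here, scikit-learn's DictionaryLearning and synthetic stand-in data, are assumptions for the sketch, not the proposal itself): a dictionary is learned from unlabeled data drawn from other experiments, and its sparse codes then serve as features for the few labeled examples of the target task.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_unlabeled = rng.standard_normal((200, 100))   # unlabeled related-task data
X_labeled = rng.standard_normal((4, 100))       # handful of labeled examples
y_labeled = np.array([0, 0, 1, 1])

# Learn a sparse-coding dictionary from the unlabeled data only.
dico = DictionaryLearning(n_components=32, transform_algorithm="lasso_lars",
                          transform_alpha=0.1, max_iter=10, random_state=0)
dico.fit(X_unlabeled)

# Re-represent the labeled examples as sparse codes, then classify.
Z = dico.transform(X_labeled)
clf = LogisticRegression().fit(Z, y_labeled)
```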
Committee: Tom Mitchell, Chair
Dean Pomerleau
J. Andrew Bagnell
Andrew Ng, Stanford University