Carnegie Mellon University
Abstract:
Multi-view machine learning has garnered substantial attention in various applications over recent years. Many such applications involve learning from data obtained from multiple heterogeneous sources of information, for example, multi-sensor systems such as self-driving cars, or bedside monitoring of the vital signs of intensive care patients. Learning models for such applications can often benefit from leveraging not only the information from individual sources, but also the interactions and relationships between these sources.
In our research, we look at multi-view learning approaches which try to model these inter-view interactions explicitly. Here, we define interactions and relationships between views in terms of the information which is shared across them, including corroboration and redundancy of information. We distinguish between global relationships, which are shared across all views, and local relationships, which are shared only among a subset of views. For example, in a multi-camera system, we can think of global relationships as being defined over the part of a scene which is visible to all cameras, while local relationships would be defined by the intersection of the fields of view of only some of the cameras.
We consider three main aspects of modeling such relationships. First, we develop and study a framework for discovering and understanding them within multi-view data. We describe different approaches to uncover and model these global and local relationships. We look at simple multi-view extensions of auto-encoders, and then move on to more sophisticated generative models.
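As a rough illustration of this idea (not the actual architecture studied in the thesis), the sketch below shows a two-view auto-encoder whose latent code is split into a shared part, encouraged to agree across views to capture global relationships, and a view-specific part for local information; all module names and dimensions are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TwoViewAutoencoder(nn.Module):
        """Illustrative two-view auto-encoder with shared ('global') and
        private ('local') latent dimensions; sizes are arbitrary."""
        def __init__(self, d1, d2, shared_dim=8, private_dim=4):
            super().__init__()
            z = shared_dim + private_dim
            self.enc1 = nn.Sequential(nn.Linear(d1, 64), nn.ReLU(), nn.Linear(64, z))
            self.enc2 = nn.Sequential(nn.Linear(d2, 64), nn.ReLU(), nn.Linear(64, z))
            self.dec1 = nn.Sequential(nn.Linear(z, 64), nn.ReLU(), nn.Linear(64, d1))
            self.dec2 = nn.Sequential(nn.Linear(z, 64), nn.ReLU(), nn.Linear(64, d2))
            self.shared_dim = shared_dim

        def forward(self, x1, x2):
            z1, z2 = self.enc1(x1), self.enc2(x2)
            # reconstruction of each view from its own code
            recon = F.mse_loss(self.dec1(z1), x1) + F.mse_loss(self.dec2(z2), x2)
            # alignment: shared dimensions should agree across the two views
            align = F.mse_loss(z1[:, :self.shared_dim], z2[:, :self.shared_dim])
            return recon + align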
Second, we explore the benefits of this understanding of inter-view relationships for solving downstream modeling tasks, so that they can leverage multi-view structure where it exists. Here, we adapt our models to tackle different applications, and demonstrate the utility and effectiveness of explicitly modeling these relationships. We first look at incorporating the downstream loss function into the representation learning framework to tailor the learned representations to the task at hand. We then consider the domain of temporal data, primarily medical data with temporal physiological measurements of patients, as an application of our methods.
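One simple way to read "incorporating the downstream loss function into the representation learning framework" is as a weighted sum of an unsupervised representation loss and a supervised task loss; the helper below is a hedged illustration of that pattern, not the thesis's specific formulation (the function name and weighting are assumptions).

    import torch.nn.functional as F

    def task_aware_loss(recon_loss, logits, labels, task_weight=1.0):
        # Trade off the unsupervised representation objective (e.g., the
        # reconstruction + alignment loss above) against a supervised
        # downstream loss, so the representation is shaped by the end task.
        task_loss = F.cross_entropy(logits, labels)
        return recon_loss + task_weight * task_loss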
Third, we investigate a methodology for improving these relationships directly by facilitating favorable interactions between views. We first look at how one can re-interpret individual views as data points, allowing us to apply traditional machine learning approaches to modeling inter-view relationships. Using this re-interpretation, we look at view selection, where we directly select views that manifest favorable relationships, and propose Scalable Active Search as a candidate approach. Active Search allows us to interactively search for informative views, given an initial set of views and a measure of similarity between them.
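To make the "views as data points" re-interpretation concrete, the sketch below runs a simplified one-step greedy search over a pairwise view-similarity matrix; Scalable Active Search itself uses more sophisticated lookahead and approximation, so this is only an illustrative stand-in with assumed names and scoring.

    import numpy as np

    def greedy_view_search(sim, labeled, labels, budget):
        """sim: (n, n) similarity between views; labeled: indices of views
        already queried; labels: 1 if a queried view was informative, else 0."""
        labeled, labels = list(labeled), list(labels)
        selected = []
        for _ in range(budget):
            candidates = [i for i in range(sim.shape[0]) if i not in labeled]
            if not candidates:
                break
            # score each unqueried view by the similarity-weighted average of
            # the labels of the views queried so far
            scores = {i: np.dot(sim[i, labeled], labels) /
                         (np.sum(sim[i, labeled]) + 1e-12) for i in candidates}
            pick = max(scores, key=scores.get)
            selected.append(pick)
            labeled.append(pick)
            labels.append(1)  # placeholder: in practice, query an oracle/annotator here
        return selected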
Thesis Committee Members:
Artur Dubrawski, Chair
Jeff Schneider
Srinivasa Narasimhan
Junier Oliva, University of North Carolina, Chapel Hill