Abstract:
The widespread adoption of time series machine learning (ML) models faces multiple challenges involving data, modeling and evaluation.
Data. Modern ML models depend on copious amounts of cohesive and reliably annotated data for training and evaluation. However, labeled data is not always available or reliable, and it may be dispersed across different locations. We propose systematic solutions for making time series data ML-ready.
Modeling. Most current time series ML models are built, trained, and evaluated on individual datasets from a specific application domain. Thus, building an effective model for a particular application scenario requires substantial effort, time, and domain expertise to arrive at a successful task-specific design. We propose to partially address this limitation by developing large pre-trained foundation models for time series, easing the development of useful models across diverse application domains with limited resources, data, and labels.
Evaluation. Currently, time series models are commonly evaluated using relatively small, specific, and highly tailored benchmarks, which may obscure assessment of their performance. We highlight the gaps in existing evaluation techniques and propose addressing the most important of them through comprehensive, multi-metric assessment.
In summary, this thesis aims to democratize time series artificial intelligence by simplifying and accelerating the development of models, while improving their performance in real-world application scenarios constrained by limited resources and imperfect data.
Thesis Committee Members:
Artur Dubrawski, Chair
Jean Oh
Barnabas Poczós
Frederic Sala, University of Wisconsin-Madison
Laurent Callot, Amazon