Using Comparisons to Reduce Cost of Data Annotation Required to Train Models for Bedside Monitoring

J. Sheng, L. Chen, Y. Xu, M. R. Pinsky, M. Hravnak, and A. Dubrawski
Journal Article, Critical Care Medicine, Vol. 47, No. 1, p. 606, 2019

Abstract

Learning Objectives: Labeling training data for robust models to infer instability from vital sign time series is tedious and time-consuming, requiring expert clinicians to quantitatively assess large numbers of individual subject cases. We wish to reduce this burden by asking them to qualitatively compare pairs of cases instead: "Which one appears healthier?" Qualitative comparisons should require less time and be easier to make with greater confidence than individual quantitative assessments.

Methods: We manually identified 793 real alerts in non-invasive continuous vital signs data collected from bedside monitors in a 24-bed surgical stepdown unit over 3 months. Two expert clinicians directly rated each individual alert quantitatively on a scale from 1 (least severe) to 4 (most severe). We also applied an active learning algorithm utilizing both direct labeling and qualitative pairwise comparisons (more/less severe) to provide relative ordering for pairs of cases. We used regularized logistic regression as the base classifier to assess alert severity. Classification accuracies were computed with 10-fold cross-validation as functions of data labeling effort to compare performance of the algorithm using a hybrid model (both the qualitative and a few quantitative labels) versus the model using only direct quantitative labels. We defined the cost ratio (CR) as the ratio of the unit cost (measurable as time spent) of answering a quantitative labeling query to the unit cost of a qualitative comparison, and investigated the influence of CRs ranging from 1 to 10 on model performance.
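
As a rough illustration of the underlying idea (a minimal sketch, not the authors' implementation), a qualitative comparison "case i is more severe than case j" can be folded into a linear model as the constraint w.(x_i - x_j) > 0 on feature-difference vectors, which turns the pairs into a binary classification problem solvable with regularized logistic regression. In the Python sketch below, the vital-sign features, the pair-sampling scheme, and the stand-in for clinician judgments are all synthetic assumptions.

# Minimal sketch (not the paper's implementation): reducing qualitative
# pairwise comparisons to a linear classification problem, with
# L2-regularized logistic regression as the base model per the abstract.
# Assumes scikit-learn; all data below is synthetic and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_cases, n_features = 200, 12

# Feature vectors extracted from vital-sign time series (synthetic stand-in).
X = rng.normal(size=(n_cases, n_features))
true_w = rng.normal(size=n_features)          # hidden "severity" direction
severity = X @ true_w

# Sampled pairs (i, j) answered by "which case is more severe?" queries.
pairs = rng.choice(n_cases, size=(150, 2))
pairs = pairs[pairs[:, 0] != pairs[:, 1]]
diffs = X[pairs[:, 0]] - X[pairs[:, 1]]
y = (severity[pairs[:, 0]] > severity[pairs[:, 1]]).astype(int)

# "i more severe than j" becomes a binary label on the difference vector.
# No intercept: the decision boundary for differences passes through the
# origin, so each pair is added in both orientations for symmetry.
ranker = LogisticRegression(fit_intercept=False, C=1.0)
ranker.fit(np.vstack([diffs, -diffs]), np.concatenate([y, 1 - y]))

# The learned weights score new cases by apparent severity.
scores = X @ ranker.coef_.ravel()
print("agreement with hidden severity:", np.corrcoef(scores, severity)[0, 1])

With this reduction, a small number of direct quantitative labels can then anchor decision thresholds on the learned severity score, which is the role the "few quantitative labels" play in the hybrid model described above.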

Results: If the CR is equal to or greater than 3, the hybrid model learns faster and requires less data annotation effort from clinicians to achieve performance equivalent to that of direct labeling. Relative reduction in data annotation effort as a function of CR, measured when the classification accuracy of the hybrid model reaches 93% (the attainable average with direct labeling) with 95% confidence, grows monotonically: -4% at a CR of 2, +16% at 3, +33% at 5, and +41% at 7.
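
For intuition on how the CR drives this break-even point, consider a hypothetical accounting (the budget numbers below are made up, not the study's data): if one quantitative label costs CR times as much clinician time as one comparison, total effort in comparison-equivalents is n_quant x CR + n_comp, and a comparison-heavy budget overtakes a direct-only budget once CR is large enough.

# Hypothetical cost accounting (illustrative budgets, not the study's data):
# effort is measured in units of one qualitative comparison, where
# CR = cost(quantitative label) / cost(qualitative comparison).
def annotation_cost(n_quant, n_comp, cost_ratio):
    """Total annotation effort, in comparison-equivalents."""
    return n_quant * cost_ratio + n_comp

# Example: a hybrid budget of 50 direct labels + 400 comparisons versus
# a direct-only budget of 200 labels, across a range of cost ratios.
for cr in (1, 2, 3, 5, 7, 10):
    hybrid = annotation_cost(50, 400, cr)
    direct = annotation_cost(200, 0, cr)
    print(f"CR={cr:>2}: hybrid={hybrid:>4}, direct-only={direct:>4}, "
          f"hybrid cheaper: {hybrid < direct}")

With these illustrative budgets the hybrid option becomes cheaper at CR = 3 and above, mirroring the qualitative shape of the reported result.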

Conclusions: Making qualitative comparisons of apparent severity between case pairs is an easier task than assessing individual severity on a quantitative scale. Reliable health assessment models can be learned while consuming less expert time on data labeling, if comparisons are in fact quicker to make than direct assessments.

Notes
Funding: NIH R01NR013912

BibTeX

@article{Sheng-2019-121691,
author = {J. Sheng and L. Chen and Y. Xu and M. R. Pinsky and M. Hravnak and A. Dubrawski},
title = {Using Comparisons to Reduce Cost of Data Annotation Required to Train Models for Bedside Monitoring},
journal = {Critical Care Medicine},
year = {2019},
month = {January},
volume = {47},
number = {1},
pages = {606},
}