Utility of Potential Misdiagnoses in Predicting Foodborne Outbreaks - Robotics Institute Carnegie Mellon University

Utility of Potential Misdiagnoses in Predicting Foodborne Outbreaks

Lucia Lucia, Artur Dubrawski, and Lujie Chen
Journal Article, Online Journal of Public Health Informatics, Vol. 6, No. 1, pp. 173, April, 2014

Abstract

Objective
To investigate the utility of inpatient and emergency room diagnoses for detecting outbreaks of Salmonellosis in humans, and to quantify the impact of including in the analysis cases diagnosed with conditions whose physiological presentation may resemble Salmonellosis.

Introduction
Reliable detection and accurate scoping of outbreaks of foodborne illness are key to effective mitigation of their impacts. However, the relatively small number of persons affected, together with underreporting, challenges the reliability of surveillance models. In this work, we correlate a record of identified outbreaks and sporadic cases of Salmonellosis in humans retained in PulseNet [1] with diagnosis codes in hospital claims collected in California from 2006 to 2010. We hypothesize that the data support and the reliability of detection could be improved by including cases diagnosed with conditions with which Salmonella infection may be confused [2].

Methods
We join the data in a table indexed by date and location. It contains counts of inpatient and ED patients diagnosed with Salmonellosis and related diseases, as well as counts of cases involved in outbreaks, aggregated by day (the admission date or the isolation date) and location (the county of the hospital or the county where the outbreak occurred). Of the 66,845 rows in the table, 9.5% involve sporadic cases and identified clusters.
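The join described above can be sketched as a simple aggregation keyed by (day, county). This is a minimal illustration, not the authors' pipeline: the function name `build_count_table`, the tuple layouts, and the `"outbreak_cases"` column name are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def build_count_table(claims, outbreak_cases):
    """Aggregate records into rows keyed by (day, county).

    claims: iterable of (admission_date, hospital_county, diagnosis) tuples
            (hypothetical layout for hospital claims).
    outbreak_cases: iterable of (isolation_date, county) tuples
            (hypothetical layout for epidemiological records).
    """
    table = defaultdict(lambda: defaultdict(int))
    for day, county, diagnosis in claims:
        table[(day, county)][diagnosis] += 1
    for day, county in outbreak_cases:
        table[(day, county)]["outbreak_cases"] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(counts) for key, counts in table.items()}


claims = [
    (date(2008, 6, 1), "Yolo", "salmonellosis_ed"),
    (date(2008, 6, 1), "Yolo", "salmonellosis_ed"),
    (date(2008, 6, 1), "Yolo", "celiac"),
    (date(2008, 6, 2), "Kern", "salmonellosis_inpatient"),
]
outbreaks = [(date(2008, 6, 1), "Yolo")]
rows = build_count_table(claims, outbreaks)
```

Rows that have hospital diagnoses but no recorded outbreak case simply lack the `"outbreak_cases"` key, so a consumer would read it with `row.get("outbreak_cases", 0)`.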

To quantify the predictive utility of potential misdiagnoses, a zero-inflated Poisson (ZIP) regression model [3] is trained to predict the number of cases in the epidemiological data. Among Salmonellosis (inpatient and ED counts) and 12 potential misdiagnoses, the best combination of input features is found by exhaustive search that minimizes the 10-fold cross-validation prediction error of the ZIP model. The chosen model is then retrained on all data using the selected features. Similarly, we train a Random Forest (RF) binary classifier [4] that also includes spatio-temporal predictors (county and month) to account for the seasonality and spatial propensity of outbreaks.
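The exhaustive feature search with k-fold cross-validation can be sketched as follows. To keep the example self-contained, the ZIP model is replaced by a toy one-parameter rate model (predicted count = a × sum of the selected features, with a fitted as the ratio of training totals); this substitution, the deterministic fold assignment, and all function names are assumptions for illustration only.

```python
from itertools import combinations


def kfold_indices(n, k):
    """Yield (train, test) index lists for k deterministic folds."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def cv_error(features, y, subset, k=10):
    """Mean absolute held-out error of the toy model y ~ a * sum(subset).

    A real analysis would fit a ZIP regression here instead.
    """
    n = len(y)
    s = [sum(features[name][i] for name in subset) for i in range(n)]
    total = 0.0
    for train, test in kfold_indices(n, k):
        denom = sum(s[i] for i in train)
        a = sum(y[i] for i in train) / denom if denom else 0.0
        total += sum(abs(y[i] - a * s[i]) for i in test)
    return total / n


def best_subset(features, y, k=10):
    """Try every non-empty feature subset; return the one with lowest CV error."""
    names = sorted(features)
    candidates = (
        c for r in range(1, len(names) + 1) for c in combinations(names, r)
    )
    return min(candidates, key=lambda c: cv_error(features, y, c, k))


# Toy data: the target depends on f1 and f2 only; f3 is a distractor.
n = 20
features = {
    "f1": [i % 3 for i in range(n)],
    "f2": [(i * 7) % 5 for i in range(n)],
    "f3": [(i * i) % 4 for i in range(n)],
}
y = [2 * (features["f1"][i] + features["f2"][i]) for i in range(n)]
chosen = best_subset(features, y, k=10)
```

With 13 candidate inputs (Salmonellosis plus 12 potential misdiagnoses, if the two Salmonellosis counts are treated as one input) there are 2^13 − 1 = 8,191 non-empty subsets, which is small enough to enumerate directly.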

Results
We found that 8 diagnoses related to Salmonellosis have a non-trivial impact on outbreak predictability (only Celiac is insignificant, with p-value > 0.05). Their contributory effect is indicated by positive coefficients in the ZIP count model and negative coefficients in the ZIP zero model, as shown in the table.
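To see why a positive count-model coefficient and a negative zero-model coefficient both indicate a contributory effect, consider a single-feature ZIP model. The sketch below assumes the common parameterization (log link for the Poisson rate, logit link for the probability of a structural zero); the abstract does not state the links used, and the function and parameter names are illustrative.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson with rate lam and
    structural-zero probability pi."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1.0 - pi) * poisson


def zip_expected_count(x, beta, gamma, beta0=0.0, gamma0=0.0):
    """E[Y | x] = (1 - pi) * lam, assuming a log link for lam (count model)
    and a logit link for pi (zero model)."""
    lam = math.exp(beta0 + beta * x)
    pi = 1.0 / (1.0 + math.exp(-(gamma0 + gamma * x)))
    return (1.0 - pi) * lam


# A positive beta raises lam and a negative gamma lowers pi,
# so both push the expected outbreak count up as x grows.
low = zip_expected_count(1.0, beta=0.5, gamma=-0.5)
high = zip_expected_count(2.0, beta=0.5, gamma=-0.5)
```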

Including counts of these diagnoses improves the predictability of outbreak occurrence compared with using Salmonellosis diagnoses only: the AUC score of the RF model increases from 57% to 87%. Adding spatio-temporal factors improves the predictability to 91% AUC. The model discovers 71% of actual outbreak cases at a 7% false positive rate (FPr) and correctly recalls 4.5 times as many outbreak cases at 1% FPr as when using Salmonellosis diagnoses only.
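The two evaluation quantities used above, AUC and recall at a fixed false positive rate, can be computed from classifier scores as in the following sketch. It is a plain-Python illustration on made-up scores, not the evaluation code behind the reported figures; the function names are assumptions.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def recall_at_fpr(scores, labels, max_fpr):
    """Highest recall achievable by any score threshold whose false
    positive rate does not exceed max_fpr."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if fp / n_neg <= max_fpr:
            best = max(best, tp / n_pos)
    return best


scores = [0.1, 0.4, 0.35, 0.8]   # illustrative classifier outputs
labels = [0, 0, 1, 1]            # 1 = outbreak, 0 = no outbreak
auc = roc_auc(scores, labels)
recall_strict = recall_at_fpr(scores, labels, max_fpr=0.0)
```

Comparing `recall_at_fpr` for two models at the same `max_fpr` gives the kind of ratio quoted above for the 1% FPr operating point.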

We found that 37% of the predictions can be made 1 to 7 days earlier than the recorded isolation date, increasing precision to 89%. This suggests a potential early warning utility. It is also possible to spot outbreaks not recorded in PulseNet. For instance, 22 out of 35 outbreak predictions in Yolo County are not in PulseNet; for 60% of these 22, at least 40% of nearby counties show positive predictions or actual cases in PulseNet in the same periods of time.
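The early-warning share can be illustrated with a small lead-time calculation. The abstract does not describe how predictions were matched to isolation dates, so the pairing below, the function name `early_warning_share`, and the example dates are purely hypothetical.

```python
from datetime import date


def early_warning_share(pairs, min_days=1, max_days=7):
    """Fraction of (prediction_date, isolation_date) pairs in which the
    prediction precedes the isolation date by min_days..max_days days."""
    if not pairs:
        return 0.0
    early = sum(
        1
        for predicted, isolated in pairs
        if min_days <= (isolated - predicted).days <= max_days
    )
    return early / len(pairs)


pairs = [
    (date(2008, 6, 1), date(2008, 6, 4)),    # 3 days early
    (date(2008, 6, 10), date(2008, 6, 10)),  # same day
    (date(2008, 6, 1), date(2008, 6, 20)),   # 19 days early, outside window
    (date(2008, 7, 1), date(2008, 7, 8)),    # 7 days early
]
share = early_warning_share(pairs)
```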

Conclusions
We empirically found an informative correlation between the counts of hospital patients diagnosed with diseases whose physiological presentation may resemble Salmonellosis and the epidemiologically recorded cases of Salmonellosis. This suggests that tracking these diseases could improve the accuracy of foodborne illness surveillance. Further study is required to verify the actual extent of clinical misdiagnosis and to determine whether other factors explain the apparent correlation.

Notes
This work is supported by the National Science Foundation (awards 0911032, 1320347), and the Singapore National Research Foundation under its International Research Centre @Singapore Funding Initiative and administered by the IDM Programme Office, Media Development Authority.

BibTeX

@article{Lucia-2014-121722,
author = {Lucia Lucia and Artur Dubrawski and Lujie Chen},
title = {Utility of Potential Misdiagnoses in Predicting Foodborne Outbreaks},
journal = {Online Journal of Public Health Informatics},
year = {2014},
month = {April},
volume = {6},
number = {1},
pages = {173},
}