Abstract:
Causal inference focuses on uncovering cause-effect relationships from data, in contrast to conventional machine learning, which relies primarily on correlations. By identifying these causal relationships, causal inference can improve feature selection for predictive models, leading to predictions that are more accurate, interpretable, and robust. This approach is especially effective with interventional data, such as randomized controlled trials (RCTs), where variables are deliberately changed so that their effects can be observed.
In this study, we begin by examining whether existing tabular datasets contain interventional data, such as natural experiments. Natural experiments occur when events affect individuals or groups differently, as with the varied impact of the COVID-19 pandemic on different populations. Our findings demonstrate that real-world datasets do contain natural experiments, which can be used to improve classification performance through causal inference. We then extend this methodology to lung ultrasound video datasets, aiming to gain additional insights and improve diagnostic accuracy.
Committee:
John Galeotti
Deva Ramanan
Peter Spirtes
Gokul Swamy