Text Classification with Class Descriptions Only - Robotics Institute Carnegie Mellon University

PhD Speaking Qualifier

Mononito Goswami
PhD Student, Robotics Institute, Carnegie Mellon University
Thursday, November 10
4:00 pm to 5:00 pm
NSH 1109
Text Classification with Class Descriptions Only

Abstract:
In this work, we introduce KeyClass, a weakly supervised text classification framework that learns from class-label descriptions only, without requiring any human-labeled documents. It combines the linguistic domain knowledge stored in pre-trained language models with data programming to label documents automatically. We demonstrate its efficacy and flexibility by comparing it to state-of-the-art weakly supervised text classifiers across four real-world text classification datasets.
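
To make the idea of labeling documents from class descriptions concrete, here is a minimal sketch. It is not the KeyClass implementation: it swaps the pre-trained language model for exact keyword matching and the data-programming label model for a simple majority vote. The class descriptions, the function names (`build_labeling_functions`, `weak_label`), and the example documents are all hypothetical and chosen for illustration only.

```python
import re
from collections import Counter

# Hypothetical class descriptions; in this setting they are the only supervision.
CLASS_DESCRIPTIONS = {
    "sports": "team game match player score league",
    "finance": "stock market bank investment price shares",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_labeling_functions(descriptions):
    """Create one (keyword, label) rule per word in each class description."""
    rules = []
    for label, description in descriptions.items():
        for keyword in sorted(set(tokenize(description))):
            rules.append((keyword, label))
    return rules


def weak_label(document, rules):
    """Majority vote over the rules that fire; None means abstain.

    A rule fires when its keyword appears in the document. Ties and
    documents that match no keyword are left unlabeled.
    """
    tokens = set(tokenize(document))
    votes = Counter(label for keyword, label in rules if keyword in tokens)
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the top two classes
    return ranked[0][0]


rules = build_labeling_functions(CLASS_DESCRIPTIONS)
documents = [
    "The player scored in the final match of the league",
    "The bank raised the price of its shares",
    "It rained all day",
]
for doc in documents:
    print(weak_label(doc, rules), "<-", doc)
```

Documents that receive a label this way could then serve as noisy training data for a downstream classifier, while abstentions are simply left out. Exact keyword matching misses inflected forms such as "scored", which is one reason a real system would rely on a language model rather than string equality.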

Next, we’ll discuss an important clinical application of KeyClass: assigning diagnostic codes to medical notes in the publicly available MIMIC-III database. Healthcare providers usually record detailed notes of the clinical care delivered to each patient for clinical, research, and billing purposes. Because these narratives are unstructured, providers employ dedicated staff to assign diagnostic codes to patients’ diagnoses using the International Classification of Diseases (ICD) coding system. This manual process is time-consuming, costly, and error-prone. Prior work has demonstrated the potential utility of machine learning in automating this process, but it has relied on large quantities of manually labeled data to train models. Additionally, diagnostic coding systems evolve over time, which leaves traditional supervised learning strategies unable to generalize beyond local applications.

Relevant paper: Classifying Unstructured Clinical Notes via Automatic Weak Supervision (arXiv:2206.12088)

Committee:
Prof. Artur Dubrawski (Chair)
Prof. Barnabás Póczos
Prof. Srinivasa Narasimhan
Benedikt Boecking