Mining heterogeneous clinical notes by multi-modal latent topic model


Latent knowledge can be extracted from the electronic notes recorded during patient encounters with the health system. Using these clinical notes to decipher a patient’s underlying comorbidities, symptom burden, and treatment course remains an ongoing challenge. Latent topic models, an efficient class of Bayesian methods, can model each patient’s clinical notes as “documents” and the words in the notes as “tokens”. However, standard latent topic models assume that all notes follow the same topic distribution, regardless of the type of note or the domain expertise of the author. We propose a novel application of latent topic modeling that uses a multi-modal topic model to jointly infer distinct topic distributions for notes of different types. We applied our model to clinical notes from the MIMIC-III dataset to infer distinct topic distributions over the physician and nursing note types. We observed a consistent improvement in topic interpretability with multi-modal modeling over a baseline model that ignores note types. By correlating the patients’ topic mixtures with hospital mortality and prolonged mechanical ventilation, we identified several diagnostic topics that are associated with poor outcomes. Because of its elegant and intuitive formulation, we envision broad application of our approach to mining multi-modal, text-based healthcare information that goes beyond clinical notes.
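To make the idea of type-specific topic distributions concrete, the following is a minimal generative sketch, not the model as specified in the paper. It assumes one common multi-modal formulation: each patient has a single topic mixture shared across note types, while each note type has its own set of topic-to-word distributions. The abstract does not confirm this exact structure, and all names (`sample_dirichlet`, `generate_patient`, the vocabulary sizes, and the prior values) are hypothetical choices for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_patient(note_lengths, topic_word, alpha, rng):
    """Generate word ids for one patient's notes, grouped by note type.

    note_lengths: dict mapping note type -> number of tokens to generate
    topic_word:   dict mapping note type -> list of K word distributions,
                  each a list of probabilities over that type's vocabulary
    alpha:        Dirichlet prior over the K topics
    """
    # Assumption: one topic mixture per patient, shared by all note types.
    theta = sample_dirichlet(alpha, rng)
    topic_ids = list(range(len(alpha)))
    notes = {}
    for note_type, n_tokens in note_lengths.items():
        phi = topic_word[note_type]  # type-specific topic-to-word distributions
        word_ids = list(range(len(phi[0])))
        tokens = []
        for _ in range(n_tokens):
            z = rng.choices(topic_ids, weights=theta)[0]  # topic assignment
            w = rng.choices(word_ids, weights=phi[z])[0]  # word given topic and type
            tokens.append(w)
        notes[note_type] = tokens
    return theta, notes


# Hypothetical toy setup: 3 topics, two note types with separate vocabularies.
rng = random.Random(0)
K = 3
vocab_sizes = {"physician": 6, "nursing": 5}
topic_word = {
    note_type: [sample_dirichlet([0.5] * size, rng) for _ in range(K)]
    for note_type, size in vocab_sizes.items()
}
theta, notes = generate_patient(
    {"physician": 20, "nursing": 15}, topic_word, [0.5] * K, rng
)
```

In this sketch the baseline that ignores note types would correspond to pooling both vocabularies and using a single set of topic-to-word distributions. Inference, which is the substantive part of any real topic model, is deliberately left out.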

PLOS ONE