Inferring global-scale temporal latent topics from news reports to predict public health interventions for COVID-19


The COVID-19 global pandemic has highlighted the importance of non-pharmacological interventions (NPIs) for controlling epidemics of emerging infectious diseases. Despite their importance, the implementation of NPIs has been monitored in an ad hoc and uncoordinated manner, mainly through the manual efforts of volunteers. In the absence of systematic NPI tracking, authorities and researchers are limited in their ability to quantify the effectiveness of NPIs and to guide decisions regarding their use as a global pandemic progresses. To address this issue, we propose a 3-stage machine learning framework called EpiTopics to facilitate the surveillance of NPIs by mining the vast amount of unlabelled news reports about these interventions. Building on topic modelling, our method characterizes online government reports and media articles related to COVID-19 as mixtures of latent topics. Our key contribution is the use of transfer learning to address the limited number of NPI-labelled documents, combined with topic modelling to support interpretation of the results. At stage 1, we trained a modified version of the unsupervised dynamic embedded topic model (DETM) on 1.2 million international news reports related to COVID-19. At stage 2, we used the trained DETM to infer topic mixtures from a small set of 2000 NPI-labelled WHO documents and used these mixtures as the input features for predicting the NPI labels of each document. At stage 3, we supplied the inferred country-level temporal topics from the DETM to the pretrained document-level NPI classifier to predict country-level NPIs. We identified 25 interpretable topics spanning 4 distinct and coherent COVID-related themes. These topics contributed to significant improvements in predicting the NPIs labelled in the WHO documents and in predicting country-level NPIs. Together, our work lays the machine learning methodological foundation for future research in global-scale surveillance of public health interventions. The EpiTopics code is available at GitHub:
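The transfer step between stages 2 and 3 can be illustrated with a minimal sketch. It is not the EpiTopics implementation: the DETM is replaced by random topic mixtures, the labels are synthetic, the classifier is a plain linear multi-label model fitted by gradient descent, and all function names, the number of NPI labels, and the numbers of countries and time steps are hypothetical. Only the 25 topics and the 2000 labelled documents follow the figures stated above.

```python
import numpy as np


def softmax(x):
    """Row-wise softmax, used here to produce valid topic mixtures."""
    shifted = x - x.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_npi_classifier(theta, labels, lr=0.5, epochs=500):
    """Fit one independent sigmoid output per NPI label by gradient descent.

    theta:  (n_docs, n_topics) topic mixtures used as input features
    labels: (n_docs, n_npis) binary NPI indicators
    """
    n_docs, n_topics = theta.shape
    weights = np.zeros((n_topics, labels.shape[1]))
    bias = np.zeros(labels.shape[1])
    for _ in range(epochs):
        error = sigmoid(theta @ weights + bias) - labels
        weights -= lr * (theta.T @ error) / n_docs
        bias -= lr * error.mean(axis=0)
    return weights, bias


def predict_npi(theta, weights, bias):
    """Return per-label NPI probabilities for any set of topic mixtures."""
    return sigmoid(theta @ weights + bias)


rng = np.random.default_rng(0)
n_topics = 25   # number of topics reported in the abstract
n_npis = 4      # hypothetical number of NPI labels

# Stand-in for stage 1 + 2 inference: random document-level topic mixtures
# in place of mixtures inferred by a trained DETM (assumption).
doc_theta = softmax(rng.normal(size=(2000, n_topics)))

# Synthetic labels generated from a random linear rule (assumption).
scores = doc_theta @ rng.normal(size=(n_topics, n_npis))
doc_labels = (scores > np.median(scores, axis=0)).astype(float)

# Stage 2: train the document-level classifier on topic mixtures.
weights, bias = train_npi_classifier(doc_theta, doc_labels)

# Stage 3: reuse the same classifier on country-by-time topic mixtures.
n_countries, n_steps = 3, 10  # hypothetical sizes
country_theta = softmax(rng.normal(size=(n_countries * n_steps, n_topics)))
country_probs = predict_npi(country_theta, weights, bias).reshape(
    n_countries, n_steps, n_npis
)
```

The transfer works in this sketch only because documents and country-time units are expressed in the same topic space, so a classifier trained on the former accepts the latter without retraining.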

Under review