September 2020 – December 2020

Teaching Assistant

McGill University

COMP 550 Natural Language Processing
September 2019 – Present

Research Assistant

McGill University

Supervised by Prof. Yue Li. Research in machine learning and NLP in the healthcare domain.
October 2018 – April 2019

Research Intern

Horizon Robotics

Responsibilities include:

  • Audio data analysis and processing for speech recognition
  • Developing a web-based platform for large-scale audio data processing
  • Developing systems for large-scale evaluation of audio recordings’ quality and for data selection based on audio and textual features

July 2018 – October 2018

Research Intern

University of Toronto & SickKids

Responsibilities include:

  • Literature review in DKI, BOLD signal analysis, etc.
  • Developing programs for analyzing BOLD signals for CVR measurement
  • Analyzing experimental data for diagnostic purposes


As the COVID-19 pandemic continues to unfold, understanding the global impact of non-pharmacological interventions (NPI) is important for formulating effective intervention strategies, particularly as many countries prepare for future waves. We used a machine learning approach to distill latent topics related to NPI from large-scale international news media. We hypothesize that these topics are informative about the timing and nature of implemented NPI, dependent on the source of the information (e.g., local news versus official government announcements) and the target countries. Given a set of latent topics associated with NPI (e.g., self-quarantine, social distancing, and online education), we assume that countries and media sources have different prior distributions over these topics, which are sampled to generate the news articles. To model the source-specific topic priors, we developed a semi-supervised, multi-source, dynamic, embedded topic model. Our model is able to simultaneously infer latent topics and learn a linear classifier to predict NPI labels using the topic mixtures as input for each news article. To learn these models, we developed an efficient end-to-end amortized variational inference algorithm. We applied our models to news data collected and labelled by the World Health Organization (WHO) and the Global Public Health Intelligence Network (GPHIN). Through comprehensive experiments, we observed superior topic quality and intervention prediction accuracy, compared to the baseline embedded topic models, which ignore information on media source and intervention labels. The inferred latent topics reveal distinct policies and media framing in different countries and media sources, and also characterize reaction to COVID-19 and NPI in a semantically meaningful manner. Our PyTorch code is available on GitHub (https://github.com/li-lab-mcgill/covid19_media).
ACM-BCB, 2020

EHRs present several modeling challenges, including highly sparse data matrices, noisy irregular clinical notes, arbitrary biases in billing code assignment, diagnosis-driven lab tests leading to not-missing-at-random (NMAR) biases, and heterogeneous data types across clinical notes, billing codes, lab tests, and medications. To address these challenges, we present MixEHR, a multi-view Bayesian framework related to collaborative filtering and latent topic models for EHR data integration and modeling.
Nature Communications, 2020

In our ablation study, we reported several issues regarding the model implementation, and examined the effect of latent space dimension and number of Transformer layers on the performance of the original model.
In NeurIPS 2019 Reproducibility Challenge, 2019

This paper proposes a method that uses a Kalman filter and a finite state machine for both accurate estimation of an elevator’s short-range displacement and robust estimation of long-term statistical patterns, based solely on accelerometer data.
In IEEE ICSP 2018, 2018



Using a Kalman filter and an FSM to analyze accelerometer data for elevator movement monitoring

Monaural Speech Speakers Separation

Speaker separation with an RNN in a monaural setting

NeurIPS 2019 Reproducibility Challenge

Ablation study of a NeurIPS 2019 paper, submitted to the official challenge

Retinal Image Registration

Introducing a Gaussian Field Estimator to achieve robust registration of retinal images from different views and devices

Speech Enhancement with Kalman Filter

Speech enhancement with a Kalman filter and Linear Predictive Coding in a noisy setting

When Deep NLP Models Are Not So Favorable

Empirical study of scenarios where deep NLP models are not favorable over classical models