September 2020 – December 2020

Teaching Assistant

McGill University

COMP 550 Natural Language Processing
September 2019 – Present

Research Assistant

McGill University

Supervised by Prof. Yue Li. Research in machine learning and NLP in the healthcare domain.
October 2018 – April 2019

Research Intern

Horizon Robotics

Responsibilities include:

  • Audio data analysis and processing for speech recognition
  • Developing a web-based platform for large-scale audio data processing
  • Developing systems for large-scale evaluation of audio recording quality and for data selection based on audio and textual features

July 2018 – October 2018

Research Intern

University of Toronto & SickKids

Responsibilities include:

  • Literature review in DKI, BOLD signal analysis, etc.
  • Developing programs for analyzing BOLD signals for CVR measurement
  • Analyzing experimental data for diagnostic purposes


We propose a 3-stage machine learning framework called EpiTopics to facilitate the surveillance of NPI by mining the vast amount of unlabelled news reports about these interventions.
Under review, 2021

We propose a novel application of latent topic modeling, using a multi-modal topic model to jointly infer distinct topic distributions of notes of different types.
PLOS ONE, 2021

We present MeDAL, a large medical text dataset curated for abbreviation disambiguation, designed for natural language understanding pre-training in the medical domain
EMNLP 2020 Clinical NLP, 2020

We developed a semi-supervised, multi-source, dynamic, embedded topic model that is able to simultaneously infer latent topics and learn a linear classifier to predict NPI labels using the topic mixtures as input for each news article.
ACM-BCB, 2020

EHRs present several modeling challenges, including highly sparse data matrices, noisy irregular clinical notes, arbitrary biases in billing code assignment, diagnosis-driven lab tests leading to not-missing-at-random (NMAR) biases, and heterogeneous data types across clinical notes, billing codes, lab tests, and medications. To address these challenges, we present MixEHR, a multi-view Bayesian framework related to collaborative filtering and latent topic models for EHR data integration and modeling.
Nature Communications, 2020

In our ablation study, we reported several issues regarding the model implementation, and examined the effect of latent space dimension and number of Transformer layers on the performance of the original model.
In NeurIPS 2019 Reproducibility Challenge, 2019

This paper proposes a method that uses a Kalman filter and a finite state machine for both accurate estimation of an elevator’s short-range displacement and robust estimation of long-term statistical patterns, based solely on accelerometer data.
In IEEE ICSP 2018, 2018


2019-When Deep NLP Models Are Not So Favorable

Empirical study of scenarios where deep NLP models are not preferable to classical models


Using a Kalman filter and an FSM to analyze accelerometer data for elevator movement monitoring

2017-Monaural Speech Speakers Separation

Speaker separation with an RNN in a monaural setting

2018-Retinal Image Registration

Introducing a Gaussian Field Estimator to achieve robust registration of retinal images from different views and devices

2018-Speech Enhancement with Kalman Filter

Speech enhancement with a Kalman filter and linear predictive coding in a noisy setting

2019-NeurIPS 2019 Reproducibility Challenge

Ablation study of a NeurIPS 2019 paper, submitted to the official challenge

2020-Graph Active Learning

A benchmark study of active learning on graphs


A medical text dataset of fourteen million articles for medical NLP pretraining