We have been developing signal detection methods for evaluating vaccine and drug safety using the massive Vaccine Adverse Event Reporting System (VAERS) and FDA Adverse Event Reporting System (FAERS) data. In a series of papers, we developed a sensitive signal detection method for identifying temporal variation in adverse effects using VAERS data (Cai et al., 2016, BMC Medical Informatics and Decision Making), and developed and applied methods to study individual differences in pharmacovigilance for the trivalent influenza vaccine (Tao et al., 2015, Studies in Health Technology and Informatics; Du et al., 2016, Biomedical Informatics Insights). We also applied machine learning and generalized linear model (GLM) methods to compare HPV opinion trends across Twitter user groups, as a way to study the impact of social media on consumers' health behavior (Du et al., 2017, Medinfo), and developed pipelines of methods to analyze and visualize differences in drug safety outcomes using FAERS data (Huang et al., 2017, Medinfo; Duan et al., 2017, Medinfo). More recently, we developed a novel method that integrates VAERS data with CDC survey data on vaccine coverage and U.S. census data on race/ethnicity distribution to quantify differential adverse event (AE) rates across race/ethnicity groups for the HPV vaccine; see the figure below (Huang et al., 2018, Frontier). Our method uses a generalized linear mixed effects model to link the three data sources, with each component of the model approximated from a different source.
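To make the three-source linkage concrete, here is a minimal sketch assuming one count of AE reports per group (from VAERS), one vaccination coverage fraction per group (from the CDC survey), and one population size per group (from the census). It is a simplified fixed-effects calculation, not the mixed effects model used in Huang et al.: for a Poisson GLM with one indicator per group and offset log(population × coverage), the maximum-likelihood rate for each group reduces to reports divided by estimated vaccinated persons, which is what the code computes in closed form. The class and function names are hypothetical, the numbers are made up, and because VAERS is a passive reporting system the result is a reporting rate rather than a true incidence.

```python
from dataclasses import dataclass


@dataclass
class GroupData:
    """Inputs for one race/ethnicity group; each field comes from a different source."""
    ae_reports: int    # AE reports for the vaccine in this group (VAERS)
    coverage: float    # fraction of the group vaccinated (CDC coverage survey)
    population: int    # size of the group (U.S. census)


def reporting_rates(groups: dict[str, GroupData], per: int = 100_000) -> dict[str, float]:
    """AE reports per `per` vaccinated people, with population * coverage as exposure.

    Equivalent to the maximum-likelihood fit of a Poisson GLM with one
    indicator per group and offset log(population * coverage).
    """
    rates = {}
    for name, g in groups.items():
        vaccinated = g.population * g.coverage  # estimated number vaccinated
        rates[name] = per * g.ae_reports / vaccinated
    return rates


def rate_ratios(rates: dict[str, float], reference: str) -> dict[str, float]:
    """Each group's rate divided by the reference group's rate."""
    return {name: r / rates[reference] for name, r in rates.items()}


# Made-up numbers for illustration only.
groups = {
    "group A": GroupData(ae_reports=50, coverage=0.5, population=1_000_000),
    "group B": GroupData(ae_reports=30, coverage=0.25, population=600_000),
}
rates = reporting_rates(groups)
print(rates)                          # {'group A': 10.0, 'group B': 20.0}
print(rate_ratios(rates, "group A"))  # {'group A': 1.0, 'group B': 2.0}
```

A mixed effects formulation would additionally let group-level estimates borrow strength from one another, which matters when some groups have few reports; the closed-form version above only shows how the three sources enter the numerator and the exposure.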


We are developing a novel machine learning framework for temporal risk prediction that incorporates external clinical knowledge. In vaccine/drug safety reports and electronic health records (EHRs), patient records are generally longitudinal and represented as sequences of medical events, including clinical notes, conditions, medications, vital signs, laboratory reports, and so forth. Building a prediction model from such a large number of longitudinal events can be difficult without guidance from external knowledge. We therefore aim to develop machine learning methods for risk prediction that exploit the temporal information in EHR data and allow external knowledge to be incorporated.
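One simple way to picture "longitudinal events plus external knowledge" is sketched below. This is an illustrative assumption, not the framework under development: each record is a list of timestamped events, and a hypothetical `knowledge_weights` map (standing in for prior relevance scores derived from external clinical knowledge) both filters and weights the events, while an exponential time decay lets recent events count more. The resulting feature dictionary could feed any standard classifier.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One entry in a patient's longitudinal record (hypothetical schema)."""
    day: float  # days since an index date, e.g. vaccination
    kind: str   # e.g. "condition", "medication", "lab", "note"
    code: str   # hypothetical event code


def knowledge_weighted_features(events, prediction_day, knowledge_weights,
                                half_life_days=30.0):
    """Map each event code to a time-decayed, knowledge-weighted score.

    Only events strictly before `prediction_day` contribute, so the features
    never look into the future. `knowledge_weights` is a hypothetical map from
    code to a prior relevance score taken from external clinical knowledge;
    codes missing from it get weight 0 and are dropped, so the prior both
    filters and weights the event stream.
    """
    decay_rate = math.log(2) / half_life_days  # weight halves every half_life_days
    features = {}
    for ev in events:
        if ev.day >= prediction_day:
            continue  # future event: not available at prediction time
        prior = knowledge_weights.get(ev.code, 0.0)
        if prior == 0.0:
            continue  # no external evidence that this code is relevant
        age = prediction_day - ev.day
        features[ev.code] = features.get(ev.code, 0.0) + prior * math.exp(-decay_rate * age)
    return features


events = [
    Event(0, "medication", "M1"),
    Event(30, "condition", "C1"),
    Event(60, "lab", "L1"),
    Event(90, "condition", "C1"),
]
print(knowledge_weighted_features(events, prediction_day=60,
                                  knowledge_weights={"M1": 1.0, "C1": 2.0}))
```

Here the medication 60 days before the prediction time keeps a quarter of its prior weight, the condition 30 days before keeps half, and the lab result (no prior) and the later condition (in the future) are excluded.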


Vaccines have been one of the most successful public health interventions to date. They are, however, pharmaceutical products that carry risks. Effective analysis of post-vaccination adverse events (AEs) is vital to assuring the safety of vaccines, a key public health intervention for reducing the frequency of vaccine-preventable illnesses. The CDC/FDA Vaccine Adverse Event Reporting System (VAERS) has received up to 30,000 reports per year over the past 25 years. VAERS reports include both structured data (e.g., vaccination date, date of first onset, age, and gender) and unstructured narratives that often provide detailed information about the clinical events and the temporal relationships among the events that occur after vaccination. The structured data provide only one onset date, whereas the temporal information about the sequence of post-vaccination events is contained in the unstructured narratives.
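The gap between the single structured onset date and the sequence described in a narrative can be illustrated with a small sketch. It assumes, hypothetically, that an extraction step has already turned narrative phrases into events with day offsets relative to vaccination (the extraction itself is not shown; the events are entered by hand), and it anchors those offsets to the structured vaccination date to recover a dated timeline. The field names are illustrative and are not the actual VAERS schema.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class NarrativeEvent:
    description: str
    days_after_vaccination: int  # relative time, as it might be extracted from text


@dataclass
class Report:
    """Hypothetical combined view of one report (not the actual VAERS schema)."""
    vaccination_date: date
    onset_date: date  # the single onset date available in the structured fields
    narrative_events: list[NarrativeEvent] = field(default_factory=list)

    def timeline(self) -> list[tuple[date, str]]:
        """Anchor narrative events to the structured vaccination date, in time order."""
        return sorted(
            (self.vaccination_date + timedelta(days=e.days_after_vaccination), e.description)
            for e in self.narrative_events
        )


report = Report(
    vaccination_date=date(2016, 3, 1),
    onset_date=date(2016, 3, 2),
    narrative_events=[
        NarrativeEvent("rash", 3),
        NarrativeEvent("fever", 1),
        NarrativeEvent("hospitalization", 5),
    ],
)
for when, what in report.timeline():
    print(when.isoformat(), what)
```

The structured fields alone would record only the first onset (2016-03-02); the anchored narrative events add the rash on 2016-03-04 and the hospitalization on 2016-03-06. Real narratives often use vaguer expressions ("the next morning", "a few days later"), which is why normalization and reasoning steps are needed.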

While the structured data in VAERS are widely used, the narratives are generally ignored because of the challenges inherent in working with unstructured text. Without these narratives, potentially valuable information is lost. We propose to develop a novel framework that uses informatics approaches to extract and accurately interpret the temporal information contained in the narratives, and to develop prediction models for the risk of severe AEs; see the figure below. Specifically, building on state-of-the-art ontology and natural language processing technologies, we will develop and validate a Temporal Information Modeling, Extraction and Reasoning system for Vaccine data (TIMER-V), which will automatically extract post-vaccination events and their temporal relationships from VAERS reports, semantically infer additional temporal relations, and integrate the extracted unstructured data with the structured data. Furthermore, we will provide and maintain a publicly available data access interface for querying the new integrated data repository, which will facilitate vaccine safety research, causal inference, and other temporally oriented discovery. We will also develop and validate models that predict severe AEs from the co-occurrence or temporal patterns of the series of AEs following vaccination. By making the unstructured narratives in VAERS reports usable, we aim to support temporally oriented discovery by a broad community of investigators in pharmacology, pharmacoepidemiology, and vaccine safety research, among others.
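The "infer temporal relations" step can be illustrated with its simplest case, the transitivity of BEFORE. The sketch below is not the TIMER-V implementation, which builds on an ontology; it is a minimal example assuming an extractor has produced pairwise (a, b) relations meaning "a occurred before b". It computes the transitive closure with Warshall's algorithm and flags contradictory input. A full reasoner would also handle richer relations (for example overlap and containment between intervals) and uncertain times; the function name here is hypothetical.

```python
def infer_before(relations):
    """Infer all BEFORE relations implied by transitivity.

    `relations` holds pairs (a, b) meaning "event a occurred before event b",
    as an extractor might produce them from one narrative. Uses Warshall's
    transitive-closure algorithm and raises ValueError if the input is
    contradictory (some event would have to occur before itself).
    """
    closure = set(relations)
    events = {e for pair in closure for e in pair}
    for k in events:            # k: candidate intermediate event
        for i in events:
            if (i, k) not in closure:
                continue
            for j in events:
                if (k, j) in closure:
                    closure.add((i, j))  # i before k and k before j => i before j
    cycles = sorted(e for e in events if (e, e) in closure)
    if cycles:
        raise ValueError(f"inconsistent temporal relations involving {cycles}")
    return closure


extracted = {
    ("vaccination", "fever"),
    ("fever", "rash"),
    ("rash", "hospitalization"),
}
inferred = infer_before(extracted) - extracted
print(sorted(inferred))
```

From three extracted relations, the closure adds three implied ones: fever before hospitalization, vaccination before hospitalization, and vaccination before rash. Relations of this kind are what a query interface or a prediction model could then use without re-reading the narrative.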

This line of research is joint work with Dr. Cui Tao at the University of Texas School of Biomedical Informatics and is supported by NIAID (2017-2022).