Dr. Yong Chen

Dr. Yong Chen, Professor of Biostatistics, founded and directs the Computing, Inference, and Learning Lab (PENNCIL) at the University of Pennsylvania. The mission of the PENNCIL lab is to develop computational methods and software to transform real-world data into insights, to disseminate the methods and knowledge to research communities, and to bridge the gap from data to actionable health care.

Research areas:

  • Real-world data; clinical evidence generation; learning health system; healthcare delivery.

Teaching

Courses Taught at the University of Pennsylvania

Course instructors:

Yong Chen

 Description:

This graduate-level Biostatistics course will introduce the fundamentals of statistical methods for meta-analysis. It will cover key principles of meta-analysis and the statistical rationales behind the analytic models, including univariate meta-analysis, multivariate meta-analysis, meta-analysis of diagnostic test accuracy, network meta-analysis, and multivariate network meta-analysis. Beyond these commonly used models, the course will cover statistical methods and software that investigate and correct for biases in systematic reviews, such as publication bias and outcome reporting bias. Advanced statistical inferential tools such as composite likelihood, pseudolikelihood, integrated likelihood methods, and EM algorithms will be introduced.
In addition, the course will cover some practical steps in systematic review, including search strategies, data abstraction methods, quality assessment, and writing a meta-analysis report.
The course is composed of a series of weekly lectures and small group discussions. Students will be expected to attend weekly lectures, participate in class discussions, review assigned readings, complete homework assignments, and conduct a real-world meta-analysis with a clinically meaningful problem.
The students will be evaluated based on 2 homework assignments and a final in-class presentation of their final projects.
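
As a small illustration of the univariate meta-analysis models named in the description, the sketch below pools study-level effect estimates with inverse-variance weights, first under a fixed-effect model and then under a random-effects model using the DerSimonian-Laird moment estimator of the between-study variance. The function names and the example data are hypothetical and are not taken from the course materials; the course textbooks work in R, and Python is used here only to keep the example self-contained.

```python
import math


def pool_fixed(effects, variances):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)


def pool_random_dl(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator.

    Returns (pooled estimate, standard error, tau^2).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fixed, _ = pool_fixed(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = total - sum(w * w for w in weights) / total
    # Moment estimator of the between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c)
    # Re-weight each study by the inverse of its total variance.
    star = [1.0 / (v + tau2) for v in variances]
    total_star = sum(star)
    estimate = sum(w * y for w, y in zip(star, effects)) / total_star
    return estimate, math.sqrt(1.0 / total_star), tau2


if __name__ == "__main__":
    # Hypothetical log odds ratios and their within-study variances.
    log_or = [-0.4, -0.1, -0.6, 0.2]
    var = [0.04, 0.09, 0.16, 0.05]
    print("fixed effect:  ", pool_fixed(log_or, var))
    print("random effects:", pool_random_dl(log_or, var))
```

When the studies are homogeneous, the estimated between-study variance is truncated to zero and the two models give the same pooled estimate; otherwise the random-effects standard error is the larger of the two.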

 Textbooks:

1. [Primary textbook] Schwarzer, Guido, Carpenter, James R., Rücker, Gerta. Meta-Analysis with R. Springer 2015.
2. [Primary textbook] Egger, Matthias, George D. Smith, and Douglas G. Altman, eds. Systematic Reviews in Health Care: Meta-analysis in Context. London: BMJ Publishing Group, 2001.
3. [Optional textbook] Borenstein, Michael, Larry V. Hedges, Julian P. T. Higgins, Hannah R. Rothstein. Introduction to Meta-Analysis. Wiley, 2009.
4. [Optional textbook] Rothstein, Hannah R., Alexander J. Sutton, Michael Borenstein. Publication Bias in Meta-Analysis: Prevention, Assessment and Adjustments. Wiley, 2005.

Course format:

This course will have a hybrid lecture/seminar format, with Dr. Yong Chen presenting lectures on standard and advanced statistical methods for meta-analysis, and several guest speakers describing important aspects of systematic review from their perspectives as clinicians, epidemiologists, medical librarians, and systematic reviewers. The guest speakers include Drs. Jesse Berlin (Johnson & Johnson Ltd), Robert J. DeRubeis (UPenn), Eileen Erinoff (Emergency Care Research Institute, ECRI), and Tianjing Li (Johns Hopkins University School of Public Health). All of them gave guest lectures in the course co-directed by Dr. Yong Chen two years ago.

 Expectation:

This course is expected to attract PhD students in their first year and above, and will likely include students in GGEB (Biostatistics and Epidemiology programs) as well as, possibly, students in other groups, such as MSCE students, who meet the prerequisites.

Course instructors:

Justine Shults (part I: Linear models) and Yong Chen (Part II: Generalized linear models)

 Description:

This is a course on methods for generalized linear models (GLMs), rather than a course on using software for data analysis with GLMs. This course is designed to provide students with a fundamental understanding of theory and applications of the GLMs. Emphasis will be placed on statistical modeling, building from standard normal linear models, extending to GLMs, and going beyond GLMs. The main subjects are logit models for nominal and ordinal data, log-linear models, models for repeated categorical data, generalized linear mixed models and other mixture models for categorical data. Methods of maximum likelihood, weighted least squares, and generalized estimating equations will be used for estimation and inference.

 Textbooks:

1. Agresti, A. (2002). Categorical Data Analysis (Second Edition). Wiley. ISBN-10: 0471360937.

2. McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models (Second Edition). Chapman and Hall. ISBN-10: 0412317605.

 Learning objectives:

Regression analysis has been developed for many years and remains one of the most commonly used statistical tools to help scientists address their scientific questions. Generalized linear models (GLMs) were formulated by John Nelder and Robert Wedderburn as a way of unifying various other statistical models, including ANCOVA, linear regression, logistic regression and log-linear models for contingency tables and count data. This lecture will introduce GLMs and some recent developments of regression techniques with a focus on generalized linear models, quasi-likelihood methods, and estimating function approaches.
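
To make the unifying idea concrete, the following minimal sketch fits linear, logistic, and log-linear (Poisson) regression with one routine, iteratively reweighted least squares (IRLS), changing only the inverse link, its derivative, and the variance function. It assumes a single covariate plus an intercept so that the weighted least-squares step can be solved in closed form; the function `fit_glm_irls`, the `FAMILIES` table, and the data are hypothetical illustrations, not course code.

```python
import math


def fit_glm_irls(x, y, inverse_link, dmu_deta, variance, iterations=25):
    """Fit eta = b0 + b1 * x by IRLS for a one-covariate GLM.

    inverse_link maps eta to the mean mu, dmu_deta is its derivative,
    and variance is the variance function V(mu) of the family.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = inverse_link(eta)
            d = dmu_deta(eta)
            w = d * d / variance(mu)      # working weight
            z = eta + (yi - mu) / d       # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted normal equations for (b0, b1).
        det = s_w * s_wxx - s_wx * s_wx
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1


def _expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


# Each family is (inverse link, derivative of inverse link, variance function).
FAMILIES = {
    "gaussian": (lambda e: e, lambda e: 1.0, lambda m: 1.0),
    "poisson": (math.exp, math.exp, lambda m: m),
    "binomial": (_expit, lambda e: _expit(e) * (1.0 - _expit(e)),
                 lambda m: m * (1.0 - m)),
}

if __name__ == "__main__":
    # Hypothetical count data for two groups coded x = 0 and x = 1.
    x = [0, 0, 1, 1]
    counts = [1, 3, 4, 8]
    print(fit_glm_irls(x, counts, *FAMILIES["poisson"]))
```

With the identity link and constant variance, the weights are all one and the first iteration already returns the ordinary least-squares fit, which is one way to see linear regression as a special case of the same algorithm.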

 List of topics:

  • Generalized linear models and maximum likelihood method

  • Quasi-likelihood method and estimating equation

  • Model selection

  • Analysis of binary data

  • Analysis of polytomous responses

  • Analysis of count data: log linear models

  • Analysis of contingency table

  • Generalized linear mixed effect models

  • Analysis of matched data

  • Inference for correlated responses: marginal models and random effect models

 Expectation:

By the end of the course, the students are expected to: 1) understand the main components of GLMs; 2) build and apply appropriate models to binary, nominal, ordinal or count data; 3) build and apply appropriate models to correlated outcomes; 4) make inference for a given model and interpret the results in the scientific context.

Course directors:

Craig Umscheid and Yong Chen

Objective:

This 1.0 unit graduate-level course will provide an introduction to the fundamentals of systematic reviews and meta-analyses.  It will cover introductory principles of meta-analysis; protocol development; search strategies; data abstraction methods; quality assessment; meta-analytic methods; and applications of meta-analysis.  The course is composed of a series of weekly small group lectures and discussions. Students will be expected to attend weekly didactics, participate in class discussions, review assigned readings, complete homework assignments, and draft a systematic review protocol of their choosing suitable for IRB submission.

Assignments:

Students will be required to complete readings in the textbook and articles referenced for each session. In addition, each student will complete homework assignments assigned by the instructors, including a data analysis project using a meta-analysis dataset provided by the instructors: download Stata meta-analysis modules from the Stata website, review dataset variables, complete an analysis, and write up the findings. Finally, students will draft a systematic review protocol of their choosing and present their protocol at the conclusion of the class. There are no examinations.

Course instructors:

Yong Chen (Part I) and Jinbo Chen (Part II)

Outline of topics:

Parametric Inference:

        Unbiased estimation and unbiased estimating functions

        Maximum likelihood estimation: Consistency, asymptotic normality, and efficiency

        Hypothesis testing: Wald test, Likelihood ratio test, Score test

        Influence functions

        EM algorithm

        Model checking, Model mis-specification, and model selection

        Examples of Non-regular maximum likelihood estimation

        Marginal likelihood, Conditional likelihood, (modified) profile likelihood, composite likelihood, and pseudolikelihood

        U-statistics theory

        Contiguity theory       

        Bayes and Empirical Bayes estimators, Bayesian tests
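
The three classical tests listed in the outline above (Wald, likelihood ratio, and score) can be compared on the simplest possible model. The sketch below, offered as an illustration rather than as course material, computes all three statistics for testing H0: p = p0 from a single binomial sample; each is referred to a chi-squared distribution with one degree of freedom. The function names are hypothetical, and the code assumes the number of successes is strictly between 0 and n so that the log-likelihood at the MLE is finite.

```python
import math


def binomial_tests(successes, n, p0):
    """Wald, score, and likelihood ratio statistics for H0: p = p0.

    Assumes 0 < successes < n and 0 < p0 < 1.
    """
    p_hat = successes / n

    def loglik(p):
        return successes * math.log(p) + (n - successes) * math.log(1.0 - p)

    # Wald: information evaluated at the MLE.
    wald = (p_hat - p0) ** 2 / (p_hat * (1.0 - p_hat) / n)
    # Score: information evaluated at the null value.
    score = (p_hat - p0) ** 2 / (p0 * (1.0 - p0) / n)
    # Likelihood ratio: twice the log-likelihood gap between MLE and null.
    lr = 2.0 * (loglik(p_hat) - loglik(p0))
    return wald, score, lr


def chi2_1_pvalue(stat):
    """Upper tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    for name, stat in zip(("Wald", "score", "LR"),
                          binomial_tests(60, 100, 0.5)):
        print(f"{name}: statistic={stat:.4f}, p-value={chi2_1_pvalue(stat):.4f}")
```

The three statistics differ only in where the curvature of the log-likelihood is evaluated, which is why they agree asymptotically under the null but can differ in finite samples.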

    

Semiparametric Inference:

         Semiparametric maximum likelihood estimation (Case-control study; Cox proportional hazards regression)

         Z-estimation/M-estimation 

         Generalized score test, with Pearson’s chi-squared test as an example

         Semiparametric inference with incomplete data

Course instructors:

Yong Chen

Description:

This course presents extensions of general and generalized linear models to longitudinal and correlated outcome data with special emphasis on clinical, epidemiologic, and public health applications. Major topics include generalized linear mixed models (GLMM) for continuous, binomial, and count data; maximum likelihood estimation; generalized estimating equations (GEE); current general and specialized software applicable to these methods; and readings from current statistical literature. Each student will be required to participate in 4 labs and complete associated problem sets. Software will include Stata.

 Textbooks:

1. Diggle, P, Heagerty, P, Liang, K-Y and Zeger, S. (2013). Analysis of Longitudinal Data (Second Edition). Oxford University Press. ISBN-10: 0198524846.

2. Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Analysis. Second Edition. New York: Wiley; 2011. ISBN: 978-0-470-38027-7. Hardcover, 740 pages; August 2011.

3. Singer JD, Willett JB. Applied Longitudinal Data Analysis. New York: Oxford University Press; 2003.

 Graphics texts: 

Mitchell MN.   A Visual Guide to Stata Graphics.  3rd Edition.  College Station, TX: Stata Press; 2012.

Courses Taught at the University of Texas School of Public Health

Course instructor:

Yong Chen

Description:

This is a course on methods for generalized linear models (GLMs), rather than a course on using software for data analysis with GLMs. This course is designed to provide students with a fundamental understanding of theory and applications of the GLMs. Emphasis will be placed on statistical modeling, building from standard normal linear models, extending to GLMs, and going beyond GLMs. The main subjects are logit models for nominal and ordinal data, log-linear models, models for repeated categorical data, generalized linear mixed models and other mixture models for categorical data. Methods of maximum likelihood, weighted least squares, and generalized estimating equations will be used for estimation and inference.

Textbooks:

1. Agresti, A. (2002). Categorical Data Analysis (Second Edition). Wiley. ISBN-10: 0471360937.

2. McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models (Second Edition). Chapman and Hall. ISBN-10: 0412317605.

Learning objectives:

Regression analysis has been developed for many years and remains one of the most commonly used statistical tools to help scientists address their scientific questions. Generalized linear models (GLMs) were formulated by John Nelder and Robert Wedderburn as a way of unifying various other statistical models, including ANCOVA, linear regression, logistic regression and log-linear models for contingency tables and count data. This lecture will introduce GLMs and some recent developments of regression techniques with a focus on generalized linear models, quasi-likelihood methods, and estimating function approaches.

List of topics:

  • Generalized linear models and maximum likelihood method

  • Quasi-likelihood method and estimating equation

  • Model selection

  • Analysis of binary data

  • Analysis of polytomous responses

  • Analysis of count data: log linear models

  • Analysis of contingency table

  • Generalized linear mixed effect models

  • Analysis of matched data

  • Inference for correlated responses: marginal models and random effect models

Expectation:

By the end of the course, the students are expected to: 1) understand the main components of GLMs; 2) build and apply appropriate models to binary, nominal, ordinal or count data; 3) build and apply appropriate models to correlated outcomes; 4) make inference for a given model and interpret the results in the scientific context.

Course instructors:

Yong Chen

Description:

This course presents extensions of general and generalized linear models to longitudinal and correlated outcome data with special emphasis on clinical, epidemiologic, and public health applications. Major topics include generalized linear mixed models (GLMM) for continuous, binomial, and count data; maximum likelihood estimation; generalized estimating equations (GEE); current general and specialized software applicable to these methods; and readings from current statistical literature. Each student will be required to participate in 4 labs and complete associated problem sets. Software will include Stata.

 Textbooks:

1. Diggle, P, Heagerty, P, Liang, K-Y and Zeger, S. (2013). Analysis of Longitudinal Data (Second Edition). Oxford University Press. ISBN-10: 0198524846.

2. Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Analysis. Second Edition. New York: Wiley; 2011. ISBN: 978-0-470-38027-7. Hardcover, 740 pages; August 2011.

3. Singer JD, Willett JB. Applied Longitudinal Data Analysis. New York: Oxford University Press; 2003.

 Graphics texts: 

Mitchell MN.   A Visual Guide to Stata Graphics.  3rd Edition.  College Station, TX: Stata Press; 2012.

Joint appointments: