Federated Learning
We develop communication-efficient and statistically rigorous methods for distributed and federated inference across multiple data partners, where individual-level data cannot be centrally pooled. Our work emphasizes lossless and one-shot algorithms that achieve the same statistical efficiency as pooled analyses while preserving data ownership and privacy. We study distributed inference for generalized linear models, linear mixed models, time-to-event data, and high-dimensional causal analyses under data heterogeneity and covariate shift. These methods enable large-scale, multi-site real-world evidence generation across healthcare systems.
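To make "one-shot" and "lossless" concrete, here is a minimal sketch for the simplest possible case: ordinary least squares with a single covariate. It is an illustration only, not the algorithms from the papers listed here. The function names and the toy data are hypothetical. Each site sends five aggregate numbers once, and the coordinator recovers the same fit a pooled analysis would produce, up to floating-point rounding.

```python
import math

KEYS = ("n", "sx", "sy", "sxx", "sxy")

def site_summary(xs, ys):
    """Aggregate statistics a site shares; no individual-level rows leave the site."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }

def combine(summaries):
    """Coordinator step: add the per-site aggregates (one round of communication)."""
    return {k: sum(s[k] for s in summaries) for k in KEYS}

def ols_from_summary(s):
    """Closed-form intercept and slope of simple linear regression from aggregates."""
    denom = s["n"] * s["sxx"] - s["sx"] ** 2
    slope = (s["n"] * s["sxy"] - s["sx"] * s["sy"]) / denom
    intercept = (s["sy"] - slope * s["sx"]) / s["n"]
    return intercept, slope

# Hypothetical toy data held at three sites.
sites = [
    ([1.0, 2.0, 3.0], [2.1, 3.9, 6.2]),
    ([4.0, 5.0], [8.1, 9.8]),
    ([6.0, 7.0, 8.0, 9.0], [12.2, 13.9, 16.1, 18.0]),
]

# Federated fit: only the summaries are combined.
federated = ols_from_summary(combine([site_summary(x, y) for x, y in sites]))

# Pooled fit for comparison (possible here only because the data are a toy example).
pooled_x = [x for xs, _ in sites for x in xs]
pooled_y = [y for _, ys in sites for y in ys]
pooled = ols_from_summary(site_summary(pooled_x, pooled_y))

print(federated, pooled)
```

The linear model is special because its sufficient statistics add across sites. Generalized linear models, mixed models, and time-to-event models do not share that property in general, which is why they call for dedicated methods.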
Selected papers:
- Tong, J., Hu, J., Hripcsak, G., Ning, Y., & Chen, Y. (2025). DisC2o-HD: Distributed causal inference with covariates shift for analyzing real-world high-dimensional data. Journal of Machine Learning Research, 26(3), 1-50.
- Wu, Q., Reps, J. M., Li, L., Zhang, B., Lu, Y., Tong, J., … & Chen, Y. (2025). COLA-GLM: collaborative one-shot and lossless algorithms of generalized linear models for decentralized observational healthcare data. npj Digital Medicine, 8(1), 442.
- Duan, R., Ning, Y., & Chen, Y. (2022). Heterogeneity-aware and communication-efficient distributed statistical inference. Biometrika, 109(1), 67-83.
- Luo, C., Islam, M. N., Sheils, N. E., Buresh, J., Reps, J., Schuemie, M. J., … & Chen, Y. (2022). DLMM as a lossless one-shot algorithm for collaborative multi-site distributed linear mixed models. Nature Communications, 13(1), 1678.
- Duan, R., Luo, C., Schuemie, M. J., Tong, J., Liang, C. J., Chang, H. H., … & Chen, Y. (2020). Learning from local to global: An efficient distributed algorithm for modeling time-to-event data. Journal of the American Medical Informatics Association, 27(7), 1028-1036.
- Duan, R., Boland, M. R., Liu, Z., Liu, Y., Chang, H. H., Xu, H., … & Chen, Y. (2020). Learning from electronic health records across multiple sites: A communication-efficient and privacy-preserving distributed algorithm. Journal of the American Medical Informatics Association, 27(3), 376-385.