Identifying risk factors of post–COVID-19 conditions with machine learning and deep learning algorithms

2025-11-07

Guohai Zhou, Scott P. Kelly, Ling Li, Rongjun Shen, Stephen E. Schachterle, Mitchell Henschel, Leo J. Russo, Xiaofeng Zhou,
Identifying risk factors of post–COVID-19 conditions with machine learning and deep learning algorithms,
Global Epidemiology,
Volume 10,
2025,
100221,
ISSN 2590-1133,
https://doi.org/10.1016/j.gloepi.2025.100221.
(https://www.sciencedirect.com/science/article/pii/S2590113325000392)
Abstract: Introduction
Post–COVID-19 conditions (PCC) affect millions of people in the United States. Early diagnosis and management of PCC require an understanding of the epidemiology and drivers of PCC in the real world.
Methods
We applied multiple machine learning and deep learning models to a large electronic health database of patients with a recent COVID-19 infection in the United States from 2020 to 2022 to quantitatively evaluate progression to newly developed PCC and identify the individual-level risk factors for developing new PCC at 60, 74, 90, and 120 days following initial SARS-CoV-2 infection.
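As a rough illustration of how a fitted model can be used to rank individual-level risk factors, the sketch below fits a plain logistic regression to a synthetic cohort and orders the features by absolute coefficient. It rests on several assumptions: the abstract does not name the specific models used, so logistic regression is only a stand-in; the data, the feature names (`age`, `comorbidity_score`, `visits_30d`, `unrelated`), and the simulated effect sizes are all invented for the example and are not taken from the study.

```python
import math
import random

# Hypothetical feature names; all values are simulated and standardized.
FEATURES = ["age", "comorbidity_score", "visits_30d", "unrelated"]


def simulate_cohort(n, seed=0):
    """Build a synthetic cohort. The effect sizes below are invented."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in FEATURES]
        # Assumed data-generating model: the last feature has no effect.
        logit = -1.0 + 1.2 * x[0] + 0.8 * x[1] + 0.4 * x[2]
        p = 1.0 / (1.0 + math.exp(-logit))
        rows.append(x)
        labels.append(1 if rng.random() < p else 0)
    return rows, labels


def fit_logistic(rows, labels, lr=0.5, epochs=200):
    """Fit logistic regression by full-batch gradient descent."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted minus observed
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_features(names, weights):
    """Order features by absolute coefficient, largest first."""
    return sorted(zip(names, weights), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    rows, labels = simulate_cohort(3000, seed=0)
    weights, _ = fit_logistic(rows, labels)
    for name, weight in rank_features(FEATURES, weights):
        print(f"{name}: {weight:+.2f}")
```

Ranking by coefficient magnitude is only meaningful here because the simulated features share a common scale. Tree-based or deep learning models would typically need a different importance measure, such as permutation importance.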
Results
Patients with newly developed primary or secondary PCC were older; had higher Charlson comorbidity scores; and were more likely to smoke, have a body mass index ≥30, or have hyperlipidemia or hypertension than those without evidence of newly developed PCC. Three different machine learning models used to evaluate both the full study period and the Omicron era (beginning January 2022) consistently identified age, the Charlson comorbidity score, and healthcare utilization within 30 days of the index COVID-19 infection as the leading risk factors for developing new primary or secondary PCC. The presence of disseminated intravascular coagulation at baseline was among the 10 strongest predictors of newly developed cardiovascular or secondary PCC in the full study period and the Omicron era.
Conclusion
Multiple machine learning and deep learning models identified the Charlson comorbidity score, age, and frequency of healthcare utilization as factors that may help predict the occurrence of new PCC, demonstrating the utility of the models for individualized risk prediction.
Keywords: Machine learning; Artificial intelligence; Epidemiology; Epidemiologic methods