TY - JOUR
T1 - Integrating County-Level Socioeconomic Data for COVID-19 Forecasting in the United States
AU - Lucic, Michael C.
AU - Ghazzai, Hakim
AU - Lipizzi, Carlo
AU - Massoud, Yehia
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2021
Y1 - 2021
N2 - Goal: The United States (US) is currently one of the countries hardest hit by the novel SARS-CoV-2 virus. One key difficulty in managing the outbreak at the national level is that, due to the diversity, geographic spread, and economic inequality of the US, the COVID-19 pandemic there acts more as a series of diverse regional outbreaks than as a synchronized homogeneous one. Method: To assess regional risk related to COVID-19, a two-phase modeling approach is developed that considers demographic and economic criteria. First, an unsupervised clustering technique, specifically k-means, is employed to group US counties based on demographic and economic similarities. Then, time series forecasts are developed for each cluster of counties to assess the short-run viral transmissibility risk. Results: We test ARIMA and Seasonal Trend Random Walk forecasts to determine which is more appropriate for modeling the spread and lethality of COVID-19. Based on this analysis, we use the superior ARIMA models to forecast future COVID-19 trends in the clusters and present the areas in the US that have the highest COVID-19-related risk heading into the winter of 2020. Conclusion: Including sub-national socioeconomic characteristics in data-driven COVID-19 infection and fatality forecasts may play a key role in assessing the risk associated with changes in infection patterns at the national level.
AB - Goal: The United States (US) is currently one of the countries hardest hit by the novel SARS-CoV-2 virus. One key difficulty in managing the outbreak at the national level is that, due to the diversity, geographic spread, and economic inequality of the US, the COVID-19 pandemic there acts more as a series of diverse regional outbreaks than as a synchronized homogeneous one. Method: To assess regional risk related to COVID-19, a two-phase modeling approach is developed that considers demographic and economic criteria. First, an unsupervised clustering technique, specifically k-means, is employed to group US counties based on demographic and economic similarities. Then, time series forecasts are developed for each cluster of counties to assess the short-run viral transmissibility risk. Results: We test ARIMA and Seasonal Trend Random Walk forecasts to determine which is more appropriate for modeling the spread and lethality of COVID-19. Based on this analysis, we use the superior ARIMA models to forecast future COVID-19 trends in the clusters and present the areas in the US that have the highest COVID-19-related risk heading into the winter of 2020. Conclusion: Including sub-national socioeconomic characteristics in data-driven COVID-19 infection and fatality forecasts may play a key role in assessing the risk associated with changes in infection patterns at the national level.
KW - ARIMA
KW - COVID-19
KW - data analytics
KW - k-means clustering
KW - time series analysis
UR - http://www.scopus.com/inward/record.url?scp=85121057352&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121057352&partnerID=8YFLogxK
U2 - 10.1109/OJEMB.2021.3096135
DO - 10.1109/OJEMB.2021.3096135
M3 - Article
AN - SCOPUS:85121057352
VL - 2
SP - 235
EP - 248
JO - IEEE Open Journal of Engineering in Medicine and Biology
JF - IEEE Open Journal of Engineering in Medicine and Biology
M1 - 9479783
ER -