Using structured gaussian processing algorithm for spatiotemporal prediction of COVID-19 pandemic in the US


In a recent study posted to the medRxiv* preprint server, researchers identified county-level spatial features associated with increased coronavirus disease 2019 (COVID-19) case and death counts in the United States (US) across different temporal phases of the pandemic.

Study: Using structured gaussian processing algorithm for spatiotemporal prediction of COVID-19 pandemic in the US. Image Credit: 3DJustincase/Shutterstock

This news article was a review of a preliminary scientific report that had not undergone peer review at the time of publication. Since its initial publication, the scientific report has been peer reviewed and accepted for publication in a scientific journal. Links to the preliminary and peer-reviewed reports are available in the Sources section at the bottom of this article.

The team trained and tested a structured Gaussian processing (SGP)-based machine learning framework on a large, geographically tagged dataset of demographic, socioeconomic, and political data from all US counties.

Background

The impact of COVID-19 across the US has been heterogeneous in terms of both severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission and COVID-19 mortality.

In the US, public health interventions and resource allocation occur at the county level. Because COVID-19 spread depends on proximity, spatial analysis employing geographic information systems (GIS) allows researchers to investigate associations between demographic and socioeconomic factors and COVID-19 pandemic dynamics at the county level.

Further, it helps them identify and target the areas at the highest risk of becoming COVID-19 hotspots, which in turn helps flatten the pandemic curve.

About the study

In the present study, researchers gathered county-level daily case counts between January 22, 2020, and March 21, 2021, from the Center for Systems Science and Engineering at Johns Hopkins University. The United States Census Bureau and the National Center for Health Statistics provided the county-specific features.

At the beginning of each week from April 6, 2020, to March 21, 2021, the team used an SGP regression algorithm to predict daily COVID-19 case and death counts for each county.
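The article does not describe how the SGP model is structured, so the following is only a minimal sketch of ordinary Gaussian process regression (posterior mean with a squared-exponential kernel), the general technique that SGP presumably builds on. The kernel choice, the hyperparameter values, and all function names are illustrative assumptions, not details from the study.

```python
import math

def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two feature vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return variance * math.exp(-0.5 * sq_dist / length_scale ** 2)

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gp_predict_mean(train_X, train_y, test_X, noise=1e-6, length_scale=1.0):
    """Posterior mean of a zero-mean GP: k(X*, X) (K + noise * I)^-1 y."""
    n = len(train_X)
    K = [[rbf_kernel(train_X[i], train_X[j], length_scale)
          + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve_linear(K, train_y)
    return [sum(rbf_kernel(x_star, train_X[i], length_scale) * alpha[i]
                for i in range(n))
            for x_star in test_X]

# Hypothetical toy data: one feature per "county", one outcome value each.
train_X = [[0.0], [1.0], [2.0]]
train_y = [1.0, 3.0, 2.0]
predictions = gp_predict_mean(train_X, train_y, [[0.5], [1.5]])
```

In this sketch, counties with similar feature vectors receive similar predictions, which is the intuition behind predicting held-out counties from the counties used for training.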

The model was trained on a randomly selected two-thirds of the counties in each state and predicted the case and death counts of the remaining one-third. The researchers normalized the daily COVID-19 case and death counts per 100,000 residents and computed a seven-day moving average.
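The preprocessing described above can be sketched roughly as follows. The article does not say whether the moving average was trailing or centered, nor how the random split was seeded or rounded, so the trailing window, the seed, the rounding, and all function names here are assumptions made for illustration.

```python
import random
from collections import defaultdict

def per_100k(daily_counts, population):
    """Scale raw daily counts to counts per 100,000 residents."""
    return [c * 100_000 / population for c in daily_counts]

def trailing_moving_average(values, window=7):
    """Trailing moving average; the first days average over the days seen so far."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / min(i + 1, window))
    return out

def split_counties_by_state(county_to_state, train_fraction=2 / 3, seed=0):
    """Randomly assign about train_fraction of each state's counties to training."""
    rng = random.Random(seed)
    by_state = defaultdict(list)
    for county, state in sorted(county_to_state.items()):
        by_state[state].append(county)
    train, test = [], []
    for counties in by_state.values():
        rng.shuffle(counties)
        k = round(len(counties) * train_fraction)
        train.extend(counties[:k])
        test.extend(counties[k:])
    return train, test

# Hypothetical county with 50,000 residents and eight days of raw case counts.
smoothed = trailing_moving_average(per_100k([0, 3, 5, 2, 8, 4, 6, 7], 50_000))
```

Splitting within each state, rather than across the whole country, keeps every state represented in both the training and the held-out counties.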

The team used Pearson’s correlation coefficient (PCC) to assess prediction accuracy, that is, how well the algorithm captured the dynamics of the event counts; likewise, the proportion of variance explained (R²) showed how much of the total variation in the observed counts the model accounted for.
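The two metrics answer different questions, which a small sketch can make concrete. The article does not state which definition of R² the authors used; the version below assumes the common form 1 - SS_res / SS_tot computed against the observed values, and the function names are illustrative.

```python
import math

def pearson_cc(observed, predicted):
    """Pearson's correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (sd_o * sd_p)

def r_squared(observed, predicted):
    """Proportion of variance explained: 1 - SS_res / SS_tot (assumed definition)."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical predictions that track the trend perfectly but are twice too large.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
pcc = pearson_cc(observed, predicted)
r2 = r_squared(observed, predicted)
```

Under this definition, PCC rewards predictions that rise and fall with the observations, while R² also penalizes errors in scale and offset, so a model can have a high PCC and a noticeably lower R².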

After identifying the highly predictive spatial features, the researchers used topic modeling (TM), a clustering algorithm, to identify combinations of spatial features closely linked to COVID-19 spread.

TM computed sets of co-occurring features that could link counties to topics. The researchers segregated discrete groups of counties with similar spatial features (topic contributions) and derived nine clusters of counties based on the relative contributions of Latent Dirichlet Allocation (LDA) topics.

Within each cluster, they showed topic contributions by plotting the average z-score normalized topic score. Likewise, within each quintile, a histogram showed clusters of counties with a higher incidence of cases and deaths per capita.
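The per-cluster summary described above can be sketched as follows, assuming each county already has one score per topic and a cluster label. The data layout, the use of the population standard deviation, and the function names are assumptions; the article does not give these details.

```python
import statistics
from collections import defaultdict

def zscore_columns(rows):
    """Z-score each column (topic) across all rows (counties)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, means, stds)]
            for row in rows]

def mean_topic_score_by_cluster(topic_scores, cluster_labels):
    """Average the z-scored topic scores over the counties in each cluster."""
    z = zscore_columns(topic_scores)
    grouped = defaultdict(list)
    for label, row in zip(cluster_labels, z):
        grouped[label].append(row)
    return {label: [statistics.fmean(col) for col in zip(*rows)]
            for label, rows in grouped.items()}

# Hypothetical example: four counties, two topics, two clusters.
topic_scores = [[0.8, 0.2], [0.6, 0.4], [0.2, 0.8], [0.4, 0.6]]
cluster_labels = [0, 0, 1, 1]
summary = mean_topic_score_by_cluster(topic_scores, cluster_labels)
```

A positive average for a topic means the cluster's counties score above the national mean on that topic, and a negative average means they score below it.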

Study findings

Across counties, the overall and median PCC were 0.96 and 0.98, and the overall and median R² were 0.84 and 0.94, respectively. R² values greater than 0.90 in most states demonstrated that the model, built on spatial features, could account for most of the variance in COVID-19 case and death counts.

The predicted COVID-19 cases and death counts were strongly associated with measures of age, urbanicity, and presidential voting margin. Correlation analysis revealed that the interactions between socioeconomic, health, and racial features complicated the interpretation of the relationships between the spatial features and the COVID-19 dynamics.

TM was able to associate features with topics and could group geographically remote but demographically similar counties. Additionally, TM clustered many geographically close counties. For instance, Cluster 1 covered the Midwest region, which witnessed the largest surge in COVID-19 cases and deaths during 2020, and contained counties with high scores on topics 1, 3, and 9 and low scores on topic 10.

While TM showed that counties with similar demographic and socioeconomic features tended to cluster together, the unsupervised clustering based on these topics identified county groups that witnessed varying COVID-19 spread.

Because the clustering distinguished cases from deaths, and initial-phase from nationwide-phase dynamics, it highlighted plasticity in the composition of the spatial features that were strongly associated with COVID-19 risk.

Accordingly, Cluster 3, restricted to the Southeast US, was associated with high COVID-19 case counts during the initial phase, and Cluster 0, restricted to Texas and the Rocky Mountain region, was associated with high COVID-19 case counts during the nationwide phase.

Intriguingly, the presidential vote margin was the most consistently selected spatial feature in all the COVID-19 prediction models. It stood independently and showed no collinearity with other spatial factors.

Conclusions

To summarize, the study findings showed that spatial features accounted for the majority of the variance in COVID-19 case and death counts across the US.

Predictive modeling based on combinations of spatial features could identify counties at the highest risk for COVID-19 spread and inform policymakers to prioritize these counties for aggressive mitigation strategies, especially under limited resources.

Importantly, TM provided a novel dimensionality reduction approach for examining epidemiologic data and proved well suited to analyzing datasets with collinear variables.


Journal references:

Article Revisions

  • May 12, 2023 - The preliminary preprint research paper on which this article was based was accepted for publication in a peer-reviewed scientific journal. This article was edited accordingly to include a link to the final peer-reviewed paper, now shown in the sources section.

Written by

Neha Mathur


