The COVID-19 pandemic has affected more than 825,000 people in the US, but the daily counts of new cases and deaths are not precise. A new study published on the preprint server medRxiv in April 2020 reports that Internet search interest could be used to predict the daily incidence of cases in the US, as was previously shown in China.
Study: Trends and prediction in daily incidence and deaths of COVID-19 in the United States: a search-interest based model.
The importance of modeling
COVID-19 is advancing steadily across many parts of the world, both developed and developing countries. China has succeeded in modeling the daily incidence of COVID-19. Yet, the epicenter has now shifted to the US, where tens of thousands of new cases are being added to the daily total.
Despite this, not much research has been done on the trends of daily incidence and deaths due to COVID-19 in the US. The current study is based on a Chinese model that shows a correlation between Google search-term interest on the Internet and daily incidence of the disease, with a lag time of 9 days on average. This is also the case with trends in Europe, Taiwan, and Iran.
Internet search interest has also been used to model and detect epidemics of influenza in the US and Australia. The present study attempts to analyze the association between the daily incidence and mortality due to COVID-19 in the US and the interest shown in Google search terms related to the illness.
How was the modeling done?
The researchers extracted data on the number of daily new cases and new deaths from the 1-point-3-acres.com website and the Johns Hopkins database on April 9, 2020. Google Trends was used to retrieve data on the relevant search terms between March 1 and April 10, 2020. They used nine terms in all: COVID-19, COVID, coronavirus, pneumonia, high temperatures, cough, Covid heart, Covid pneumonia, and Covid diabetes.
The search interest, as represented on Google Trends, shows how popular a term is relative to its peak popularity for a given time and region. It is scored from 0 to 100, where 0 means there was inadequate data to assess search interest for the term and 100 marks peak popularity.
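The 0 to 100 scale can be illustrated with a minimal sketch. It assumes raw daily search counts are available, which Google Trends does not expose; the function name `scale_to_peak` and the sample counts are hypothetical, and Google's own normalization is more involved than this simple rescaling.

```python
def scale_to_peak(raw_counts):
    """Rescale counts so the largest value becomes 100 (simplified, hypothetical)."""
    peak = max(raw_counts)
    if peak == 0:
        # No searches at all: every day scores 0.
        return [0 for _ in raw_counts]
    # Each day is expressed as a rounded percentage of the peak day.
    return [round(100 * count / peak) for count in raw_counts]


# Hypothetical raw counts for five consecutive days.
print(scale_to_peak([120, 300, 600, 450, 150]))
```

The day with the most searches scores 100, and every other day is read as a percentage of that peak.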
They then looked at how the search interest for each term correlated with daily incidence and daily deaths, with a lag period of 20 and 23 days, respectively. They used the top 3 terms to build a generalized linear model for each of these outcomes. Finally, these models were used to predict daily incidence and new deaths in the US in the future.
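As a rough illustration of this step, the sketch below pairs each day's search interest with the outcome `lag` days later and fits a straight line by ordinary least squares, the simplest special case of a generalized linear model (Gaussian family, identity link). The study's models used three terms, and their family and link are not described here, so the single predictor, the function names, and the numbers are all assumptions made for illustration.

```python
def fit_lagged_line(interest, outcome, lag):
    """Fit outcome[t + lag] = intercept + slope * interest[t] by least squares.

    Both series must cover the same days and have the same length.
    """
    x = interest[: len(interest) - lag]  # interest on day t
    y = outcome[lag:]                    # outcome on day t + lag
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_ahead(interest, lag, intercept, slope):
    """Forecast the outcome for the `lag` days after the last observed day."""
    recent = interest[len(interest) - lag:]  # days with no observed outcome yet
    return [intercept + slope * value for value in recent]


# Hypothetical data: search interest and new cases over six days.
interest = [10, 20, 30, 40, 50, 60]
new_cases = [0, 0, 105, 205, 305, 405]

intercept, slope = fit_lagged_line(interest, new_cases, lag=2)
forecast = predict_ahead(interest, 2, intercept, slope)
print(intercept, slope, forecast)
```

Because the outcome trails the searches, the most recent `lag` days of search interest have no matching outcome yet, and those are the days the fitted line can forecast.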
The predictions were tested against the actual data as it came in to evaluate the accuracy of prediction.
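One common way to score forecasts against incoming data is the mean absolute percentage error. The metric the researchers used is not named here, so this choice, the function name, and the numbers below are assumptions for illustration only.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / actual, expressed as a percentage.

    Days with an actual value of 0 are skipped because the ratio is undefined.
    """
    errors = [abs(a - p) / a for a, p in zip(actual, predicted) if a != 0]
    if not errors:
        raise ValueError("no non-zero actual values to compare against")
    return 100 * sum(errors) / len(errors)


# Hypothetical observed and predicted daily counts.
observed = [100, 200]
predicted = [110, 180]
mape = mean_absolute_percentage_error(observed, predicted)
print(mape)
```

A lower value means the predictions sat closer to the observed counts.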
What did the study show?
The researchers elected to use data from Johns Hopkins because of the slightly better consistency of the data. During the study period, there were 555,245 new cases of and 22,019 deaths from COVID-19 in the US.
Search interest data on Google Trends ran two days behind the date of search. The search term with the highest popularity was COVID rather than COVID-19, and the former was used for analysis.
The correlation coefficient of each search term depends on the lag time. The highest correlation was seen for COVID, COVID pneumonia, and Covid heart: search interest for these three terms was highly correlated with the daily incidence and new deaths, at lag times of 12 and 19 days, respectively.
The predictions for daily incidence and new deaths show a plateau lasting about 12 days, which suggests that these outcomes could level off in the near future. When compared with prospective data, there was moderate to good accuracy of prediction for new cases, while new deaths were predicted with poor to good accuracy.
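The dependence on lag time can be sketched by computing the Pearson correlation at each candidate lag and keeping the lag with the highest coefficient. This is a minimal illustration, not the study's procedure; the function names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / (spread_x * spread_y)


def best_lag(interest, outcome, max_lag):
    """Return (lag, r) with the highest r, pairing interest[t] with outcome[t + lag]."""
    results = []
    for lag in range(max_lag + 1):
        x = interest[: len(interest) - lag]
        y = outcome[lag:]
        results.append((lag, pearson(x, y)))
    return max(results, key=lambda item: item[1])


# Toy data: the outcome repeats the search pattern two days later.
interest = [1, 3, 2, 5, 4, 6, 8, 7]
outcome = [0, 0, 10, 30, 20, 50, 40, 60]

lag, r = best_lag(interest, outcome, max_lag=3)
print(lag, round(r, 3))
```

In this toy series the outcome trails the searches by two days, so the scan selects a lag of 2.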
What do the results tell us?
This is the first study to show an excellent correlation between search interest and both daily incidence and new deaths. Over the short 4-day follow-up, there was moderate to very good accuracy of prediction.
The researchers say more studies are needed to validate their findings. If the findings hold up, this type of modeling can help predict and prepare for upcoming trends in cases and deaths.
The earlier study in China reported a lag time of 9 days, whereas the lag in the US was found to be 12 days. This difference could be due to several factors: the lower rate of testing in the US relative to China, leading to underestimation of the real daily incidence; the delay in starting testing in the US, causing longer lag times; differences in the physical and socioeconomic characteristics of patients in the US and China; and possible differences in the subtypes of the virus circulating in the two countries.
The use of Google Trends to analyze trends in COVID search terms and correlate them with the number of new deaths and cases in the US provides a better understanding of the pandemic parameters. The number of search terms used here is higher than the two used in earlier models.
The interest trends suggest a high prevalence of pneumonia and heart disease in relation to COVID-19-associated daily cases and deaths, perhaps because cardiac damage and pneumonia are so frequent in these patients. The longer lag times perhaps offer a more significant opportunity to intervene.
The researchers plan to keep updating the model with the latest data, which should allow more accurate prediction by reducing selection and recall biases. They report a stronger correlation between search term interest and the daily number of new cases and deaths than the earlier Chinese studies did.
Overall, this retrospective population-based modeling shows excellent correlation with daily deaths and incidence of COVID-19 in the US, with moderate to good accuracy of prediction.
medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.