Public health data are essential for framing policies and monitoring the course of the coronavirus disease 2019 (COVID-19) pandemic. Real-time indicators include cases, deaths, and hospitalizations following infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, these must be collated consistently and reliably if they are to provide a sound basis for research.
Auxiliary data, such as digital surveillance and syndromic monitoring, are also needed to enhance the breadth and usefulness of the primary indicators. Access to streaming digital data has allowed outbreaks to be forecast faster and has permitted more accurate analyses of how public health interventions affect public behavior.
A new preprint available on the medRxiv* server reports the utility of the COVIDcast API, a continuously updated database of COVID-19 indicators dating from April 2020. It includes reported cases and deaths from public sources, as well as unique data streams such as medical (health insurance) claims, antigen testing data, public surveys of symptoms and behavior, mobility data from smartphone apps, and indicators based on Google searches.
The indicators are publicly available, aggregated down to the county level. Earlier versions of each indicator are also accessible, so revisions to the data can be identified. The release also includes software for accessing the data and an online dashboard that presents it visually.
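To make the idea of versioned, county-level access concrete, here is a minimal sketch that only builds a request URL and never contacts a server. It assumes that the public Delphi Epidata COVIDcast endpoint and its parameter names (`data_source`, `signal`, `time_type`, `geo_type`, `time_values`, `geo_value`, `as_of`) behave as we understand the public documentation; `build_covidcast_query` is a hypothetical helper, and the signal names in the example are assumptions to check against the current docs.

```python
from urllib.parse import urlencode

# Assumed base URL of Delphi's public Epidata COVIDcast endpoint.
BASE_URL = "https://api.delphi.cmu.edu/epidata/covidcast/"


def build_covidcast_query(data_source, signal, start_day, end_day,
                          geo_type="county", geo_value="*", as_of=None):
    """Build a query URL for one signal over a range of days (YYYYMMDD ints).

    Hypothetical helper. If as_of is given, the request asks for values as
    they were known on that date rather than the latest revision. Parameter
    names follow our reading of the public documentation (an assumption).
    """
    params = {
        "data_source": data_source,
        "signal": signal,
        "time_type": "day",
        "geo_type": geo_type,
        "time_values": f"{start_day}-{end_day}",
        "geo_value": geo_value,  # "*" requests every location of geo_type
    }
    if as_of is not None:
        params["as_of"] = as_of
    return BASE_URL + "?" + urlencode(params)


# Hypothetical example: county-level CTIS CLI-in-community values for one week,
# as they were known on 10 November 2020. Source and signal names are assumed.
url = build_covidcast_query("fb-survey", "smoothed_hh_cmnty_cli",
                            20201101, 20201107, as_of=20201110)
print(url)
```

Leaving `as_of` unset would request the latest available revision. Passing a date instead asks for a historical snapshot, which is the property that lets revisions be studied.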
Applications of the COVIDcast API
The COVIDcast API indicators have been used in government reports, analyses, and news stories. They have also been used to build dashboards such as those of COVID Act Now and COVID Exit Strategy, and in forecasts of cases, hospitalizations, and deaths by Delphi, DeepCOVID, and the Institute for Health Metrics and Evaluation (IHME).
These signals have therefore helped researchers understand how the pandemic affected public health, how effective various interventions were, and which factors drove the spread of the virus. The API is accessed by thousands of users every day, who request hundreds of thousands of data points.
The report describes several signals that track COVID-19 activity. Although these indicators are not mainstream, their sources are often unique and avoid the typical delays and errors of conventional surveillance methods.
The signals that help monitor COVID-19 activity include the following. Change Healthcare COVID-like illness (CHNG-CLI) is based on outpatient visits for suspected COVID-19. Change Healthcare COVID (CHNG-COVID) is based on outpatient visits with confirmed COVID-19. The COVID-19 Trends and Impact Survey CLI (CTIS-CLI) estimates the percentage of the population with COVID-like symptoms. CTIS CLI-in-community (CTIS-CLI-in-community) measures the percentage of people who know someone in their community who is sick. Finally, the Quidel test positivity rate (Quidel-TPR) is the percentage of tests that return positive.
These indicators track national trends and also reveal patterns at the state and county levels, so they can help improve forecasts of future COVID-19 caseloads.
The COVIDcast indicators show geographic correlation: at any given time, signal values in different locations correlate with the case rates in those locations, allowing hotspots to be identified.
Among the signals, the survey-based CTIS-CLI-in-community signal was the most closely correlated with case rates across locations. This indicates its value for following both symptoms and cases, especially where it is the only available source of data.
The signals also correlate with case trends over time, and here the claims signals (CHNG-COVID and CHNG-CLI) perform better, allowing case rates at different time points to be compared. Interestingly, the CTIS-CLI-in-community signal is a strong predictor of confirmed cases, comparable to signals that directly measure such cases, even though it is simply the percentage of people who know someone sick with COVID-19-like symptoms.
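The distinction between geographic and temporal correlation can be illustrated with a minimal sketch. It assumes a hypothetical in-memory dictionary that maps (location, day) to a (signal value, case rate) pair, and it uses Spearman rank correlation as one reasonable choice; the preprint's exact procedure may differ. Geographic correlation compares locations on a single day, while temporal correlation follows one location across days.

```python
import math
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)


def geographic_correlation(data, day):
    """Correlate signal with case rate across all locations on one day."""
    pairs = [v for (loc, d), v in data.items() if d == day]
    return spearman([s for s, _ in pairs], [c for _, c in pairs])


def temporal_correlation(data, location):
    """Correlate signal with case rate across days within one location."""
    pairs = [v for (loc, d), v in sorted(data.items()) if loc == location]
    return spearman([s for s, _ in pairs], [c for _, c in pairs])


# Hypothetical toy data: (location, day) -> (signal value, case rate).
data = {
    ("A", 1): (1.0, 10.0), ("B", 1): (2.0, 30.0), ("C", 1): (3.0, 20.0),
    ("A", 2): (2.0, 15.0), ("A", 3): (4.0, 40.0),
}
print(geographic_correlation(data, day=1))       # 0.5
print(temporal_correlation(data, location="A"))  # 1.0
```

A signal can score well on one measure and poorly on the other. This is why the article distinguishes the survey signal's strength across locations from the claims signals' strength over time.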
Ensuring robust data
Drawing on data from many different sources reduces confusion caused by variations in case definitions and criteria, differences in reporting protocols, backlog clearing, and artefactual increases or decreases in case rates.
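One way a second source can expose an artefact is when a backlog cleared in a single day produces a spike in reported cases that an independent signal does not share. The sketch below is a simple, generic heuristic, not the authors' method. It flags a day when cases exceed `factor` times the median of the previous `window` days while the other signal does not. The function names, thresholds, and data are hypothetical, and both series are assumed to cover the same days.

```python
from statistics import median


def is_spike(series, i, window=7, factor=3.0):
    """True if series[i] exceeds `factor` times the median of the previous `window` values."""
    if i < window:
        return False
    baseline = median(series[i - window:i])
    return baseline > 0 and series[i] > factor * baseline


def flag_artefactual_spikes(cases, other_signal, window=7, factor=3.0):
    """Return day indices where reported cases spike but an independent signal does not.

    Both series are assumed to be aligned day by day and of equal length.
    """
    return [
        i for i in range(len(cases))
        if is_spike(cases, i, window, factor)
        and not is_spike(other_signal, i, window, factor)
    ]


# Hypothetical data: a backlog of cases reported on day 7, while a claims-based
# signal from a different source stays flat.
cases = [10] * 7 + [100] + [10] * 3
claims = [5.0] * 11
print(flag_artefactual_spikes(cases, claims))  # [7]
```

If the claims signal had jumped on the same day, the spike would not be flagged, because a genuine surge should appear in several independent sources.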
The COVIDcast API also records revisions to its indicators. This helps researchers build more accurate models and evaluate existing ones fairly. For instance, many insurance claims are backfilled, and initial and final reports can differ by as much as a fifth for up to 35 days. Similarly, as death certificates are reviewed and backlogs in public health reporting are cleared, thousands of cases and deaths may be added or reclassified.
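The size and duration of backfill can be quantified by comparing each report for a given day with the final value. The following is a minimal sketch that assumes the reports for one reference date arrive as hypothetical (lag in days, value) pairs. The numbers are illustrative only and are not taken from the preprint. The helper names are hypothetical.

```python
def revision_profile(reports):
    """Relative gap between each report and the final value for one reference date.

    `reports` is a list of (lag_days, value) pairs, where lag_days counts the
    days between the reference date and the report. A positive gap means the
    early report was too low, as happens when claims are backfilled.
    """
    reports = sorted(reports)
    final = reports[-1][1]
    if final == 0:
        raise ValueError("final value is zero; relative revision is undefined")
    return [(lag, (final - value) / final) for lag, value in reports]


def settled_lag(reports, tol=0.05):
    """Earliest lag after which every report stays within `tol` of the final value."""
    profile = revision_profile(reports)
    settled = profile[-1][0]
    for lag, gap in reversed(profile):
        if abs(gap) > tol:
            break
        settled = lag
    return settled


# Hypothetical reports for one day's claims count (illustrative numbers only).
reports = [(1, 80.0), (7, 95.0), (35, 100.0)]
print(revision_profile(reports))  # [(1, 0.2), (7, 0.05), (35, 0.0)]
print(settled_lag(reports))       # 7
```

Here the first report is 20% below the final value. This mirrors the "as much as a fifth" figure, but the values themselves are invented for illustration.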
Forecast models must be built and evaluated without the benefit of data that was only backfilled later. This feature of the API, which preserves all earlier versions and makes them accessible, is therefore very useful to modelers.
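Reconstructing "what was known when" amounts to keeping every published version of a value and, for a given query date, using only the latest version issued on or before that date. The minimal sketch below shows the idea with a hypothetical in-memory `Version` record. It illustrates the concept and is not Delphi's implementation. Dates are stored as YYYYMMDD integers so that numeric order matches calendar order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    reference_date: int  # day the value describes, as YYYYMMDD
    issue_date: int      # day this version was published, as YYYYMMDD
    value: float


def snapshot_as_of(versions, query_date):
    """Return {reference_date: value} using only versions issued on or before query_date."""
    latest = {}
    for v in versions:
        if v.issue_date > query_date:
            continue  # not yet published on the query date
        best = latest.get(v.reference_date)
        if best is None or v.issue_date > best.issue_date:
            latest[v.reference_date] = v
    return {d: v.value for d, v in sorted(latest.items())}


# Hypothetical versions: the 1 November value was revised upward on 10 November.
versions = [
    Version(20201101, 20201102, 80.0),
    Version(20201101, 20201110, 100.0),
    Version(20201102, 20201103, 50.0),
]
print(snapshot_as_of(versions, 20201105))  # {20201101: 80.0, 20201102: 50.0}
print(snapshot_as_of(versions, 20201201))  # {20201101: 100.0, 20201102: 50.0}
```

A forecaster evaluating a model retrospectively on 5 November would train on the first snapshot rather than the second, so the model is not credited with information that arrived only later.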
Beyond case monitoring, the auxiliary signals show the effects of interventions on public health and so help allocate resources appropriately. Claims data show people seeking medical help, and mobility data show how closely restrictions are being followed. Vaccine acceptance rates may help shape efforts to expand coverage.
What are the implications?
The COVIDcast API thus allows COVID-19 activity to be monitored by region and over time using signals from several independent sources. This reduces reliance on any single, potentially error-prone source, which makes forecasts and other findings more robust. It also allows surveillance glitches to be detected and fixed.
Non-conventional COVID-19 indicators, such as mobility profiles, Internet searches, mask use, and vaccine hesitancy, may help frame policies and direct research.
The revision tracking feature allows "what was known when" to be reconstructed, which helps reveal how real-time surveillance indicators perform and where they fall short.
Bringing multiple signals together in a single format, with revision tracking, supports public health reporting and syndromic surveillance. It also supports monitoring of public behaviors and mobility.
“Convenient and real-time access to this data enables continuous telemetry summarizing how things are, how they are expected to change, which areas need additional resources to be allocated in response, and how effective public communication is,” write the authors.
*medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.