A data commons to accelerate research in COVID and long-COVID

In a recent study posted to the medRxiv* pre-print server, researchers in the United States developed a pandemic response commons (PRC) called the Chicagoland coronavirus disease 2019 (COVID-19) commons (CCC). The CCC served Chicago, the state of Illinois, and surrounding regions in the United States (US).

Study: The Pandemic Response Commons. Image Credit: Orpheus FX / Shutterstock

*Important notice: medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.


The US Centers for Disease Control and Prevention (CDC) tracking project pointed to several regional differences in COVID-19 incidence, fatalities, and health disparities. Therefore, it became crucial to curate, integrate, and analyze COVID-19 data at the regional level and aggregate the results to inform national-level policies.

A data commons, such as a PRC, curates, integrates, and harmonizes data for a specific community, e.g., researchers studying an epidemic or pandemic, public health workers, and policymakers. Typically, such commons require several legal and data agreements.

However, the regional PRC instance developed in the current study was designed to be part of a broader data ecosystem, to operate at a low level of activity, and to scale up as the pandemic required. Most importantly, that ecosystem comprised multiple regional commons to support the pandemic response through local, regional, and federated data sharing and analysis.

About the study

In the present study, researchers used the open-source Gen3 data platform to develop the PRC, and a formal consortium of Chicagoland-area organizations operated it. The PRC is based upon consortium, data, and platform agreements developed by the non-profit Open Commons Consortium.

The Open Commons Consortium has three main functions, as follows:

i)  it helps establish a consortium to build and operate a data commons,

ii) it ensures that data is contributed to the data commons, and

iii) it enables its members to work in groups, analyze data, and develop software applications and services that enhance the functionality of the commons.

The CCC curated and harmonized several datasets, including clinical data on ~90,000 patients, statistical summary data on COVID-19 cases, and sequencing data for over 5,300 severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variant genomes.

Study findings

The CCC had eight members and five working groups. Its eight members, viz., Rush University Medical Center, University of Chicago, Southern Illinois University, the University of Illinois at Chicago, St. Anthony Hospital, Sinai Chicago, NorthShore University HealthSystem, and CommunityHealth, contributed clinical data from over 90,000 subjects with COVID-19.

The clinical data working group developed a standard data model so that each member could contribute data in the required format. The epidemiological modeling working group used aggregated counts of COVID-19 cases, deaths, and select comorbidities obtained from the CCC to understand health disparities and build predictive models. They developed hierarchical Bayesian models that predicted future COVID-19 case and fatality counts for each county in Illinois. Likewise, the working group developed regression models to understand temporal, age-related race/ethnicity differences in case/fatality ratios. The variant surveillance working group collected and contributed over 5,300 SARS-CoV-2 genome sequences to national and international genomic databases.
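For readers unfamiliar with the term, a case/fatality ratio is simply the number of deaths divided by the number of confirmed cases in a group. The short Python sketch below illustrates that arithmetic on aggregated counts. The group labels and numbers are invented for illustration and are not CCC data, and the sketch does not attempt to reproduce the study's regression or Bayesian models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupCounts:
    """Aggregated counts for one demographic group (illustrative only)."""
    group: str
    cases: int
    deaths: int


def case_fatality_ratio(cases: int, deaths: int) -> float:
    """Return deaths / cases, or NaN when no cases were reported."""
    if cases <= 0:
        return float("nan")
    return deaths / cases


# Hypothetical counts, invented for this example; not taken from the study.
counts = [
    GroupCounts("age 18-49", cases=1000, deaths=5),
    GroupCounts("age 50-64", cases=500, deaths=10),
    GroupCounts("age 65+", cases=200, deaths=20),
]

cfr_by_group = {c.group: case_fatality_ratio(c.cases, c.deaths) for c in counts}

for group, cfr in cfr_by_group.items():
    print(f"{group}: {cfr:.1%}")
```

Working from aggregated counts like these, rather than patient-level records, is what allows such ratios to be compared across groups without exposing individual data.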

Screenshot of PRC

Gen3 software automatically generates application programming interfaces (APIs) for data and metadata access, data submission, authorization, and authentication, all of which make both controlled-access and public-access data findable, accessible, interoperable, and reusable (FAIR). For instance, the PRC hosts a publicly accessible PRC Jupyter Notebook Browser that helps users access COVID-19 case incidence, fatality, clinical, mobility, and imaging data.
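As a rough idea of what programmatic access to such an API might look like, the Python sketch below builds (but does not send) an HTTP request carrying a GraphQL query, using only the standard library. Everything specific in it is an assumption made for illustration: the commons URL is a placeholder, the path follows a pattern seen in Gen3 deployments but is not confirmed by the study, and the node and field names in the query are made up.

```python
import json
import urllib.request
from typing import Optional

# Assumptions for illustration: the commons URL, the GraphQL path, and the
# node and field names are placeholders, not values confirmed by the study.
COMMONS_URL = "https://commons.example.org"
GRAPHQL_PATH = "/api/v0/submission/graphql"


def build_query_request(query: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a GraphQL query."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Controlled-access data would need a bearer token; public data may not.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        COMMONS_URL + GRAPHQL_PATH, data=body, headers=headers, method="POST"
    )


# Hypothetical query: the node "summary_location" and its fields are made up.
query = "{ summary_location(first: 5) { county province_state } }"
request = build_query_request(query)
print(request.get_method(), request.full_url)
```

The optional token mirrors the distinction drawn above between controlled and public access: the same request shape could serve both, with authorization attached only where it is required.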

Three participating institutions contributed patient-level COVID-19 data starting March 1, 2020. The PRC analyzed the submitted data and identified data quality issues. The quality analysis included developing plots to compare patient counts by demography, symptoms, hospitalization events, and pre-existing comorbidities. Further, the PRC used county-level data from statistical summary reports (SSRs) to develop epidemiological models, which provided information for map overlays that are easily accessible to the public.
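A quality check of this kind can be approximated by tallying patient counts per contributing site and per category, then looking for sites where a field is often missing. The Python sketch below shows one minimal way to do that. The site names, field names, and records are hypothetical, and the study's own quality analysis relied on plots rather than this exact procedure.

```python
from collections import Counter, defaultdict


def counts_by_site(records, field):
    """Tally values of `field` per site; absent values count as 'missing'."""
    tallies = defaultdict(Counter)
    for rec in records:
        value = rec.get(field)
        tallies[rec["site"]][value if value is not None else "missing"] += 1
    return tallies


def missing_fraction(tally):
    """Share of records at one site whose value was missing."""
    total = sum(tally.values())
    return tally["missing"] / total if total else 0.0


# Hypothetical patient-level records, invented for this example.
records = [
    {"site": "site_a", "sex": "female"},
    {"site": "site_a", "sex": "male"},
    {"site": "site_a", "sex": "female"},
    {"site": "site_b", "sex": "male"},
    {"site": "site_b", "sex": "female"},
    {"site": "site_b", "sex": None},
    {"site": "site_b"},
]

tallies = counts_by_site(records, "sex")
for site, tally in sorted(tallies.items()):
    print(site, dict(tally), f"missing: {missing_fraction(tally):.0%}")
```

A site with an unusually high missing fraction, or with counts far out of line with its peers, would be a natural candidate for follow-up with the contributing institution.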

Screenshot of viral variants and their geographic distribution


The PRC also worked on a project with Southern Illinois University (SIU) to analyze SARS-CoV-2 genomic sequences and better understand the spread of COVID-19 across Illinois. The project sequenced over 5,300 SARS-CoV-2 genomes spanning 16 viral clades and more than 150 variants to track SARS-CoV-2 evolution in Illinois and identify the appearance of specific SARS-CoV-2 variants of concern (VOCs).


The CCC contained clinical data from over 90,000 COVID-19 patients, SSRs for the analysis of COVID-19 health disparities, sequencing data for over 5,300 SARS-CoV-2 genomes, and COVID-19-related public data. Overall, the CCC data was rich, readily available to a broader community, and enhanced the national view of COVID-19-related issues to accelerate research on COVID-19 and long COVID. In summary, the study highlighted the significance of a regional COVID-19 commons in complementing ongoing efforts to gather COVID-19 data at the national level and to support clinical research and policy development.



Written by

Neha Mathur

Neha is a digital marketing professional based in Gurugram, India. She earned a Master’s degree with a specialization in Biotechnology from the University of Rajasthan in 2008. She gained experience in pre-clinical research through a research project in the Department of Toxicology at the prestigious Central Drug Research Institute (CDRI), Lucknow, India. She also holds a certification in C++ programming.


Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Mathur, Neha. (2023, March 17). A data commons to accelerate research in COVID and long-COVID. News-Medical. Retrieved on July 19, 2024 from https://www.news-medical.net/news/20220628/A-data-commons-to-accelerate-research-in-COVID-and-long-COVID.aspx.

  • MLA

    Mathur, Neha. "A data commons to accelerate research in COVID and long-COVID". News-Medical. 19 July 2024. <https://www.news-medical.net/news/20220628/A-data-commons-to-accelerate-research-in-COVID-and-long-COVID.aspx>.

  • Chicago

    Mathur, Neha. "A data commons to accelerate research in COVID and long-COVID". News-Medical. https://www.news-medical.net/news/20220628/A-data-commons-to-accelerate-research-in-COVID-and-long-COVID.aspx. (accessed July 19, 2024).

  • Harvard

    Mathur, Neha. 2023. A data commons to accelerate research in COVID and long-COVID. News-Medical, viewed 19 July 2024, https://www.news-medical.net/news/20220628/A-data-commons-to-accelerate-research-in-COVID-and-long-COVID.aspx.


The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of News Medical.