The genetic diversity of SARS-CoV-2 in the USA

The spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) across the United States during 2020 has been described as occurring in three “waves” or “phases,” each characterized by a spike in the number of reported new cases and a shifting geographical distribution.

Several SARS-CoV-2 lineages with higher transmissibility than wildtype, known as variants of concern, were identified during this time, raising questions about the rate of virus mutation and what it means for acquired and engineered immunity.

In a research paper recently uploaded to the preprint server medRxiv by Capoferri et al. (June 4th, 2021), the genetic diversity of SARS-CoV-2 through each phase is examined in detail using publicly available genomic data from before 2021. The work highlights the need to continuously track and assess the evolution of the virus to ensure the future efficacy of the currently available vaccines.

This news article was a review of a preliminary scientific report that had not undergone peer review at the time of publication. Since its initial publication, the scientific report has been peer reviewed and accepted for publication in a scientific journal. Links to the preliminary and peer-reviewed reports are available in the Sources section at the bottom of this article.

The phases of COVID-19 spread

SARS-CoV-2 was introduced to the US from Europe and Asia in winter 2019, with cases rising rapidly until spring 2020, referred to as phase 1.

The northeast was particularly affected, with many community transmissions occurring in this short space of time. Phase 2 began in summer 2020, this time with the south-western USA bearing a greater burden of cases as non-pharmaceutical interventions began to be relaxed.

The mid-west saw the earliest surge in cases at the beginning of phase 3 in fall 2020, though cases were on the rise nationwide before widespread vaccine distribution began in early 2021.

The authors note some discrepancies between the distribution of cases across the USA and the available SARS-CoV-2 genomic sequences. For example, while the south bore the majority of cases overall, most genomic sequences were obtained from the west.

In total, only 1.2% of all reported cases in the country throughout 2020 had a corresponding viral sequence, compared with 8.1% in the UK and 6.2% in Australia. The median time from sample collection to full genomic sequence acquisition is around 100 days. Thus, many samples from the latter stages of phase 3 were unavailable to the group at the time of writing, though they note that the overall sequencing rate improved in 2021 relative to 2020 levels.
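
The two quantities in this paragraph, the share of cases with a sequence and the lag before a sequence becomes available, can be illustrated with a minimal Python sketch. The function names and the case and sequence counts are hypothetical and are not taken from the paper. The 100-day lag is the median reported above, treated here as a fixed delay for simplicity, and the December 15th, 2020 cutoff is the access date given in the figure caption.

```python
from datetime import date, timedelta

MEDIAN_LAG_DAYS = 100              # median collection-to-sequence time reported above
ACCESS_CUTOFF = date(2020, 12, 15)  # sequence access date given in the figure caption


def sequencing_coverage(n_sequences: int, n_cases: int) -> float:
    """Percentage of reported cases that have a corresponding viral sequence."""
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    return 100.0 * n_sequences / n_cases


def typically_available(collected: date, cutoff: date = ACCESS_CUTOFF) -> bool:
    """Rough check: would a sample with the median lag be sequenced by the cutoff?

    Simplification: real lags vary around the median, so this is only indicative.
    """
    return collected + timedelta(days=MEDIAN_LAG_DAYS) <= cutoff


# Hypothetical counts chosen only to reproduce a 1.2% coverage figure.
print(round(sequencing_coverage(12_000, 1_000_000), 1))  # 1.2
print(typically_available(date(2020, 6, 1)))   # True: phase 2 sample
print(typically_available(date(2020, 10, 1)))  # False: phase 3 sample
```

Under this simplification, a sample collected in early October 2020 would not usually have been sequenced by the access date, which is consistent with the thin phase 3 sample pool the authors describe.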

Tracking SARS-CoV-2 clades

GISAID is an international organization that monitors influenza and now SARS-CoV-2, providing open-access genomic data on the viruses. It categorizes SARS-CoV-2 clades and lineages based on differences in genetic sequence, assigning them letter symbols for easy identification. The earliest GISAID-assigned clades were G, GH, GR, S, L, and V, each of which was identified in the USA during phase 1.

SARS-CoV-2 Epidemic in the U.S. in 2020. (A) Daily COVID-19 cases in the U.S. in 2020. (B) Daily COVID-19 deaths in the U.S. in 2020. (C) U.S. regional map colored by region. (D) Number of COVID-19 cases in the U.S. in 2020 by region: Northeast, South, West, Midwest, respectively. (E) Number of COVID-19 deaths in the U.S. in 2020 by region. (A-B & D-E) Separation of phases is denoted by vertical dotted red lines. Data were smoothed by a moving 3-day average. (F) Proportion of COVID-19 cases by region during each phase and the overall contribution to the U.S. total in 2020. (G) Proportion of SARS-CoV-2 sequences accessed (submission as of December 15th, 2020) by region during each phase and the overall contribution to the U.S. total in 2020. (H) The number of sequences per case obtained from each region during each phase and for the U.S. total in 2020. (F-H) Phases 1, 2, and 3 are shown, followed by the U.S. total for 2020. (I) Total number of sequences submitted to GISAID from the U.K., Australia, and the U.S. by December 15th, 2020. (J) Submitted SARS-CoV-2 genomes normalized to the number of COVID-19 cases from the U.K., Australia, and the U.S.

G-based clades are defined by the D614G mutation in the spike protein. They are more infectious than wildtype and show greater resistance to some monoclonal antibodies, though convalescent serum remains effective at neutralization, and clinical outcomes are similar to or even milder than those of wildtype SARS-CoV-2.

Over 99% of sequences collected during phase 2 were of a G-based clade, demonstrating the rapid rise to dominance of this highly transmissible strain.

The average pairwise distance among G-based clades rose from 0.02% in phase 1 to 0.06% in phase 3, with an approximate rate of change of 1.95 nucleotides per month.
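
To make the pairwise distance figure concrete, the following Python sketch computes a mean pairwise p-distance (the proportion of differing sites) over a set of aligned sequences. It is illustrative only: the article does not describe the authors' exact distance method, the function names are hypothetical, and the toy sequences are invented. The genome length of 29,903 nucleotides is that of the Wuhan-Hu-1 reference sequence, used here as an assumption to convert the percentages into nucleotide counts.

```python
from itertools import combinations

GENOME_LENGTH = 29_903  # Wuhan-Hu-1 reference length, assumed for conversion


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                diffs += 1
    return diffs / compared if compared else 0.0


def average_pairwise_distance(seqs: list[str]) -> float:
    """Mean p-distance over every unordered pair of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def percent_to_nucleotides(percent: float, length: int = GENOME_LENGTH) -> float:
    """Convert a percentage distance into an expected number of differing sites."""
    return percent / 100 * length


# Invented toy alignment: two identical sequences and one with a single change.
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]
print(round(average_pairwise_distance(toy), 4))   # 0.0667
print(round(percent_to_nucleotides(0.02), 1))     # 6.0
print(round(percent_to_nucleotides(0.06), 1))     # 17.9
```

Under the genome length assumption, the reported values of 0.02% and 0.06% correspond to roughly 6 and 18 nucleotide differences between an average pair of genomes.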

Clades GH and GR emerged from the G clade and show even higher average mutation rates, at 2.85 and 2.22 nucleotides per month, respectively.

In total, there was an increase of 14% in the number of unique variants of the G-clade throughout 2020 and an increase of 17% in the GR clade specifically.

Interestingly, the GH clade had an 11% decrease in the number of variants while the difference between variants increased.

The authors also calculated, for each clade, the degree to which random populations of the virus remain non-divergent over time. They found that G- and S-based clades diverged heavily during phases 1 and 2, suggesting that viral evolution was directional. Had a great deal of unstructured mixing of differing clades taken place, divergence would have been lower, which would indicate that SARS-CoV-2 had fully pervaded the human population.

The authors state that around half of the new mutations arising in the US and persisting at a frequency of over 5% were unique. The clade G nucleocapsid mutation S194L and clade GH mutations L3352F, N1653D and R2613C to ORF1a and ORF1b, respectively, increased drastically in representation, by more than 40%, from phase 1 to phase 3. Phase 3 saw the most unique mutations overall, even given the smaller available sample pool. Many of the defining mutations of SARS-CoV-2 variants of concern were identified by the group throughout 2020, before they had officially been recognized as distinct lineages. These mutations were present at a frequency of only around 1% in phase 1, rising to almost 5% in phase 3.
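
The frequency tracking described here can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the sample data are invented, and the mutation labels are borrowed from the text only as placeholders. The 5% threshold is the persistence cut-off mentioned above.

```python
from collections import Counter, defaultdict


def mutation_frequencies(samples):
    """Return {phase: {mutation: fraction of that phase's genomes carrying it}}.

    `samples` is an iterable of (phase, mutations) pairs, one per genome.
    """
    totals = Counter()
    counts = defaultdict(Counter)
    for phase, mutations in samples:
        totals[phase] += 1
        counts[phase].update(set(mutations))  # count each mutation once per genome
    return {
        phase: {m: c / totals[phase] for m, c in counts[phase].items()}
        for phase in totals
    }


def persistent(freqs, threshold=0.05):
    """Mutations whose frequency exceeds the threshold, listed per phase."""
    return {
        phase: sorted(m for m, f in by_mut.items() if f > threshold)
        for phase, by_mut in freqs.items()
    }


# Invented toy data: four genomes in phase 1 and four in phase 3.
samples = [
    (1, {"D614G"}), (1, {"D614G"}), (1, set()), (1, {"D614G", "S194L"}),
    (3, {"D614G", "S194L"}), (3, {"D614G", "S194L"}),
    (3, {"D614G"}), (3, {"D614G", "S194L"}),
]
freqs = mutation_frequencies(samples)
change = freqs[3]["S194L"] - freqs[1]["S194L"]
print(freqs[1]["S194L"], freqs[3]["S194L"], change)  # 0.25 0.75 0.5
print(persistent(freqs))
```

Comparing per-phase frequencies in this way is what allows a mutation's rise in representation between phase 1 and phase 3 to be expressed as a single number.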

Future SARS-CoV-2 evolution

While SARS-CoV-2 demonstrates high replication fidelity compared with many other RNA viruses, the wide global spread of the virus has allowed ample opportunity for mutation.

The authors characterize the evolution of SARS-CoV-2 as slow but inexorable, driven mainly by genetic drift, with some mild selection pressure towards high transmissibility and immune escape through competition with other strains.

Chronically infected immunosuppressed individuals who receive treatment with neutralizing antibodies are thought to be an ideal environment for more significant mutations to occur, providing an isolated container with more intense selection pressures, and many of the more concerning variants may have come about in this way.

Similarly, the general genetic diversity in SARS-CoV-2 has been promoted by low adherence to non-pharmaceutical measures in the community, with some adhering populations providing isolated conditions suitable for mutation before then being spread by the non-adherent.

As more of the community is vaccinated, selective pressure towards strains that better escape immune capture will increase. Continuous monitoring of the genome of the virus is therefore essential.

Article Revisions

  • Apr 8 2023 - The preprint preliminary research paper that this article was based upon was accepted for publication in a peer-reviewed Scientific Journal. This article was edited accordingly to include a link to the final peer-reviewed paper, now shown in the sources section.

Written by

Michael Greenwood

Michael graduated from the University of Salford with a Ph.D. in Biochemistry in 2023 and has keen research interests in nanotechnology and its application to biological systems. Michael has written on a wide range of science communication and news topics within the life sciences and related fields since 2019, and engages extensively with current developments in journal publications.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Greenwood, Michael. (2023, April 08). The genetic diversity of SARS-CoV-2 in the USA. News-Medical. Retrieved on April 19, 2024 from https://www.news-medical.net/news/20210607/The-genetic-diversity-of-SARS-CoV-2-in-the-USA.aspx.

  • MLA

    Greenwood, Michael. "The genetic diversity of SARS-CoV-2 in the USA". News-Medical. 19 April 2024. <https://www.news-medical.net/news/20210607/The-genetic-diversity-of-SARS-CoV-2-in-the-USA.aspx>.

  • Chicago

    Greenwood, Michael. "The genetic diversity of SARS-CoV-2 in the USA". News-Medical. https://www.news-medical.net/news/20210607/The-genetic-diversity-of-SARS-CoV-2-in-the-USA.aspx. (accessed April 19, 2024).

  • Harvard

    Greenwood, Michael. 2023. The genetic diversity of SARS-CoV-2 in the USA. News-Medical, viewed 19 April 2024, https://www.news-medical.net/news/20210607/The-genetic-diversity-of-SARS-CoV-2-in-the-USA.aspx.
