A multinational study currently available on the bioRxiv* preprint server shows that only a handful of amino acids were exchanged in the large genome of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) during the early phase of the pandemic, implying a relatively low mutation rate and low selection pressure.
The coronavirus disease 2019 (COVID-19) pandemic caused by SARS-CoV-2 represents the most significant public health emergency in recent history. Both the epidemiology and the evolution of such a novel virus, after its spread into a new host, are thought to be pivotal for its success in generating and sustaining an outbreak.
Approximately 50,000 complete (or near-complete) genomic sequences of SARS-CoV-2 are available to date, and the number is rising daily. These genomes provide vital insights into the ongoing viral evolution during the pandemic, which might help in mitigating and controlling the spread of the virus.
Novel Coronavirus SARS-CoV-2: Colorized scanning electron micrograph of an apoptotic cell (green) heavily infected with SARS-CoV-2 virus particles (purple), isolated from a patient sample. Image captured at the NIAID Integrated Research Facility (IRF) in Fort Detrick, Maryland. Credit: NIAID
What about the diversity of SARS-CoV-2 sequences?
RNA viruses are notorious for their rapid evolution and subsequent accumulation of amino acid mutations, which in turn may significantly affect the transmissibility of the virus, its cell affinity, and pathogenicity.
Luckily, the observed diversity among pandemic SARS-CoV-2 sequences has thus far been very low; nonetheless, the rapid and global spread of the virus may create the perfect storm and open up opportunities for positive natural selection.
Some studies have already reported that SARS-CoV-2 has practically evolved into new genotypes/subtypes, but such analytical appraisals of the phylogenetic tree were based on only a few hundred sequences.
This is why researchers from Nanjing Agricultural University and Zhejiang University in China, KU Leuven in Belgium, Tulane University and the University of California in the US, as well as from the Free University of Berlin in Germany, developed an early warning pipeline (built on thousands of sequences) to study SARS-CoV-2 evolution during the COVID-19 pandemic.
Phylogenetic, spatial and temporal analysis
This study utilized a total of 4,894 SARS-CoV-2 sequences that were available from GISAID and NCBI GenBank (both comprehensive public databases of nucleotide sequences) to perform phylogenetic analysis based on whole genomes.
Furthermore, specific software packages were used to estimate the ratio of non-synonymous to synonymous substitutions, as well as to pinpoint the exact sites that were potentially subject to positive selection.
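To make the distinction between synonymous and non-synonymous substitutions concrete, here is a minimal Python sketch. It is not the software used in the study: it only counts codon differences between two aligned coding sequences and does not normalize by the number of synonymous and non-synonymous sites, as dedicated packages do when estimating the ratio. The function name and the short example sequences are hypothetical and chosen purely for illustration.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((x, y, z) for x in BASES for y in BASES for z in BASES), AMINO_ACIDS
    )
}


def count_substitutions(ref, alt):
    """Count synonymous and non-synonymous codon differences.

    Both sequences must be aligned, in frame, and of equal length.
    Codons containing gaps or ambiguous bases are skipped.
    Returns a (synonymous, non_synonymous) tuple of raw counts.
    """
    if len(ref) != len(alt) or len(ref) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    synonymous = non_synonymous = 0
    for i in range(0, len(ref), 3):
        ref_codon, alt_codon = ref[i:i + 3].upper(), alt[i:i + 3].upper()
        if ref_codon == alt_codon:
            continue
        if ref_codon not in CODON_TABLE or alt_codon not in CODON_TABLE:
            continue  # gap or ambiguous base: cannot classify
        if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


# Hypothetical toy sequences: Met-Asp-Gly-Leu versus Met-Gly-Gly-Leu.
reference = "ATGGATGGTTTA"
variant = "ATGGGTGGCTTA"
print(count_substitutions(reference, variant))
```

In this toy pair, the change GAT to GGT replaces aspartate with glycine (non-synonymous), while GGT to GGC leaves glycine unchanged (synonymous). An excess of non-synonymous over synonymous changes at a site, after proper normalization, is what hints at positive selection.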
"We also performed a spatial and temporal analysis that revealed the time and region at which a certain mutation appeared first and how it spread to other continents together with its frequency change in the sampled population," study authors further explain their methodological approach.
For the viral proteins for which three-dimensional structural information was available, the researchers pursued an in-depth analysis of the possible repercussions of amino acid exchanges.
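The spatial and temporal analysis the authors describe boils down to two questions for each mutation: when and where was it first sampled, and how did its frequency change over time? The Python sketch below illustrates that idea under simple assumptions. The records are invented for illustration and are not data from the study, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample records: (collection date, region, carries the mutation).
records = [
    (date(2020, 1, 10), "Asia", False),
    (date(2020, 1, 25), "Asia", False),
    (date(2020, 2, 3), "Europe", True),
    (date(2020, 2, 20), "Asia", False),
    (date(2020, 3, 5), "Europe", True),
    (date(2020, 3, 9), "North America", True),
    (date(2020, 3, 15), "Asia", False),
    (date(2020, 3, 28), "Europe", True),
]


def first_appearance(samples):
    """Return (date, region) of the earliest sample carrying the mutation."""
    carriers = [(day, region) for day, region, mutated in samples if mutated]
    return min(carriers) if carriers else None


def monthly_frequency(samples):
    """Return {(year, month): fraction of samples carrying the mutation}."""
    counts = defaultdict(lambda: [0, 0])  # [carriers, total] per month
    for day, _region, mutated in samples:
        key = (day.year, day.month)
        counts[key][0] += int(mutated)
        counts[key][1] += 1
    return {key: c / total for key, (c, total) in sorted(counts.items())}


print(first_appearance(records))
print(monthly_frequency(records))
```

With real data, the same grouping could be done per continent as well as per month, which would show how a mutation first seen in one region later appears in others. Sampling is uneven across regions and time, so such frequencies describe the sampled population rather than the true viral population.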
Five major clusters and fast-evolving sites on SARS-CoV-2 proteins
"Our phylogenetic analysis shows that the genetic diversity of SARS-CoV-2 is relatively low during this early stage of the epidemic, the viral genome is largely stable and the virus did not evolve rapidly after its emergence in humans," the study authors emphasize.
In line with previous observations, the researchers identified five major clusters in the phylogenetic tree by using full genomes. Each of those five clusters was characterized by its own mutations, which may serve as targets for fast genotyping of samples from patients.
Moreover, this study identified and specified the geographic and temporal patterns of high-frequency mutations. A total of eleven residues (denoting high-frequency substitutions) were characterized, with four of them showing possible positive selection.
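As a rough illustration of what a high-frequency substitution means in practice, the following Python sketch compares aligned protein sequences against a reference and reports amino acid exchanges above a chosen frequency. The sequences, the threshold, and the function name are hypothetical; the study's own pipeline and cutoffs may differ.

```python
from collections import Counter


def high_frequency_substitutions(reference, sequences, min_freq=0.1):
    """Report substitutions whose frequency is at least min_freq.

    All sequences must be aligned to the reference (same length).
    Gaps ("-") and unknown residues ("X") are ignored.
    Returns {"<ref residue><1-based position><new residue>": frequency}.
    """
    result = {}
    for pos, ref_aa in enumerate(reference):
        counts = Counter(
            seq[pos] for seq in sequences if seq[pos] not in "-X"
        )
        total = sum(counts.values())
        if total == 0:
            continue  # no informative residue at this position
        for aa, n in counts.items():
            if aa != ref_aa and n / total >= min_freq:
                result[f"{ref_aa}{pos + 1}{aa}"] = n / total
    return result


# Hypothetical four-residue alignment, for illustration only.
reference = "MDGL"
samples = ["MDGL", "MGGL", "MGGL", "MDGV", "MDGL"]
print(high_frequency_substitutions(reference, samples, min_freq=0.3))
```

Here two of five sequences carry glycine instead of aspartate at position 2, so that exchange passes a 30% threshold, whereas the single leucine-to-valine change at position 4 does not. Frequency alone does not establish positive selection; it only flags candidate sites for the kind of selection analysis described earlier.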
Fast-evolving sites were found in the non-structural proteins (i.e., 3CL-protease, polymerase, and helicase), in accessory proteins, and in the structural proteins N and S. The additional structural analysis placed mutations on the surface of the proteins known to modulate biochemical properties.
"In summary, our study revealed that in the early large genome of SARS-CoV-2, only a few amino acids are exchanged, and hence the selection pressure is low," the study authors conclude in their bioRxiv paper.
Fine-tuning the adaptive strategy
Taking everything into account, this study has implications for designing both fundamental and clinical experiments to appraise whether any salient properties of SARS-CoV-2 have actually changed during the pandemic.
The large number of available viral sequences made it possible to build an analysis pipeline for tracking potential adaptive mutations in SARS-CoV-2. Whether any of these mutations can modify clinical properties of SARS-CoV-2 (most notably replication rate, transmissibility, cell tropism or pathogenicity) warrants further scrutiny.
Since there are essentially no exchanges in the receptor-binding domain of the spike protein or in other epitopes recognized by neutralizing antibodies, there is currently no evidence that SARS-CoV-2 has adapted to improve receptor binding or to escape from the adaptive immune response.
Still, the evolution of SARS-CoV-2 should be followed meticulously during the ongoing pandemic in order to observe whether the virus is fine-tuning its adaptive strategy towards the human cell.
*bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.