Researchers update automated computation tool for SARS-CoV-2 genome analysis

Researchers updated a previous version of an automated tool to include severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome analysis. Genome analysis using the freely available software could help track the evolution of the virus and promptly identify variants that increase viral transmission or virulence.

Study: Evolving Insights from SARS-CoV-2 Genome from 200K COVID-19 Patients. Image Credit: Limbitech / Shutterstock

The SARS-CoV-2 pathogen has been mutating since it was first discovered in late 2019. Some of these mutations have increased viral fitness, which may affect COVID-19 disease outcomes and transmissibility and, in turn, the efficacy of current vaccines. Thus, thorough and continual sequencing of as many genomes as possible across the world will be crucial in keeping on top of the pandemic.

There are national initiatives for active SARS-CoV-2 genome surveillance, such as the COVID-19 Genomics UK Consortium in the United Kingdom and the Indian SARS-CoV-2 Genomics Consortium, which are tasked with identifying new variants. Any increase in cases because of a new variant will require immediate action to contain the spread. This will also require automated methods to analyze and identify new strains.

In a paper published on the bioRxiv* preprint server, researchers report the second generation of a computational tool, the Infectious Pathogen Detector (IPD), which determines the abundance and mutations of SARS-CoV-2 and now has an expanded variant database and revised clade assessment.

Finding frequency of mutations

The authors analyzed 200,865 SARS-CoV-2 genome sequences from 155 countries, which together carried 2.58 million mutations as of 28 December 2020 relative to the reference Wuhan strain. About 39% were synonymous mutations, which change the nucleotide sequence but not the encoded amino acid. About 51% were non-synonymous mutations, which change the amino acid. About 9% of the mutations were in intergenic regions and the 5’ and 3’ untranslated regions (UTRs). Among the non-synonymous mutations, about half were missense mutations, in which a single-nucleotide change substitutes one amino acid for another.
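The distinction between these mutation classes can be illustrated with a short Python sketch. It is not part of IPD; the function name `classify_substitution` and the example codons are assumptions chosen for illustration, and only the standard genetic code is taken as given.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change (hypothetical helper, not the IPD method)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid, different codon
    if alt_aa == "*":
        return "nonsense"     # change introduces a stop codon
    if ref_aa == "*":
        return "stop-lost"    # change removes a stop codon
    return "missense"         # one amino acid replaced by another


if __name__ == "__main__":
    # Illustrative codons: GAT and GAC both encode aspartate (D); GGT encodes glycine (G).
    print(classify_substitution("GAT", "GAC"))
    print(classify_substitution("GAT", "GGT"))
    print(classify_substitution("TAT", "TAA"))
```

An aspartate-to-glycine change of the kind seen in D614G falls in the missense class under this scheme.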

The researchers noted 13 hotspot residues that were mutated in more than 40,000 samples. The most frequent synonymous mutation occurred 186,189 times in the NSP3 gene, followed by a mutation in the RNA-dependent RNA polymerase gene that occurred 185,945 times. The non-synonymous mutations D614G and A222V occurred 176,436 and 47,971 times, respectively, in the spike protein (S) gene. The next most frequent mutation was a two-amino-acid change, R/G203K/R. The A220V mutation in the N gene occurred 48,426 times, the third most frequent mutation.
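Counting how often each mutation appears across samples and keeping those above a cutoff is a simple tally. The sketch below shows one way to do it in Python; the function name `find_hotspots` and the four toy samples are made up for illustration and do not come from the study's data or from IPD.

```python
from collections import Counter


def find_hotspots(samples, threshold):
    """Return (mutation, count) pairs seen in more than `threshold` samples.

    `samples` is an iterable of per-sample mutation lists. Each mutation is
    counted at most once per sample.
    """
    counts = Counter()
    for mutations in samples:
        counts.update(set(mutations))
    return [(m, n) for m, n in counts.most_common() if n > threshold]


if __name__ == "__main__":
    # Toy input only; real analyses would use hundreds of thousands of genomes.
    toy_samples = [
        ["D614G", "A222V"],
        ["D614G"],
        ["D614G", "A220V"],
        ["A222V", "D614G"],
    ]
    for mutation, count in find_hotspots(toy_samples, threshold=1):
        print(mutation, count)
```

With the study's cutoff, `threshold` would be 40,000.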

The D614G mutation causes higher viral loads in the respiratory tract but does not alter disease severity. The team did not find a significant frequency of the other spike protein mutations N439K, S477N, E484K, and N501Y. The 13 most frequent mutations include five synonymous mutations that likely affect mRNA splicing or selection on codon usage bias, mRNA stability and folding, translation, or co-translational protein folding.

Upon further analyzing the data, the team found that the S, N, M, ORF7a, and ORF10 genes, about 21% of the genome, account for 54.36% of all the SARS-CoV-2 nonsynonymous mutations. The S and M genes have the smallest proportion of total variable bases in the virus genome, suggesting a strong positive selection of nonsynonymous mutations in these genes.

Among the new variants of the SARS-CoV-2 virus, the B.1.1.7 variant from the United Kingdom had 32 mutations, the B.1.351 variant from South Africa had 25 mutations, and the Brazilian P.1 variant had 25 mutations.

Tool for genome surveillance

Upon comparing the mutations predominant in the three new strains, along with those from India, the authors found four common hotspot mutations, including D614G. N501Y was the base mutation in all three variants, with the South African and Brazilian strains showing an additional E484K mutation in the spike protein.
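This kind of comparison amounts to set operations on per-variant mutation lists. The Python sketch below uses only the spike mutations named in this article, so the profiles are deliberately incomplete; the helper names `shared_mutations` and `carriers` are assumptions for illustration.

```python
# Partial mutation profiles: only mutations named in the article are listed.
# Real variant definitions contain many more mutations.
variant_mutations = {
    "B.1.1.7": {"D614G", "N501Y"},
    "B.1.351": {"D614G", "N501Y", "E484K"},
    "P.1": {"D614G", "N501Y", "E484K"},
}


def shared_mutations(profiles):
    """Mutations present in every variant profile."""
    return set.intersection(*profiles.values())


def carriers(profiles, mutation):
    """Names of variants whose profile contains `mutation`, sorted."""
    return sorted(name for name, muts in profiles.items() if mutation in muts)


if __name__ == "__main__":
    print(sorted(shared_mutations(variant_mutations)))
    print(carriers(variant_mutations, "E484K"))
```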

Neither of these two mutations was seen in the Indian samples, and only two out of 3,361 Indian samples showed the S477N mutation. It is unknown if the absence of these mutations, which have increased binding affinity to the human angiotensin-converting enzyme 2 (ACE2) receptor, could account for the lower transmission in India compared to the UK, Brazil, and South Africa.

Clade analysis revealed 20E, 20B, and 20A to be the most dominant. All the analysis resulting in variant and clade information was included in the database for the second generation of IPD. The team found that IPD 2.0 assigned clades with high accuracy when tested using a simulated sequence dataset generated from the genomes of different clades.
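Accuracy on a simulated dataset can be scored by comparing each sequence's known clade of origin with the clade the tool assigns. The sketch below is a generic way to compute this per clade in Python; it does not reproduce IPD's evaluation, and the function name and the toy labels are hypothetical.

```python
from collections import defaultdict


def per_clade_accuracy(true_clades, predicted_clades):
    """Fraction of sequences from each true clade that were assigned correctly."""
    if len(true_clades) != len(predicted_clades):
        raise ValueError("label lists must have the same length")
    totals = defaultdict(int)
    correct = defaultdict(int)
    for truth, predicted in zip(true_clades, predicted_clades):
        totals[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    return {clade: correct[clade] / totals[clade] for clade in totals}


if __name__ == "__main__":
    # Hypothetical labels for four simulated sequences.
    truth = ["20A", "20A", "20B", "20E"]
    assigned = ["20A", "20B", "20B", "20E"]
    print(per_clade_accuracy(truth, assigned))
```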

The database with updated variants and clade assessment module enables quantification and phylogenetic assessment of the SARS-CoV-2 genome. The authors write, “This makes IPD 2.0 a pertinent tool for analysis of diverse SARS-CoV-2 sequence datasets and facilitate genomic surveillance to identify variants involved in breakthrough infections.”

*Important Notice

bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.


Written by

Lakshmi Supriya

Lakshmi Supriya got her BSc in Industrial Chemistry from IIT Kharagpur (India) and a Ph.D. in Polymer Science and Engineering from Virginia Tech (USA).


Please use the following format to cite this article in your essay, paper or report:

  • APA

    Supriya, Lakshmi. (2021, January 22). Researchers update automated computation tool for SARS-CoV-2 genome analysis. News-Medical.


The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of News Medical.