Comparing genomes to understand how mutations affect the COVID-19 pandemic

By Dr. Liji Thomas, MD, Jun 4 2020

As the COVID-19 pandemic spreads around the world, scientists are still trying to understand the complexities of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of COVID-19. There is still a long way to go in understanding its genomic content. Now, a new study by researchers at the Massachusetts Institute of Technology and the Center for Computational Biology at the Flatiron Institute, published online on the bioRxiv* preprint server, describes how comparative genomics helps to identify functional protein-coding genes and functional non-coding regions.

Novel Coronavirus SARS-CoV-2. This scanning electron microscope image shows SARS-CoV-2 (yellow), also known as 2019-nCoV, the virus that causes COVID-19, isolated from a patient in the U.S. and emerging from the surface of cells (pink) cultured in the lab. Image captured and colorized at NIAID's Rocky Mountain Laboratories (RML) in Hamilton, Montana. Credit: NIAID

This news article was a review of a preliminary scientific report that had not undergone peer review at the time of publication. Since its initial publication, the scientific report has been peer reviewed and accepted for publication in a scientific journal. Links to the preliminary and peer-reviewed reports are available in the Sources section at the bottom of this article.

Reading the Viral Genome

Over two-thirds of the SARS-CoV-2 genome consists of a large open reading frame (ORF) called ORF1ab, some sequences of which are conserved among coronaviruses. This segment is translated into a large precursor protein that is then cleaved into several non-structural proteins (nsp), nsp1-nsp10 and nsp12-nsp16.

This segment contains a frameshift that the translation machinery must make to read all of ORF1ab. When the frameshift does not occur, translation terminates four codons later, at the end of ORF1a, producing a shorter precursor protein that is cleaved into nsp1-nsp11. ORF1ab encodes several mature proteins, including an RNA-dependent RNA polymerase (Pol), a helicase (Hel), and proteins required for transcription, cleavage, and viral assembly. Some of its products also suppress the host cell response and the host's immune defenses.
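The effect of the frameshift can be illustrated with a toy sketch. The sequence, the slip position, and the helper names below are invented for illustration and are not taken from the SARS-CoV-2 genome. Real frameshifting depends on RNA signals that this sketch does not model; it only shows how stepping back one nucleotide changes where translation stops.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, slip_at=None):
    """Translate seq from position 0 until the first stop codon.

    If slip_at is given, the reader steps back one nucleotide
    (a -1 frameshift) just before reading the codon at that position.
    """
    protein = []
    pos = 0
    slipped = False
    while pos + 3 <= len(seq):
        if slip_at is not None and not slipped and pos == slip_at:
            pos -= 1  # -1 frameshift: re-read the previous nucleotide
            slipped = True
        amino_acid = CODON_TABLE[seq[pos:pos + 3]]
        if amino_acid == "*":
            break  # stop codon ends translation
        protein.append(amino_acid)
        pos += 3
    return "".join(protein)


# Invented toy sequence: the original frame hits a stop codon soon after
# position 12, while the -1 frame reads on to a later stop.
TOY_SEQ = "ATGGCATTTAAACGGGTTTAAGCCTGAATTAA"

without_shift = translate(TOY_SEQ)            # short product
with_shift = translate(TOY_SEQ, slip_at=12)   # longer product
print(without_shift, with_shift)
```

In this toy case the unshifted read stops early, while the shifted read continues into downstream sequence, which mirrors the short and long precursor proteins described above.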

Subgenomic Transcripts

Viral RNA is translated in the human host cell by the human translation machinery, which translates the first ORF directly. Reaching the genes in the remaining one-third of the genome is more complicated. The virus first carries out RNA-dependent positive-to-negative transcription, starting at the 3’ end and continuing up to a transcription-regulatory sequence (TRS), after which copying resumes near the 5’ end. The resulting negative-strand subgenomic transcript is then copied back by RNA-dependent negative-to-positive transcription as a second step.
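As a rough illustration of what this jump produces, the sketch below joins the 5’ end of a toy genome to the sequence downstream of each later TRS. It models only the final positive-sense products, not the negative-strand intermediate. The genome string, the TRS motif used as a marker, and the function name are all placeholders invented for this example.

```python
def subgenomic_mrnas(genome, trs):
    """Return toy subgenomic mRNAs for a genome containing TRS motifs.

    The first TRS is treated as the leader TRS near the 5' end. For each
    later (body) TRS, the product is the leader (through its TRS) joined
    to everything downstream of that body TRS.
    """
    # Collect the start index of every TRS occurrence.
    sites = []
    start = genome.find(trs)
    while start != -1:
        sites.append(start)
        start = genome.find(trs, start + 1)

    if len(sites) < 2:
        return []  # no body TRS, so no subgenomic products

    leader = genome[:sites[0] + len(trs)]
    return [leader + genome[site + len(trs):] for site in sites[1:]]


# Invented toy genome: leader, first ORF, then two downstream genes,
# each preceded by the same placeholder TRS motif.
TRS = "ACGAAC"
TOY_GENOME = (
    "GGTT" + TRS + "ATGCCCTAA"   # leader + first ORF
    + TRS + "ATGTTTTAA"          # downstream gene 1
    + TRS + "ATGGGGTAA"          # downstream gene 2
)

for mrna in subgenomic_mrnas(TOY_GENOME, TRS):
    print(mrna)
```

Each product begins with the same leader and runs to the 3’ end, so the downstream gene sits directly after the leader where the host machinery can translate it.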

Genomic Annotation – What is Known

To understand how an organism functions, it is important to annotate the protein-coding segments of its genome correctly. Accurate annotation helps predict how variants affect the phenotype, starting with how they change the amino acid sequence.
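A minimal sketch of why annotation matters for interpreting variants: once the reading frame is known, a single nucleotide change can be labeled by its effect on the encoded amino acid. The coding sequence and the function name below are hypothetical and chosen only for illustration.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(cds, pos, alt):
    """Classify a single nucleotide change in an in-frame coding sequence.

    cds: coding sequence starting at the first base of a codon
    pos: 0-based position of the changed nucleotide
    alt: the new nucleotide
    Returns (kind, reference amino acid, altered amino acid).
    """
    offset = pos % 3
    codon_start = pos - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"   # amino acid unchanged
    elif alt_aa == "*":
        kind = "nonsense"     # introduces a premature stop codon
    else:
        kind = "missense"     # amino acid changed
    return kind, ref_aa, alt_aa


TOY_CDS = "ATGGATTGGTAA"  # invented: codons ATG GAT TGG TAA

print(classify_snv(TOY_CDS, 5, "C"))  # third base of GAT
print(classify_snv(TOY_CDS, 4, "G"))  # second base of GAT
print(classify_snv(TOY_CDS, 8, "A"))  # third base of TGG
```

The same nucleotide change would be interpreted differently, or not at all, if the region were annotated in another frame or as non-coding, which is why disputed ORF annotations make variant interpretation difficult.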

The last third of the genome contains the genes for the spike, envelope, and membrane proteins, in ORF2, ORF4, and ORF5, respectively. These proteins drive viral assembly. The nucleocapsid protein then packages the viral RNA.

The functions of the remaining ORFs are unknown, and their annotation rests chiefly on gene homology and prediction algorithms, leading to considerable disagreement as to which of them encode functional proteins. Experimental techniques that clearly identify which genomic locations are transcribed, and which protein products are associated with them, are badly needed to understand the virus better.

Over 1,800 mutations and gene variants have been identified in the current pandemic, but it is not clear which of them are functional.

How the Study Was Done

The current study aims to address these challenges with a systematic comparative genomics analysis. This will help identify which of the still-uncharacterized ORFs encode functional proteins and which genetic variants have functional and therapeutic importance.

The study included 44 complete genomes from closely related coronaviruses, which were aligned genome-wide to cover all known genes and putative ORFs. This helped the researchers classify the 1,800 uncharacterized single nucleotide variants (SNVs) into those that are probably benign and those likely to harm conserved gene functions.
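The general idea of using an alignment to sort variants can be sketched as follows. This is a simplified illustration under stated assumptions, not the statistical method used in the study: the four aligned sequences, the variants, the labels, and the conservation threshold are all invented for the example.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences sharing the most common residue, per column."""
    n = len(alignment)
    return [Counter(col).most_common(1)[0][1] / n for col in zip(*alignment)]


def classify_variants(alignment, variants, threshold=1.0):
    """Label each (position, new residue) variant by column conservation.

    A variant that differs from the consensus at a column whose
    conservation meets the threshold is flagged as disrupting a
    conserved position; other changes are treated as falling at
    variable positions.
    """
    scores = column_conservation(alignment)
    labels = {}
    for pos, alt in variants:
        column = [seq[pos] for seq in alignment]
        consensus = Counter(column).most_common(1)[0][0]
        if alt == consensus:
            labels[(pos, alt)] = "matches consensus"
        elif scores[pos] >= threshold:
            labels[(pos, alt)] = "disrupts conserved position"
        else:
            labels[(pos, alt)] = "at variable position"
    return labels


# Invented toy alignment of four related protein fragments.
TOY_ALIGNMENT = ["MKTLV", "MKSLV", "MKTLI", "MRTLV"]
TOY_VARIANTS = [(3, "F"), (2, "A"), (0, "M")]

print(column_conservation(TOY_ALIGNMENT))
print(classify_variants(TOY_ALIGNMENT, TOY_VARIANTS))
```

The intuition is that a position left unchanged across many related genomes has probably been maintained by selection, so a new variant there is more likely to matter than one at a position that already varies between relatives.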

The researchers found that ORFs 3a, 6, 7a, 7b, and 8 are conserved functional regions that code for protein. ORF 10 does not code for protein but probably serves important functions nevertheless. ORF 14 probably does not code for a functional protein.

Functional and Medical Significance of the Findings

One important finding was that many of the spike protein gene variants that arose recently, as the virus spread more widely, disrupt perfectly conserved amino acids. Several of these variants have been identified as possibly promoting increased transmission or increased viral load. The researchers hypothesize that this could be how the virus adapted to the human host.

Another finding was the identification of a 20-amino-acid region in the nucleocapsid protein that carries many variants at amino acids conserved throughout the sarbecovirus clade. These variants could help explain how the virus has adapted to the human host.

The study exposed some limitations of current experimental approaches, which may capture only the transcripts that currently exist but not the pattern of changes in the genome over time resulting from past exposure to a variety of hosts. The comparative techniques, though used here only to classify SNVs, should be useful for other types of variants as well, helping to clarify genotype-phenotype linkages.

Finally, the researchers call for further work to identify the functions of still-unnamed genes and the effects of different variants. They say this might “lead to the identification of weaknesses of the virus.” They conclude: “These comparative genomics annotations provide a general resource for prioritizing functional variants and strains, for vaccine development and specialization, and for untangling the molecular biology of SARS-CoV-2.”

Journal references:

Preliminary scientific report. Jungreis, I. et al. (2020). Sarbecovirus Comparative Genomics Elucidates Gene Content Of SARS-CoV-2 And Functional Impact Of COVID-19 Pandemic Mutations. bioRxiv preprint. doi: https://doi.org/10.1101/2020.06.02.130955. https://www.biorxiv.org/content/10.1101/2020.06.02.130955v1
Peer reviewed and published scientific report. Jungreis, Irwin, Rachel Sealfon, and Manolis Kellis. 2021. “SARS-CoV-2 Gene Content and COVID-19 Mutation Impact by Comparing 44 Sarbecovirus Genomes.” Nature Communications 12 (1): 2642. https://doi.org/10.1038/s41467-021-22905-7. https://www.nature.com/articles/s41467-021-22905-7.

Article Revisions

Mar 21 2023 - The preprint preliminary research paper that this article was based upon was accepted for publication in a peer-reviewed Scientific Journal. This article was edited accordingly to include a link to the final peer-reviewed paper, now shown in the sources section.


Written by

Dr. Liji Thomas

Dr. Liji Thomas is an OB-GYN, who graduated from the Government Medical College, University of Calicut, Kerala, in 2001. Liji practiced as a full-time consultant in obstetrics/gynecology in a private hospital for a few years following her graduation. She has counseled hundreds of patients facing issues from pregnancy-related problems and infertility, and has been in charge of over 2,000 deliveries, striving always to achieve a normal delivery rather than operative.
