An international team of researchers has identified a previously uncharacterized gene within the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome that may be important in understanding the origins and evolution of coronavirus disease 2019 (COVID-19).
The newly identified gene, which the researchers have named ORF3c, is an example of an overlapping gene (OLG). According to the authors, OLGs are functional genomic elements that have been somewhat overlooked, despite having previously been implicated in the origins of viral pandemics.
The researchers say their analysis demonstrating that SARS-CoV-2 contains an OLG that has not yet been properly studied shows that OLGs deserve more attention, particularly as their contribution to the emergence of zoonotic viruses may be more significant than is currently realized.
A preprint version of the paper is available on the bioRxiv* server while the article undergoes peer review.
Novel Coronavirus SARS-CoV-2: Colorized scanning electron micrograph of an apoptotic cell (blue) infected with SARS-CoV-2 virus particles (red), isolated from a patient sample. Image captured at the NIAID Integrated Research Facility (IRF) in Fort Detrick, Maryland. Credit: NIAID
OLGs represent an under-investigated area of novelty
The COVID-19 pandemic, which has swept the globe since the virus emerged in Wuhan, China, in December 2019, has raised many questions about how the virus evolved to jump from animals to humans.
Identifying the potential mechanisms requires a thorough understanding of viral genomes, but one under-investigated area is the emergence of novel OLGs. Although viral OLGs are common and have previously been linked to the start of pandemics, they remain largely overlooked in studies of emerging pathogens.
These genes enable an existing stretch of protein-coding nucleotides to encode an additional new protein in a different reading frame. This “overprinting” improves the compression of genomic information and may confer a significant genetic advantage, as frameshifted sequences preserve certain properties of proteins.
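As a rough illustration of overprinting, the Python sketch below scans a short, made-up sequence for open reading frames and shows two different proteins being read from the same nucleotides in different frames. The sequence `TOY_SEQ` and the helper names `translate_orf` and `find_orfs` are hypothetical and are not taken from the SARS-CoV-2 genome or from the study; only the standard genetic code table is real.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_orf(seq, start):
    """Translate from `start` to the first stop codon; return None if no stop is found."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            return "".join(protein)
        protein.append(amino_acid)
    return None


def find_orfs(seq):
    """List (start, frame, protein) for every ATG that is followed by an in-frame stop."""
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] == "ATG":
            protein = translate_orf(seq, start)
            if protein is not None:
                orfs.append((start, start % 3, protein))
    return orfs


# Hypothetical toy sequence (an assumption for illustration, not real viral data).
TOY_SEQ = "ATGAATGGCTCGTTGACCTAA"

for start, frame, protein in find_orfs(TOY_SEQ):
    print(f"start={start} frame={frame} protein={protein}")
# start=0 frame=0 protein=MNGSLT
# start=4 frame=1 protein=MAR
```

In this toy case, the second reading frame sits entirely inside the first, so the same nine nucleotides contribute to both proteins.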
However, sequence analysis of OLGs is complicated by the fact that one mutation may result in two proteins being altered. Genome annotation techniques also tend to miss OLGs because they favor one open reading frame within a given region of the genome.
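To make the mutation problem concrete, here is a minimal Python sketch in which a single nucleotide change is evaluated in two overlapping reading frames at once. The toy sequence, the two start positions (0 and 4), and the helper names are assumptions chosen for illustration; they do not come from the study or from any real genome.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, start):
    """Translate codon by codon from `start`, stopping at the first stop codon."""
    protein = ""
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein += amino_acid
    return protein


def mutation_effect(seq, pos, base, starts=(0, 4)):
    """Return (before, after) protein pairs for each reading frame start."""
    mutant = seq[:pos] + base + seq[pos + 1:]
    return [(translate(seq, s), translate(mutant, s)) for s in starts]


# Hypothetical toy sequence with two overlapping reading frames (not real viral data).
TOY_SEQ = "ATGAATGGCTCGTTGACCTAA"

# One substitution that changes both proteins.
print(mutation_effect(TOY_SEQ, 10, "T"))
# [('MNGSLT', 'MNGLLT'), ('MAR', 'MAC')]

# One substitution that is silent in the first frame but changes the second protein.
print(mutation_effect(TOY_SEQ, 8, "A"))
# [('MNGSLT', 'MNGSLT'), ('MAR', 'MDR')]
```

The second case hints at why annotation that considers only one reading frame can mislead: a change that looks harmless in the annotated gene may still alter the overlapping one.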
“Such inconsistencies stymie research”
In the case of SARS-related Betacoronaviruses (subgenus Sarbecovirus), researchers are aware of certain OLGs, but they have been largely overlooked and inconsistently reported. For example, annotations of ORF3b, ORF9b, and ORF9c are conflicting or missing altogether in the Wuhan-Hu-1 SARS-CoV-2 reference genome, and no overlapping genes within ORF3a are shown in the University of California Santa Cruz (UCSC) SARS-CoV-2 genome browser.
“Such inconsistencies stymie research, as OLGs may play a key role in the emergence of new viruses,” write Xinzhu Wei (University of California, Berkeley) and colleagues.
What has the new study found?
Now, the researchers report that their investigation of new OLG candidates in SARS-CoV-2 has identified the new ORF3c gene.
This gene has, in fact, been documented previously, they say. However, it has been left unnamed, and its annotations have either been missing or conflated with those of ORF3b in other sarbecoviruses, a gene that has a different genomic position and lies in a different reading frame.
Ribosome profiling provided clear evidence that ORF3c is translated, and the researchers report that the gene possesses important immunological properties.
The team then conducted an evolutionary analysis at the between-species, between-host, and within-host levels. An analysis of 21 representative sarbecovirus genomes showed that ORF3c is also present in certain pangolin coronaviruses, but not in bat coronaviruses.
Analysis of almost 4,000 SARS-CoV-2 genomes showed that ORF3c has gained a new stop codon, which has become significantly more frequent during the current COVID-19 pandemic. Furthermore, the sequencing of 401 SARS-CoV-2 samples revealed that the ORF3c mutation was present in various hosts.
Surprisingly, the newly gained stop codon “hitchhiked” early with a SARS-CoV-2 spike protein haplotype, which the authors say “appears to drive the European pandemic spread.”
Between-species sliding window of genes overlapping N. Pairwise OLGenie analysis of the N gene across sarbecoviruses, in the ss13 reading frame. Each genome was compared with SARS-CoV-2 (left-hand plot) and SARS-CoV (right-hand plot).
What are the study implications?
The team says their findings provide strong evidence that SARS-CoV-2 has a third OLG that has not been properly identified or analyzed until now.
“Our results liken ORF3c to other important viral accessory genes recombined, lost, split, or truncated before or during outbreaks, including ORF3b and ORF8 in sarbecoviruses,” write the researchers.
“OLGs deserve considerably more attention, as their rapid evolution may be more important than is currently appreciated in the emergence of zoonotic viruses,” they conclude.
*bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.