Scientists have found that several viruses belonging to the Coronaviridae family can infect a wide range of hosts, including birds, humans, and other mammals. These are single-stranded, positive-sense RNA viruses whose genomes range from 27 to 32 kb in size. They are divided into four genera, namely, alpha, beta, gamma, and delta.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causal agent of the ongoing coronavirus disease 2019 (COVID-19) pandemic, was first identified in the city of Wuhan, in the Hubei province of China, in December 2019. Owing to the high infectivity and mortality rate of the disease, the World Health Organization declared COVID-19 a pandemic on 11th March 2020.
As viruses undergo genomic mutation, identifying the mutating sites is important for vaccine development. Several phylogenetic tree-based analyses have been conducted to understand the evolutionary relationship of SARS-CoV-2 with other beta coronaviruses. A previous study constructed a phylogenetic tree and revealed that the genomic sequence of SARS-CoV-2 is 88% identical to that of BAT-CoV. In another study, scientists isolated around 70 SARS-CoV-2 genomic sequences from COVID-19 patients and studied the spike glycoprotein gene. This study also reported that the BetaCoV-bat-Yunnan-RaTG13-2013 virus is almost identical to SARS-CoV-2.
Even though a comparative study of the genomic sequences of SARS-CoV, MERS-CoV, and SARS-CoV-2 is available, there is a gap in the research regarding the comparison of four types of coronaviruses, namely, SARS-CoV, MERS-CoV, BAT-CoV, and SARS-CoV-2. A new study, which compares the genomic sequences of these four types of coronaviruses, has been published in the Journal of Medical Virology. This study utilized multiple genetic markers, including single nucleotide polymorphisms (SNPs), whole-genome sequence phylogeny, mutations in proteins, and microsatellites. These were compared with the SARS-CoV-2 reference genomic sequence, known as the Wuhan strain (Wuhan-Hu-1). All the sequences were obtained from NCBI GenBank.
The SARS-CoV, MERS-CoV, and SARS-CoV-2 sequences were obtained from Homo sapiens (host), while the BAT-CoV sequences were collected from eight different types of bats. The results of this study are described below.
For phylogenetic analysis of the different coronavirus sequences, a maximum likelihood approach with 1000 bootstrap replicates was used. The phylogenetic analysis revealed different lineages of coronaviruses. The whole genome-based phylogenetic analysis placed MERS-CoV as the outgroup, while the other three were classified as ingroup species. Within the ingroup, two lineages were found, namely, a lineage consisting of SARS-CoV-2 and another consisting of SARS-CoV and BAT-CoV. The branches of the phylogenetic tree indicated that SARS-CoV had diverged very early from BAT-CoV. The tree also revealed an independent divergence of SARS-CoV-2 from BAT-CoV. The phylogeny also showed SARS-CoV-2 to be more closely related to BAT-CoV and SARS-CoV than to MERS-CoV. The Simplot software was used to visualize the similarity between the four selected species. It revealed around 98% homology of BAT-CoV with the reference sequence, i.e., the Wuhan strain of SARS-CoV-2. In comparison, SARS-CoV showed 92% similarity to the reference sequence, and MERS-CoV showed 58% similarity to the Wuhan strain.
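The idea behind a similarity plot can be illustrated with a sliding window over two aligned genomes. The following is a minimal sketch, not Simplot's actual algorithm: it assumes the sequences have already been aligned to equal length, it scores plain percent identity rather than a distance model, and the function names, window size, and step size are hypothetical choices made for illustration.

```python
def percent_identity(a, b):
    """Percent of aligned, gap-free columns in which a and b carry the same base."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def sliding_identity(ref, query, window=200, step=20):
    """Return (window start, percent identity) pairs along the alignment.

    window and step are illustrative defaults, not values from the study.
    """
    last_start = max(len(ref) - window, 0)
    return [
        (start, percent_identity(ref[start:start + window],
                                 query[start:start + window]))
        for start in range(0, last_start + 1, step)
    ]


if __name__ == "__main__":
    # Toy aligned sequences; real input would be whole-genome alignments.
    reference = "ACGTACGTACGT"
    other = "ACGTACGAAC-T"
    print(percent_identity(reference, other))
    print(sliding_identity(reference, other, window=4, step=4))
```

Plotting the second element of each pair against the first gives a curve of local similarity to the reference, which is the kind of view a similarity plot provides.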
Analysis of genetic variants
A variant-based analysis showed that the MERS-CoV genome differed from the Wuhan reference strain at 134.21 sites, the BAT-CoV genome differed at 136.72 sites, the SARS-CoV genome differed at 26.64 sites, and the SARS-CoV-2 genome differed at 0.66 sites. The study also found that the probability of mutations at the missense sites of MERS-CoV and SARS-CoV-2 is higher than in SARS-CoV and BAT-CoV. This is attributed to the reduced number of missense variations in SARS-CoV and BAT-CoV, which has resulted from selection pressure on missense sites.
The number of mutations in the structural proteins, namely, the spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins, was calculated. The SNPs were filtered from the S, M, E, and N gene regions using a Python script. These genes showed varying numbers of SNPs. The MultAlin online tool was used to detect the similarities between the four coronaviruses selected for the current study.
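The study's script is not reproduced in this article, but the filtering step can be sketched as follows. This is a minimal illustration under stated assumptions: SNPs are given as (position, reference base, alternate base) tuples with 1-based positions, and the gene coordinates are taken from the GenBank annotation of the SARS-CoV-2 reference NC_045512.2, which should be verified against the record before use. The names `GENE_REGIONS`, `assign_gene`, and `filter_snps` are hypothetical.

```python
# Assumed 1-based, inclusive gene coordinates on NC_045512.2 (verify before use).
GENE_REGIONS = {
    "S": (21563, 25384),
    "E": (26245, 26472),
    "M": (26523, 27191),
    "N": (28274, 29533),
}


def assign_gene(position, regions=GENE_REGIONS):
    """Return the gene whose region contains position, or None if there is none."""
    for gene, (start, end) in regions.items():
        if start <= position <= end:
            return gene
    return None


def filter_snps(snps, regions=GENE_REGIONS):
    """Group single-base substitutions by the structural gene they fall in.

    snps is an iterable of (position, ref, alt) tuples. Insertions, deletions,
    and variants outside the listed regions are dropped.
    """
    by_gene = {gene: [] for gene in regions}
    for position, ref, alt in snps:
        if len(ref) != 1 or len(alt) != 1:
            continue  # not a single nucleotide polymorphism
        gene = assign_gene(position, regions)
        if gene is not None:
            by_gene[gene].append((position, ref, alt))
    return by_gene


if __name__ == "__main__":
    # Made-up variant calls, for illustration only.
    calls = [(23403, "A", "G"), (100, "C", "T"), (28881, "G", "A"), (26300, "AT", "A")]
    for gene, hits in filter_snps(calls).items():
        print(gene, len(hits), hits)
```

Counting the entries per gene then gives the per-gene SNP totals that such an analysis compares across viruses.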
Microsatellite analysis is used to identify repetitive sequences in the genome. These sequences have a significant impact on the onset of diseases and on their evolution. In this study, microsatellite analysis was performed using the IMEX (Imperfect Microsatellite Extractor) and FMSD (Fast Microsatellite Discovery) online tools. No significant presence of microsatellites was found using IMEX. However, FMSD revealed more microsatellites in MERS-CoV than in the other viruses. The SARS-CoV-2 genome showed the highest incidence of compound microsatellites.
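To make the notion of a microsatellite concrete, the sketch below scans a sequence for perfect tandem repeats using a regular expression. It is only an illustration and does not reproduce IMEX or FMSD, which also handle imperfect and compound repeats; the function name and the thresholds (motifs of 1 to 6 bases, at least 3 copies) are assumptions chosen for the example.

```python
import re


def find_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=3):
    """Find perfect tandem repeats in seq.

    Returns a sorted list of (start index, motif, copy count) tuples, where
    start index is 0-based. Thresholds are illustrative defaults.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # A motif of unit_len bases followed by at least min_repeats - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip motifs that are themselves repeats of a shorter motif
            # (for example "AA" or "ATAT"), since the shorter motif reports them.
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence containing one dinucleotide repeat.
    print(find_microsatellites("GGATATATATCC"))
```

Running such a scan over each genome and tallying the hits by motif length gives a simple basis for comparing microsatellite content between viruses.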
In summary, the phylogenetic tree analysis showed that SARS-CoV-2 is most closely related to BAT-CoV, and its second-closest relative is SARS-CoV. All MERS-CoV strains were distantly related to SARS-CoV-2. In the analysis of genetic variants, more mutations were found in MERS-CoV than in SARS-CoV and BAT-CoV. Taken together, the phylogenetic analysis, the study of genetic variation, the multiple sequence alignment, and the microsatellite analysis indicated that bats are the native host of SARS-CoV-2 and that BAT-CoV is closely related to SARS-CoV-2. An intermediate host may have initiated the transmission of COVID-19 from bats to humans. However, more research is required to validate this assumption. The FMSD tool, in contrast, indicated that SARS-CoV is more closely associated with SARS-CoV-2 than BAT-CoV is.
- Rehman, A. H. et al. (2021). Comprehensive Comparative Genomic and Microsatellite Analysis of SARS, MERS, BAT‐SARS and COVID‐19 Coronaviruses. Journal of Medical Virology, https://doi.org/10.1002/jmv.26974, https://onlinelibrary.wiley.com/doi/10.1002/jmv.26974