Genomic variation and descent of SARS-CoV-2 strains in India

The COVID-19 pandemic, which is still affecting more than 188 countries and territories worldwide, has taken over 359,000 lives in just under five months and infected at least 5.8 million people. Different strains have diverged from the original strain detected in Wuhan, China.

New research from the Indian Council of Medical Research-National Institute of Cholera and Enteric Diseases, published on the preprint server bioRxiv* in May 2020, provides a comprehensive overview of the 46 SARS-CoV-2 genomes from 10 Indian states that have been sequenced and uploaded to the NCBI or Global Initiative on Sharing All Influenza Data (GISAID) databases since the beginning of 2020.

Novel Coronavirus SARS-CoV-2: This scanning electron microscope image shows SARS-CoV-2 (round gold objects) emerging from the surface of cells cultured in the lab. SARS-CoV-2, also known as 2019-nCoV, is the virus that causes COVID-19. The virus shown was isolated from a patient in the U.S. Credit: NIAID-RML

Genome Type Clustering

Phylogenetic analysis shows that severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is part of the subgenus Sarbecovirus within the genus Betacoronavirus. It is likely to have spread zoonotically from a bat host species, perhaps through an intermediate host, which is possibly the Malayan pangolin, to its final human host.

This RNA virus is undergoing constant mutation, which means that multiple clades have been detected in different populations within a short time. Research on the relationship between various clades, the type of variation between different strains, and how single nucleotide polymorphisms (SNPs) affect viral biology has helped researchers understand how the virus jumped species barriers. However, much more remains to be learned about this virus.

One way to understand its evolution is to study the genome clusters in different geographic settings to obtain an overall view of the mixed patterns that can arise in different regions.

The first cases in India were from the southern state of Kerala, in January 2020. These were returnees from or visitors to Wuhan. As of May 29, 2020, over 150,000 cases and more than 4,500 deaths have been reported in India. This reflects a lower case fatality rate than that recorded in the US or Europe, despite the obstacles posed by the relatively underdeveloped healthcare network, the high population density, and inadequate sanitation/hygiene practices.
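As a rough check on the figures above, a crude case fatality rate is simply deaths divided by confirmed cases. The sketch below uses the rounded lower bounds quoted in this article (the function name is illustrative), so the result is only an approximation and ignores undetected cases and reporting delays.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Return deaths as a percentage of confirmed cases (crude CFR)."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


# Rounded lower bounds quoted in the article ("over" / "more than").
india_cfr = case_fatality_rate(deaths=4_500, cases=150_000)
global_cfr = case_fatality_rate(deaths=359_000, cases=5_800_000)

print(f"India, crude CFR:  {india_cfr:.1f}%")
print(f"Global, crude CFR: {global_cfr:.1f}%")
```

On these rounded numbers, the Indian figure comes out at roughly half the global one, which is consistent with the comparison made in the text.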

Understanding Mutations and Phylogenetic Patterns in SARS-CoV-2 Strains

The current study aims to contribute towards understanding the mutations and consequent changes in the viral activity of the strains currently in circulation within India. The researchers retrieved the sequences of the 46 strains from GISAID (39 sequences) or NCBI (7 sequences). They also collected other reference sequences and genomes from other countries, to construct a dendrogram and to carry out lineage analysis.

Due to the inadequate coverage of nucleotide sequences, 7 sequences from Karnataka state in India were subsequently left out of the analysis.

The researchers used data on two structural genes, encoding the spike protein and the nucleocapsid protein, respectively, and 7 non-structural genes. They aligned multiple sequences for each gene and identified the amino acid sequences corresponding to each.

They then generated the phylogenetic dendrograms and tried to map various novel mutations found in the Indian strains to the prototype strain from Wuhan. Finally, they used the CFSSP (Chou and Fasman Secondary Structure Prediction) online server to predict the secondary structure of the RdRP/NSP12, which showed a novel A97V mutation.

Concerning the spike gene, the dendrograms showed that all Indian strains clustered within the same betacoronavirus lineage in 8 subclusters, with one standing apart from the rest. All strains were 99% to 100% homologous with each other but had 93% homology with bat CoVs and 84% homology with pangolin CoVs. Other similar CoVs had significantly less homology, and only a distant relationship was observable between MERS-CoV and the Indian strains.

While 7 of the 8 subclusters were close to certain clade-specific strains, the eighth lay near a different set of clade-specific strains. Interestingly, subcluster b grouped with an isolate from a New York zoo tiger, while subcluster h grouped with an isolate from a mink.

Similar results were obtained with the nucleocapsid gene dendrograms, but with only three subclusters, one of which was found only among the Delhi strains. All clade-specific strains, as well as the prototype O clade strain from Wuhan, clustered near subcluster c.

The dendrogram for the NSP12 gene, which encodes the non-structural protein RNA-dependent RNA polymerase (RdRP), also showed 39 strains clustered into two subclusters, with four others distant from them. Again, all strains were highly similar to each other but distant from the Wuhan strain and from MERS-CoV. There was 98% homology with the bat CoV and 86.7-88.6% similarity with pangolin and other bat SARS-like CoVs.

The other six non-structural proteins showed a similar pattern on their dendrograms, with all 46 strains forming a single cluster with 99.9% to 100% homology among themselves. The Wuhan strain clustered near them, as did the mink and tiger strains. The bat CoV showed 95.4% to 98.1% similarity with this cluster, but the pangolin CoV did not. MERS-CoV was distantly located.
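The percentage figures quoted for each gene are most naturally read as pairwise sequence identity over an alignment. The article does not say exactly how the authors computed them, so the following is only a minimal sketch of one common definition: the share of aligned, gap-free columns where two sequences agree. The function name and the toy sequences are invented for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of gap-free aligned columns in which both sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy aligned fragments (not real SARS-CoV-2 data): 20 columns, 1 mismatch.
reference = "ATGTTTGTTTTTCTTGTTTT"
variant = "ATGTTTGTCTTTCTTGTTTT"
print(f"{percent_identity(reference, variant):.1f}% identical")
```

Other conventions exist (for example, counting gap columns as mismatches), and they can shift the result by a few tenths of a percent, which matters when the values being compared are as close as 99.9% and 100%.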

Groups of Indian Isolates

The common mutations found in the Indian strains showed two prominent groups of strains, the major and the minor group, with 24 and 18 strains, respectively. The major group showed 4 SNPs, with strains circulating mostly in Delhi, Telangana, West Bengal, and Gujarat. Among the minor group, 4 had 3 mutations while 14 had the same 3 with another 2 mutations. These strains circulated mostly in Tamil Nadu, Karnataka, Uttar Pradesh, Bihar, and Maharashtra. The mutations in each group co-existed, but they did not overlap between groups.

Unique Mutations in the Indian Isolates

The study showed 3 novel mutations in a total of 16 strains in the major group, and one in 2 minor group strains. The L37F mutation, which is infrequent though not unique to these strains, is strongly indicative of positive selection in the evolution of the betacoronaviruses, with the minor group being a product of such change and later acquiring more mutations.

The NSP6-NSP3-NSP4 interaction is fundamental for the formation of vesicles with a double membrane. Thus more study is needed on the association of the co-existing L37F and T1198K mutations in NSP6 and NSP3, respectively.

The A97V Mutation Alters Protein Structure

The non-structural protein NSP12/RdRP is vital for viral replication and for the accuracy of genome replication. The study identified two missense mutations in the RdRP protein, of which the A97V mutation was associated with the minor group strains. This mutation caused the alpha-helices at positions 94-96 to be replaced by beta-sheets, which could alter the tertiary structure and thus the function of the protein.
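Why would swapping alanine for valine tip a prediction from helix to sheet? In the Chou-Fasman method, each amino acid carries an empirical propensity for alpha-helix and for beta-sheet. The sketch below compares only the two residues involved, using the widely cited Chou and Fasman (1978) parameters. It is a deliberate simplification: the full method averages propensities over sliding windows with nucleation and extension rules, and this is not a reproduction of the server the authors used.

```python
# Chou-Fasman conformational parameters for the two residues in A97V:
# (helix propensity P_alpha, sheet propensity P_beta).
PROPENSITY = {
    "A": (1.42, 0.83),  # alanine: strong helix former
    "V": (1.06, 1.70),  # valine: strong sheet former
}


def favored_structure(residue: str) -> str:
    """Return which secondary structure the residue favors on its own."""
    p_alpha, p_beta = PROPENSITY[residue]
    return "helix" if p_alpha > p_beta else "sheet"


for residue in ("A", "V"):
    p_alpha, p_beta = PROPENSITY[residue]
    print(f"{residue}: P(alpha)={p_alpha:.2f}, P(beta)={p_beta:.2f} "
          f"-> favors {favored_structure(residue)}")
```

Because the window average around position 97 includes this residue, replacing a helix former with a sheet former can be enough to change the predicted state of neighboring positions as well.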

Single Clade, Multiple Mutations

The study shows that the Indian strains cluster with the prototype Wuhan strain to form a monophyletic clade but have accumulated several SNPs. These point mutations drive the formation of multiple monophyletic clades. Nevertheless, the strains probably originated in China.

The Indian strains cluster with those from other countries, suggesting their introduction into India from multiple foreign sources.

Bat or Pangolin

The search for the intermediate host of this virus is essential to prevent further dissemination of the virus and more species jumps. To this end, the phylogenetic examination of each gene of the circulating strains shows genome clustering for the S, N, and RdRP/NSP12 genes. This is confirmed by the clustering of the isolates with different clade-specific strains depending on the gene, showing that clustering follows the type of genome.

The homology at the level of each gene shows variations. However, the dendrograms show that “a very recent bifurcation of these Indian strains from the bat and Malayan pangolin derived SARS-like coronaviruses is supposed to have occurred, with a subsequent zoonotic transmission to humans.” On the other hand, pangolins and bats could be the intermediate hosts, as shown by earlier phylogenetic studies.

The study also shows that the receptor-binding domain (RBD) within the S1 subunit of the spike gene is highly similar in every Indian strain and the pangolin-derived strain, which lie closer to each other on the dendrogram than to the bat virus. There were two haplotypes, with one composing 95.7% of the strains.

The Implications

The scientists conclude, “We can assume that the mutations: 14408C>T (P323L) and 13730C>T (A97V), which were found to have significant influence on the secondary structure of RdRP, could play key roles in the simultaneous establishment of ‘two groups’ of SARS-CoV-2 with characteristic ‘co-evolving mutations’ in India.”
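The notation in this quote pairs a nucleotide change (C>T at a genome position) with the amino acid change it causes. In the standard genetic code, alanine is encoded by GCN and valine by GTN, while proline is CCN and leucine can be CTN, so both changes must be a C-to-T substitution at the middle base of the codon. The sketch below illustrates this with a small slice of the codon table. The third base of each codon is an assumption made for illustration, since the article does not give the actual codons.

```python
# Subset of the standard genetic code (DNA alphabet), enough for this example.
CODON_TABLE = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "GTT": "V", "GTC": "V", "GTA": "V", "GTG": "V",  # valine
    "CCT": "P", "CCC": "P", "CCA": "P", "CCG": "P",  # proline
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",  # leucine
}


def describe_change(codon: str, offset: int, new_base: str, residue_number: int) -> str:
    """Apply a single-base substitution and name the resulting amino acid change."""
    if codon[offset] == new_base:
        raise ValueError("substitution does not change the codon")
    mutated = codon[:offset] + new_base + codon[offset + 1:]
    return f"{CODON_TABLE[codon]}{residue_number}{CODON_TABLE[mutated]}"


# Hypothetical codons: only the first two bases are fixed by the amino acid.
print(describe_change("GCT", 1, "T", 97))   # C>T at the middle base of an alanine codon
print(describe_change("CCT", 1, "T", 323))  # C>T at the middle base of a proline codon
```

Whatever the third base turns out to be, the middle-position C-to-T change yields the same two substitutions, A97V and P323L.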

This must be experimentally verified by more research. The difference in virulence and disease-causing ability between the two groups of isolates was not studied since there was no data on the clinical features of the patients who donated their samples.

*Important Notice

bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

Written by

Dr. Liji Thomas

Dr. Liji Thomas is an OB-GYN, who graduated from the Government Medical College, University of Calicut, Kerala, in 2001. Liji practiced as a full-time consultant in obstetrics/gynecology in a private hospital for a few years following her graduation. She has counseled hundreds of patients facing issues from pregnancy-related problems and infertility, and has been in charge of over 2,000 deliveries, striving always to achieve a normal delivery rather than operative.


Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Thomas, Liji. (2020, May 28). Genomic variation and descent of SARS-CoV-2 strains in India. News-Medical. Retrieved on July 12, 2020 from

  • MLA

    Thomas, Liji. "Genomic variation and descent of SARS-CoV-2 strains in India". News-Medical. 12 July 2020. <>.

  • Chicago

    Thomas, Liji. "Genomic variation and descent of SARS-CoV-2 strains in India". News-Medical. (accessed July 12, 2020).

  • Harvard

    Thomas, Liji. 2020. Genomic variation and descent of SARS-CoV-2 strains in India. News-Medical, viewed 12 July 2020,


The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of News Medical.