Since its outbreak in Wuhan, Hubei province, China, in December 2019, SARS-CoV-2 has mutated quickly. Mutations are common in coronaviruses, and SARS-CoV-2 has been found to have many different clades.
These clades can give scientists information on where certain strains of the virus are concentrated, and how these different clades may impact the virulence of SARS-CoV-2, the speed of the disease spread, and its resistance to antiviral medications.
What is a clade?
A clade is a term for a group of organisms that all originate from a common ancestor, and is widely used in biology. Using phylogeny, which is the evolutionary history of a group of organisms, the development of changes in a set of descendant organisms can be tracked.
In virology, a clade describes groups of similar viruses based on their genetic sequences, and changes in those viruses can also be tracked using phylogeny. Rapid genome sequencing is the method by which developments in a virus’s genomic makeup can be tracked.
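The idea of a clade as "an ancestor plus all of its descendants" can be sketched in a few lines of Python. This is only an illustration: the tree and every node name in it are hypothetical placeholders, not real SARS-CoV-2 lineages, and real phylogenies are inferred from sequence data rather than written by hand.

```python
# Toy phylogeny: each key is a parent node, each value lists its direct children.
# All node names are hypothetical placeholders, not real lineages.
TREE = {
    "root": ["X", "Y"],
    "X": ["X1", "X2"],
    "X2": ["X2a"],
    "Y": ["Y1"],
}


def clade(tree, ancestor):
    """Return the ancestor followed by every node descended from it."""
    members = [ancestor]
    for child in tree.get(ancestor, []):
        # Recurse so that grandchildren and deeper descendants are included.
        members.extend(clade(tree, child))
    return members


print(clade(TREE, "X"))  # ['X', 'X1', 'X2', 'X2a']
```

Calling `clade(TREE, "root")` would return every node in the tree, which mirrors the way a large clade can contain smaller subclades.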
SARS-CoV-2 is itself a clade within the family Coronaviridae and the genus Betacoronavirus. Generally, the genetic variations of a virus are grouped into clades, which can also be called subtypes, genotypes, or groups.
Rapid genome sequencing can help to quickly work out where a person has become infected with a certain clade of SARS-CoV-2. For instance, the first four cases of COVID-19 in New South Wales, Australia, were found to be closely related to the dominant strain of SARS-CoV-2 found in Wuhan, and these first four cases were all in people who had recently returned from traveling in China. This meant that travel could then be restricted between China and Australia to limit the numbers of infected people traveling to and from the two countries.
As discussed in a Virus Evolution paper, published in April 2020, cases were also tracked from Australia to Iran, where it was found that the genomes were all from one monophyletic group characterized by three nucleotide substitutions in the SARS-CoV-2 genome, when compared to the prototype strain from Wuhan.
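Comparing a genome to the prototype strain comes down to listing the positions where the two sequences differ. The sketch below shows that comparison under strong simplifying assumptions: the two sequences are short made-up strings rather than real genomes, they are assumed to be already aligned to the same length, and insertions and deletions are ignored. The function name `substitutions` is a hypothetical helper, not part of any sequencing toolkit.

```python
def substitutions(reference, sample):
    """List single-nucleotide substitutions as strings like '3G>T' (1-based positions)."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{pos}{ref}>{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


REFERENCE = "ACGTACGTAC"  # hypothetical stand-in for the prototype strain
SAMPLE = "ACTTACGCAC"     # hypothetical sample genome

print(substitutions(REFERENCE, SAMPLE))  # ['3G>T', '8T>C']
```

Genomes that share the same set of substitutions relative to the reference are candidates for belonging to one monophyletic group, although confirming that requires a proper phylogenetic analysis.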
What are the different SARS-CoV-2 clades?
As of August 11, 2020, there were over 52,600 complete and high-coverage genomes available on the Global Initiative on Sharing All Influenza Data (GISAID).
A study published in the International Journal of Infectious Diseases (IJID) on August 22, 2020, found five clades of SARS-CoV-2, characterized by 11 major mutations worldwide. One or two clades were increasingly dominant in each geographic location included in the study.
The five clades were:
Clade G614 was most widely spread in Europe and North America after being brought into the continents by people traveling from China, and the dominance of clade G614 could be due to the increased longevity of the virus that this particular mutation causes.
The majority of the genomes that have not been categorized in a major clade by the study are found in Asia and were detected early in the pandemic.
However, it is important to note that this study was limited by the range of genomic data available, which only came from certain regions. As a result, new clades may become apparent in the future as more geographical regions make genomic data available.
A different investigation carried out by the World Health Organization (WHO) showed how the SARS-CoV-2 genome has evolved as it has spread across the world. This investigation did not show how these changes affected the virulence of the virus, but it did identify six clades and 14 subclades, of which the D614G clade was the most common.
The investigation included 10,022 SARS-CoV-2 genomes from 68 different countries. In total, WHO detected 65,776 variants, of which 5,775 were distinct. The distinct variants comprised:
- 2,969 missense mutations
- 1,965 synonymous mutations
- 484 mutations in non-coding regions
- 142 non-coding deletions
- 100 in-frame deletions
- 66 non-coding insertions
- 36 stop-gained variants
- 11 frameshift deletions
- Two in-frame insertions.
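As a quick check on the figures above, the nine categories can be tallied to confirm that they add up to the 5,775 distinct variants reported, and to see what share each category represents. The counts are taken directly from the list; the dictionary name and labels are just illustrative choices.

```python
# Counts of distinct variants by type, as listed above.
VARIANT_COUNTS = {
    "missense": 2969,
    "synonymous": 1965,
    "non-coding": 484,
    "non-coding deletion": 142,
    "in-frame deletion": 100,
    "non-coding insertion": 66,
    "stop-gained": 36,
    "frameshift deletion": 11,
    "in-frame insertion": 2,
}

total = sum(VARIANT_COUNTS.values())
print(f"total distinct variants: {total}")  # 5775

# Print each category with its share of the total.
for name, count in VARIANT_COUNTS.items():
    share = 100 * count / total
    print(f"{name:22s}{count:6d}{share:6.1f}%")
```

Missense and synonymous mutations together account for roughly 85% of the distinct variants, with missense mutations alone making up just over half.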
The D614G variant is located in a B-cell epitope within a highly immunodominant region, which may affect how well a vaccine works against it.
The largest clade found in the WHO study was D614G, which has five subclades associated with it. The non-coding variant 241C > T, along with 3037C > T, and ORF1ab P4715L were found in most of the samples in the D614G clade.
Additionally, almost every strain that had a D614G mutation also featured mutations in proteins that control viral replication, which has implications for how quickly the virus can multiply. These proteins are the targets of the antiviral drugs remdesivir and favipiravir, so it is possible that strains of SARS-CoV-2 could quickly become resistant to drugs that target them.
The second-largest clade identified by the WHO study was L84S, which comprises two subclades. L84S was a clade found in people traveling from Wuhan in the early phases of the SARS-CoV-2 outbreak.
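Because each clade is named after a defining mutation, a very rough way to picture clade assignment is to check which marker mutation a genome carries. The sketch below assumes a drastically simplified rule set with only the two clades named above; the genome IDs, their mutation sets, and the `assign_clade` helper are hypothetical, and real classification relies on full phylogenetic analysis rather than a single marker.

```python
from collections import Counter

# Marker mutation -> clade label. A deliberately simplified, hypothetical rule set.
CLADE_MARKERS = {
    "D614G": "D614G",
    "L84S": "L84S",
}


def assign_clade(mutations):
    """Return the first clade whose marker mutation is present, else 'unassigned'."""
    for marker, label in CLADE_MARKERS.items():
        if marker in mutations:
            return label
    return "unassigned"


# Hypothetical genomes, each described by the set of mutations it carries.
GENOMES = {
    "genome_1": {"241C>T", "3037C>T", "P4715L", "D614G"},
    "genome_2": {"L84S"},
    "genome_3": {"241C>T", "D614G"},
    "genome_4": set(),
}

counts = Counter(assign_clade(muts) for muts in GENOMES.values())
print(counts)
```

Counting assignments per region in this way is, in spirit, how the dominance of one clade in a given location can be summarized.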
The August study in the IJID stated that while genome replication is occurring in the host, SARS-CoV-2 undergoes genome mutations that can be passed on to descendant genomes and new hosts as it is spread between people.
Where are the different clades located in the world?
The SARS-CoV-2 pandemic has affected 188 countries on every continent except Antarctica.
The A2a clade, which was introduced into New York through Europe and Italy, is concentrated on the East Coast of the USA.
The B1 clade predominates on the West Coast of the USA.
The G614 clade has become widespread globally, while in the early stages of the pandemic, S84 was the predominant clade in Asia when unassigned genomes are excluded.
Information in the WHO investigation on the locations in which certain clades and subclades are common includes:
- L84S/p5828L/ subclade in the USA
- G251V in the UK, Australia, USA, and Iceland.
SARS-CoV-2 has proven to be a genetically diverse virus that has now become endemic within humans. Identifying and tracking clades that become dominant in certain geographical regions may inform the development of effective vaccines and treatments, as certain antiviral drugs may not work against mutated forms of the virus, or against clades that have developed resistance through genomic mutation.