Throughout the course of the current coronavirus disease 2019 (COVID-19) pandemic, there have been several waves attributed to variants of the original Wuhan strain of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
The most notable are the variants of concern (VOCs) Alpha, Beta, Gamma, Delta, and Omicron, so designated because of their increased transmissibility and potential ability to evade immune responses.
The examination of SARS-CoV-2 genome sequences has opened up previously unimagined possibilities for tracking the different variants, describing viral genomes, researching molecular and cellular mechanisms, determining the origin and evolution of the virus, and examining a wide range of other elements of the pandemic.
The Global Initiative on Sharing All Influenza Data (GISAID) has served as a data repository for viral genome sequences from confirmed cases of the infection since the start of the pandemic. More than 10 million sequences had accumulated as of April 2022.
However, at the current rate of growth, analyzing this large dataset is becoming increasingly difficult. In a recent study posted to the preprint server bioRxiv*, a team of researchers therefore devised a method that reduces the size of a large collection of genomes by using allelic association to group single nucleotide variations (SNVs) into correlated SNV sets (CSSs).
CSSs utilized for dimension reduction
Allelic association among SNVs is observed in the emerging variants of SARS-CoV-2. Because this feature is preserved, it can be used to reduce the dimension of the data and to divide variants further into subtypes.
The authors combined SNVs with pairwise associations of R² > 0.5 and detected CSSs using an exponentially weighted moving average (EWMA), excluding SNV sets with an occurrence frequency of less than 20.
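To make the pairwise-association step concrete, here is a minimal Python sketch. It is not the authors' pipeline: it assumes that R² is the squared Pearson correlation between two 0/1 allele vectors, and it groups SNVs by simple connected components, which stands in for the EWMA-based detection that the article does not describe in detail. The SNV names and the toy genotypes are hypothetical.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two 0/1 allele vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a site with no variation carries no association signal
    return cov * cov / (var_x * var_y)


def correlated_sets(snvs, threshold=0.5):
    """Group SNVs linked by R^2 > threshold (assumption: connected components)."""
    parent = {name: name for name in snvs}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(snvs, 2):
        if r_squared(snvs[a], snvs[b]) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in snvs:
        groups.setdefault(find(name), []).append(name)
    # Keep only sets with at least two SNVs.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical toy data: one entry per SNV, one column per genome (1 = variant allele).
toy_snvs = {
    "snv_a": [1, 1, 1, 1, 0, 0, 0, 0],
    "snv_b": [1, 1, 1, 1, 0, 0, 0, 0],
    "snv_c": [1, 1, 1, 0, 0, 0, 0, 0],
    "snv_d": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(correlated_sets(toy_snvs))  # [['snv_a', 'snv_b', 'snv_c']]
```

In this toy example snv_a and snv_b are perfectly associated, snv_c reaches an R² of about 0.6 with both, and snv_d stays below the threshold, so a single three-SNV set is returned. Raising or lowering the threshold changes how aggressively SNVs are merged.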
A 29,409 by 2,119,724 genomic sequence matrix was reduced to a 1,366 by 9,848 CSS matrix after a three-stage dimension reduction. The definition of CSSs can vary with the analytic goal and can be based on any subset of the genomic database, such as strains detected over time, across nations, or across genome segments. Additionally, the allelic association levels can be adjusted to emphasize certain aspects of interest.
The authors discovered 1,057 CSSs in total, each containing between four and 33 SNVs, and 1,366 signature SNVs in total. Since March 2020, the D614G SARS-CoV-2 strain has been dominant, accounting for 94.38% of all recorded strains, and 1,053 of the 1,057 CSSs identified in this study characterize >99.9% of the D614G SARS-CoV-2 strain.
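The article does not spell out the three reduction stages. One plausible reading, offered here only as an assumption, is suggested by the fact that the reduced matrix has 1,366 rows, the same as the number of signature SNVs: keep only the signature SNVs, then merge genomes that share an identical profile over them. The function name, SNV names, and toy genomes below are hypothetical.

```python
from collections import Counter


def reduce_to_profiles(genomes, signature_snvs):
    """Collapse genomes to distinct 0/1 profiles over the signature SNVs.

    genomes: iterable of sets, each holding the SNVs one genome carries.
    Returns a Counter mapping each distinct profile to its genome count.
    """
    profiles = Counter()
    for carried in genomes:
        # Drop every non-signature SNV, keep presence/absence of the rest.
        profile = tuple(int(snv in carried) for snv in signature_snvs)
        profiles[profile] += 1
    return profiles


# Hypothetical toy data: five genomes, three signature SNVs.
signature = ["snv_a", "snv_b", "snv_c"]
toy_genomes = [
    {"snv_a", "snv_b", "snv_x"},
    {"snv_a", "snv_b"},
    {"snv_c", "snv_y"},
    {"snv_c"},
    set(),
]

profiles = reduce_to_profiles(toy_genomes, signature)
for profile, count in profiles.most_common():
    print(profile, count)
```

Here five genomes collapse into three distinct profiles, with the counts preserved so that frequencies can still be computed from the smaller matrix.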
Transmission enhancer and suppressor SNVs of VOCs
The CSS technique can be applied to individual countries to describe and subtype strain variation in greater depth. Because the highly communicable Delta VOC was first detected in India, the authors used the CSS technique to further subtype the Delta strains sequenced in India, resulting in eleven subtypes, of which the first six were the subject of this study. The first three subtypes showed growing temporal trajectories, while the other three had substantially reduced temporal trajectories.
The first three subtypes, those with ascending temporal trajectories, are identified by eight, nine, and eleven signature SNVs, respectively, with 100% allelic association. The authors termed these hallmark SNVs "transmission enhancers" because they are significantly linked to the rapid increase in frequency.
In addition to the enhancer SNVs found in some strains, the three remaining subtypes, those with reduced temporal trajectories, all include a set of signature SNVs with 100% allelic association. These hallmark SNVs seem to have "suppressed" the growth of the temporal trajectories.
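The article does not define how a temporal trajectory is computed. A minimal sketch, assuming it is the share of sequenced genomes assigned to a subtype in each collection period, could look like the following. The subtype labels, months, and counts are hypothetical.

```python
from collections import Counter


def temporal_trajectories(records):
    """Return {subtype: [(period, share), ...]} from (period, subtype) pairs.

    share = genomes of that subtype in the period / all genomes in the period.
    """
    totals = Counter(period for period, _ in records)
    counts = Counter(records)
    periods = sorted(totals)
    subtypes = sorted({subtype for _, subtype in records})
    return {
        subtype: [(p, counts[(p, subtype)] / totals[p]) for p in periods]
        for subtype in subtypes
    }


# Hypothetical toy data: one (month, subtype) pair per sequenced genome.
toy_records = (
    [("2021-03", "subtype-01")] * 1 + [("2021-03", "subtype-04")] * 3
    + [("2021-04", "subtype-01")] * 2 + [("2021-04", "subtype-04")] * 2
    + [("2021-05", "subtype-01")] * 3 + [("2021-05", "subtype-04")] * 1
)

trajectories = temporal_trajectories(toy_records)
for subtype, series in trajectories.items():
    print(subtype, series)
```

In this toy series the share of subtype-01 rises from 0.25 to 0.75 while subtype-04 falls from 0.75 to 0.25, which mirrors the ascending and reduced trajectories described above.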
The CSS technique was also applied to the Alpha VOC, which was first isolated in the UK. The first subtype has a significantly greater temporal trajectory than the other two that were identified. These Alpha subtypes were observed in various other regions and followed a similar temporal pattern.
The authors termed the hallmark SNVs of Alpha subtype 01 transmission enhancers because they improved the variant's transmission. Some of these enhancers were also found in Alpha subtypes 02 and 03, but those subtypes additionally gained a set of suppressors that seemed to keep the temporal trajectory from rising.
In the absence of the transmission enhancers, the fraction of strains belonging to the first subtype was drastically reduced. This finding implies that the enhancers act together. As with the Delta subtypes, the authors could not locate adequate samples or a sufficient set of suppressors. As a result, it is unclear whether the transmission suppressors work alone or in tandem to lower temporal trajectories.
The authors used the same subtyping procedure on the Omicron VOC, but with a considerably larger sample size. The first and second subtypes have a significantly longer temporal trajectory than the third subtype. The temporal trajectory profiles of these Omicron subtypes were consistent. The distinctive SNVs of Omicron-01 and Omicron-02 were termed transmission enhancers because they improved the variant's transmission. Some of these enhancers were also found in Omicron-03, but, more crucially, that subtype gained a set of suppressors that seemed to keep the temporal trajectory from rising.
In the absence of the transmission enhancers, the number of strains belonging to the first two subtypes dropped considerably. This finding implies that the enhancers act together. As with Alpha and Delta, there were insufficient samples or evidence of a partial set of suppressors. As a result, it is unclear whether the transmission suppressors work alone or in tandem to lower temporal trajectories. However, since their initial occurrence in November 2021, the four suppressor SNVs have emerged together.
Because SARS-CoV-2 continues to evolve, there is an unmet need to monitor the temporal variations of viral subtypes, define their characteristic SNVs, and improve the understanding and management of the current pandemic.
When the volume of genomes rises above one million, the analysis becomes difficult. The results of this study show that allelic association is a signature of the emergence and development of a subtype and can therefore be used to identify a strain subtype.
*bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.