In a preprint study posted to Research Square* and currently under review at Scientific Reports, researchers classified and grouped severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants based on their cellular automata imaging (CAI) images and Hamming distances to trace viral evolution.
SARS-CoV-2 contains a single-stranded ribonucleic acid (RNA) genome and four structural proteins: Envelope (E), Nucleocapsid (N), Matrix (M), and Spike (S). Mutations in the viral S glycoprotein have led to the alarmingly rapid emergence of genetically distinct variants with enhanced viral transmission, infectivity, and intracellular replication. Genetic evaluation of the S protein is therefore pertinent to developing improved vaccines and therapeutic drugs.
Although previous studies have investigated the S protein structure, the methods used, such as similarity analysis, alignment methods, and image processing, were very complex. Hence, the authors of this study used CAI, a simpler, more economical, and effective technique that uses discrete digital codes and easily comprehensible evolutionary rules to assess complex protein structures.
About the study
In the present study, the authors assessed the S glycoprotein of the initial Wuhan strain and its mutated variants, Alpha, Beta, Gamma, Delta, Omicron, P.2, and B.1.1.28, which include the SARS-CoV-2 variants of concern (VoCs). They combined CAI images of the S protein sequences with the Hamming distance (DH) metric, which counts the positions at which two equal-length codes differ, to evaluate similarities and differences between variants and to trace viral evolution. This approach allowed protein sequences with similar ancestry and location to be classified and clustered together, and distinguished from other proteins in sequence databases such as UniProt and GenBank.
CAI is composed of four components: a grid of cells, the neighborhood of each cell, the states each cell can take (one or zero), and a local transition (evolution) rule. The evolution rule updates each cell according to the states of its neighboring cells; this matters because neighboring amino acids (aa) affect protein folding and function. To encode the 1,273-aa sequence of the S protein, digital codes were used, including five- and eight-digit codes for each constituent aa and codes reflecting physicochemical properties based on complementarity, similarity, information theory, and molecular recognition theory.
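The idea of turning a protein sequence into a row of binary cells can be sketched as follows. This is a minimal illustration, not the study's code table: it assumes a hypothetical 5-bit code in which the 20 standard amino acids are ranked by Kyte-Doolittle hydropathy (ties broken arbitrarily) and each rank is written in binary. The names `ASSUMED_HYDROPHOBICITY_ORDER`, `AA_TO_CODE`, and `encode_sequence` are illustrative only.

```python
# Hypothetical 5-bit amino acid encoding (an assumption, NOT the study's table):
# residues are ranked by Kyte-Doolittle hydropathy, most hydrophobic first,
# and each rank 0-19 is written as a 5-digit binary code.
ASSUMED_HYDROPHOBICITY_ORDER = "IVLFCMAGTSWYPHEQDNKR"

AA_TO_CODE = {
    aa: format(rank, "05b")  # e.g. rank 19 -> "10011"
    for rank, aa in enumerate(ASSUMED_HYDROPHOBICITY_ORDER)
}


def encode_sequence(seq: str) -> list[int]:
    """Concatenate the 5-bit code of every residue into one flat row of 0/1 cells."""
    cells: list[int] = []
    for aa in seq.upper():
        if aa not in AA_TO_CODE:
            raise ValueError(f"unsupported residue: {aa!r}")
        cells.extend(int(bit) for bit in AA_TO_CODE[aa])
    return cells


if __name__ == "__main__":
    # First five residues of the spike protein, used purely as a demo input.
    print(encode_sequence("MFVFL"))
```

With five digits per residue, a 1,273-aa sequence maps to 1,273 × 5 = 6,365 cells; an eight-digit code would simply change the `"05b"` width and the ranking scheme.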
In this study, each variant sequence was encoded according to the hydrophobicity of each aa, giving a code for the entire protein of 6,365 cells (1,273 aa × 5 digits). With eight possible states for each cell's neighborhood and each cell taking a state of one or zero, this code was evolved as a one-dimensional cellular automaton to create the CAI image. Using CAI, 25,635 evolutions were possible. Wolfram's rule was used to classify the VoCs and differentiate them from other viral sequences.
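How a one-dimensional, two-state automaton with eight neighborhood states turns a row of cells into an image can be sketched with a standard elementary cellular automaton. Each cell looks at (left, self, right), a 3-bit pattern with 2³ = 8 possible values, and bit k of a Wolfram rule number (0-255) gives the new state for pattern k. The specific rule the study used is not stated here, so rule 110 (a textbook example of Wolfram Class IV behavior) appears only as a placeholder, and the wrap-around boundary is likewise an assumption.

```python
def step(cells: list[int], rule: int) -> list[int]:
    """Apply one update of an elementary (radius-1, two-state) cellular automaton.

    The neighborhood (left, self, right) is read as a 3-bit number 0-7, one of
    the 8 possible configurations; bit k of the Wolfram rule number gives the
    next state for configuration k. Edges wrap around (periodic boundary).
    """
    if not 0 <= rule <= 255:
        raise ValueError("elementary CA rules are numbered 0-255")
    n = len(cells)
    return [
        (rule >> (4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def evolve(initial: list[int], rule: int, steps: int) -> list[list[int]]:
    """Return the space-time image: the initial row followed by `steps` updated rows."""
    rows = [list(initial)]
    for _ in range(steps):
        rows.append(step(rows[-1], rule))
    return rows


def render(rows: list[list[int]]) -> str:
    """Render the image as text, '#' for a cell in state 1 and '.' for state 0."""
    return "\n".join("".join("#" if c else "." for c in row) for row in rows)


if __name__ == "__main__":
    # Toy starting row; in the CAI setting this would be a binary-encoded protein.
    row = [0] * 15 + [1]
    print(render(evolve(row, rule=110, steps=8)))
```

Each row of the output is one generation, so stacking rows gives the two-dimensional picture that is compared between variants.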
Results and discussion
The S protein of the SARS-CoV family showed a characteristic V-shaped pattern in all CAI images, with the image of each variant differing according to the type and number of its mutations. These visual differences in the CAI images reflected the evolution of each mutated variant. The CAI images were further classified as Wolfram Class IV, which exhibits behavior between periodic (Class II) and chaotic (Class III) types.
The Omicron variant had the most mutations, as indicated by the highest DH values, including 33 aa substitutions in its S glycoprotein and the N501Y mutation. These numerous changes were responsible for increased viral transmission and reduced vaccine effectiveness. The Delta variant, carrying the P681R mutation, was the closest to the Wuhan strain and had the fewest mutations, as indicated by the smallest DH values.
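The link between "more mutations" and "higher DH" follows from how Hamming distance is defined: it counts the cells at which two equal-length binary codes differ, so every substitution that changes a residue's code adds to the total. The sketch below computes DH between two rows and between two images (lists of rows). It assumes the inputs already have equal length, for example after sequence alignment; how the study handled insertions and deletions is not described here, and the function names are hypothetical.

```python
def hamming_distance(a: list[int], b: list[int]) -> int:
    """Number of positions at which two equal-length 0/1 rows differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


def image_hamming_distance(img_a: list[list[int]], img_b: list[list[int]]) -> int:
    """Sum of row-wise Hamming distances between two same-shaped CA images."""
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same number of rows")
    return sum(hamming_distance(ra, rb) for ra, rb in zip(img_a, img_b))


if __name__ == "__main__":
    # Toy rows only, not data from the study.
    reference = [0, 1, 1, 0, 1, 0]
    variant = [1, 1, 0, 0, 1, 0]
    print(hamming_distance(reference, variant))
```

Comparing whole images rather than single rows means a change in the initial code can also be counted in later generations, wherever the automaton propagates it.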
The present study suggests that SARS-CoV-2 variants with similar mutations and shared ancestors can be grouped according to their DH from the initial Wuhan strain, calculated on the CAI images, and that such groupings can be used to establish phylogenetic and evolutionary relationships among the variants.
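Grouping by distance can be illustrated with two simple steps: ranking variants by DH from a reference, then merging variants whose pairwise DH falls under a threshold. The merging step here is plain single-linkage grouping via union-find, a swapped-in technique chosen for brevity; the article does not say which clustering method the authors used. The variant names, codes, and threshold are toy values, not results from the study.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_distance(reference: str, variants: dict[str, str]) -> list[tuple[str, int]]:
    """Return (name, DH to reference) pairs sorted from closest to farthest."""
    pairs = [(name, hamming(reference, code)) for name, code in variants.items()]
    return sorted(pairs, key=lambda p: p[1])


def single_linkage_groups(variants: dict[str, str], threshold: int) -> list[set[str]]:
    """Merge variants whose pairwise DH <= threshold, transitively (single linkage)."""
    parent = {name: name for name in variants}

    def find(x: str) -> str:
        # Follow parent links to the group's root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(variants, 2):
        if hamming(variants[a], variants[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in variants:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical toy codes; "v1".."v4" are placeholder names, not real variants.
    ref = "00000000"
    toy = {"v1": "10000000", "v2": "11000000", "v3": "00001111", "v4": "00000111"}
    print(rank_by_distance(ref, toy))
    print(single_linkage_groups(toy, threshold=1))
```

The ranking mirrors the "closest to / farthest from the Wuhan strain" reading, while the pairwise grouping captures variants that are close to each other even if they sit at similar distances from the reference.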
The study findings also highlight how far the heavily mutated Omicron variant has diverged from the other VoCs, with the highest molecular degeneracy and genetic variability owing to aa substitutions at site 501. According to the present study, rapid convergence of aa changes could explain the simultaneous emergence of the Alpha, Beta, and Gamma variants on three different continents.
The team of researchers also pointed out that mutations in the protein sequences of SARS-CoV-2 variants lead to genetic degeneracy and structural variability. Higher degeneracy has been associated with increased viral transmission and contributed to the rapid spread of the COVID-19 pandemic across the globe.
*Research Square publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.