As of late 2021, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants associated with increased transmissibility and/or immune evasion (antibody escape) had almost completely supplanted the original founder strain (Wu-Hu-1). Emerging variants frequently carry at least one mutation in the receptor-binding domain (RBD), which can affect binding to angiotensin-converting enzyme 2 (ACE2). For example, the alpha (B.1.1.7), beta, and gamma variants share the N501Y mutation, which is associated with higher-affinity binding to ACE2, suggesting that enhanced ACE2 binding may be one selective pressure driving variant emergence.
Study: Predictive profiling of SARS-CoV-2 variants by deep mutational learning.
Previous investigations used yeast surface display and deep mutational scanning (DMS) to measure how single-position mutations across the complete 201-amino-acid RBD of SARS-CoV-2 affect binding to ACE2 and to monoclonal or serum antibodies. Several widely circulating variants (e.g., beta, gamma, and delta), as well as newly emerging variants (e.g., mu (B.1.621) and lambda (C.37)), carry multiple RBD mutations that are associated with improved ACE2 binding and/or escape from multiple classes of antibodies.
The recent emergence of the omicron variant, with 15 RBD mutations and a substantial potential for immune evasion, highlights the urgent need to understand the impact of combinatorial mutations. However, as the number of mutations and the amino acid diversity increase, combinatorial sequence space expands exponentially, rapidly exceeding the capacity of experimental screening. For example, even when mutagenesis is restricted to a subset of 20 RBD residues directly involved in ACE2 binding, the theoretical sequence space far exceeds what yeast display libraries can screen.
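The exponential growth described above can be made concrete with a little combinatorics. The sketch below assumes, purely for illustration, that all 20 canonical amino acids are allowed at each of the 20 targeted positions and that a yeast display library tops out at roughly 10^9 clones; neither number is taken from the study. It counts both the full sequence space and the number of variants at an exact amino acid edit distance k from the reference.

```python
from math import comb

# Illustrative assumptions, not figures reported by the study:
N_POSITIONS = 20          # RBD residues opened to mutagenesis
N_AMINO_ACIDS = 20        # canonical amino acids allowed at each position
LIBRARY_CEILING = 10**9   # rough order of magnitude for a yeast display library


def full_space(n_positions: int, n_aa: int) -> int:
    """Distinct sequences when every position may carry any amino acid."""
    return n_aa ** n_positions


def variants_at_distance(k: int, n_positions: int, n_aa: int) -> int:
    """Sequences differing from the reference at exactly k positions."""
    # Choose which k positions mutate, then one of (n_aa - 1) alternatives at each.
    return comb(n_positions, k) * (n_aa - 1) ** k


if __name__ == "__main__":
    print(f"All sequences: {full_space(N_POSITIONS, N_AMINO_ACIDS):.2e}")
    for k in (3, 5, 7):
        n = variants_at_distance(k, N_POSITIONS, N_AMINO_ACIDS)
        status = "exceeds" if n > LIBRARY_CEILING else "within"
        print(f"ED{k}: {n:.2e} variants ({status} assumed library size)")
```

Under these assumptions, variants at exactly three substitutions number about 7.8 million, while five substitutions already give about 3.8 × 10^10, well beyond the assumed library size, and the full space is about 10^26.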
Deep mutational learning (DML), developed by researchers from multiple institutions, combines experimental yeast display screening of RBD mutagenesis libraries with deep sequencing and machine learning. DML enables a comprehensive analysis of combinatorial RBD mutations and their impact on ACE2 binding and antibody escape, supporting predictive profiling of SARS-CoV-2 variants.
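To give a feel for how screening and sequencing output can become machine learning input, the sketch below turns hypothetical read counts from sorted "binding" and "non-binding" yeast populations into labeled examples and encodes each sequence as a flattened one-hot vector, a common representation for protein sequence classifiers. The labeling rule (keep sequences seen in only one population with a minimum read count), the function names, and the `min_reads` parameter are all assumptions for illustration, not the authors' published pipeline.

```python
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq: str) -> List[int]:
    """Flattened one-hot encoding with len(seq) * 20 binary features."""
    vec = [0] * (len(seq) * len(AMINO_ACIDS))
    for pos, aa in enumerate(seq):
        vec[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1
    return vec


def label_from_sorts(
    binder_counts: Dict[str, int],
    nonbinder_counts: Dict[str, int],
    min_reads: int = 2,
) -> List[Tuple[str, int]]:
    """Hypothetical labeling rule: 1 = binder, 0 = non-binder.

    Sequences observed in both sorted populations are treated as ambiguous
    and dropped; sequences with fewer than `min_reads` reads are also dropped.
    """
    labeled = []
    for seq, n in binder_counts.items():
        if n >= min_reads and seq not in nonbinder_counts:
            labeled.append((seq, 1))
    for seq, n in nonbinder_counts.items():
        if n >= min_reads and seq not in binder_counts:
            labeled.append((seq, 0))
    return labeled


if __name__ == "__main__":
    # Toy read counts for short made-up sequences.
    data = label_from_sorts({"AC": 5, "DD": 1, "EE": 3}, {"EE": 4, "FF": 2})
    features = [one_hot(seq) for seq, _ in data]
    print(data, len(features[0]))
```

Discarding sequences seen in both populations is one simple way to reduce label noise; a real pipeline may instead use enrichment ratios or probabilistic labels.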
A preprint version of the study is available on the bioRxiv* server while the article undergoes peer review.
After establishing that their ACE2-binding and antibody-escape machine learning models made highly accurate predictions on test data, the authors evaluated classification performance on defined variants, followed by experimental validation and structural modeling. To mimic realistic evolutionary routes, synthetic lineages were generated in silico, and a lineage was discarded if any intermediate at a mutational step was not predicted to bind ACE2. The lineages contained variants at edit distances of 3 (ED3), 5 (ED5), and 7 (ED7) from the original Wu-Hu-1 RBD sequence, at both the nucleotide and amino acid levels. The sequences were also chosen so that the lineages included mutations found in circulating variants.
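The lineage-building procedure above can be sketched as a step-by-step walk away from a reference sequence, adding one substitution at a time and abandoning the walk as soon as an intermediate fails the ACE2-binding filter. In this sketch the reference, the `predicts_binder` callable, and the toy "no proline" rule are hypothetical stand-ins for the trained models; only amino acid edit distance (Hamming distance over substitutions) is tracked, and the nucleotide-level constraint is not modeled.

```python
import random
from typing import Callable, List, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def hamming(a: str, b: str) -> int:
    """Amino acid edit distance for equal-length, substitution-only variants."""
    return sum(x != y for x, y in zip(a, b))


def grow_lineage(
    reference: str,
    target_ed: int,
    predicts_binder: Callable[[str], bool],
    rng: random.Random,
    max_tries: int = 100,
) -> Optional[List[str]]:
    """Build reference -> ED1 -> ... -> ED{target_ed}, one new substitution per step.

    A lineage is discarded as soon as any intermediate is predicted not to
    bind ACE2; a fresh attempt then starts again from the reference.
    """
    for _ in range(max_tries):
        lineage = [reference]
        current = reference
        free_positions = list(range(len(reference)))
        for _ in range(target_ed):
            # Mutate a position not yet touched, so distance grows by exactly 1.
            pos = free_positions.pop(rng.randrange(len(free_positions)))
            new_aa = rng.choice([aa for aa in AMINO_ACIDS if aa != reference[pos]])
            current = current[:pos] + new_aa + current[pos + 1:]
            if not predicts_binder(current):
                break  # intermediate fails the binding filter: discard lineage
            lineage.append(current)
        else:
            return lineage  # every intermediate passed the filter
    return None


def toy_predictor(seq: str) -> bool:
    """Hypothetical stand-in model: any proline-containing sequence 'fails'."""
    return "P" not in seq


if __name__ == "__main__":
    print(grow_lineage("ACDEFGHIKL", 3, toy_predictor, random.Random(0)))
```

Checking every intermediate, rather than only the endpoint, is what encodes the assumption that each step along a plausible evolutionary route must remain functional.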
ACE2 binding was predicted with a consensus model: a given RBD sequence is predicted to bind ACE2 only if both the random forest (RF) and recurrent neural network (RNN) models output P > 0.5; otherwise, it is predicted to be a non-binder. The 46 synthetic lineage variants were chosen to span a range of ACE2-binding predictions (36 predicted binders and ten predicted non-binders). In addition, escape from each of the four therapeutic antibodies was predicted for the synthetic variants using a similar consensus approach, in which an RBD sequence is predicted to escape an antibody only when the RF and RNN outputs agree on the escape side of the P = 0.5 threshold.
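The ACE2 consensus rule described above reduces to a logical AND over two thresholded probabilities. The sketch below implements that stated rule; the probability values are made up, and the RF and RNN models themselves are not reproduced. For antibody escape the same agreement logic applies, with the direction of the threshold depending on how the escape models' outputs are defined.

```python
from typing import List, Sequence

THRESHOLD = 0.5


def consensus_binder(p_rf: float, p_rnn: float, threshold: float = THRESHOLD) -> bool:
    """Predicted ACE2 binder only if BOTH models output P > threshold."""
    return p_rf > threshold and p_rnn > threshold


def consensus_calls(p_rf: Sequence[float], p_rnn: Sequence[float]) -> List[bool]:
    """Apply the consensus rule to paired RF and RNN probabilities."""
    if len(p_rf) != len(p_rnn):
        raise ValueError("RF and RNN predictions must be paired one-to-one")
    return [consensus_binder(a, b) for a, b in zip(p_rf, p_rnn)]


if __name__ == "__main__":
    # Made-up probabilities for four hypothetical RBD variants.
    rf = [0.91, 0.72, 0.40, 0.60]
    rnn = [0.85, 0.30, 0.20, 0.50]
    print(consensus_calls(rf, rnn))  # -> [True, False, False, False]
```

Note that the last variant is called a non-binder because P = 0.50 does not strictly exceed the threshold. Requiring agreement between two different model types makes a "binder" call more conservative than either model alone.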
After all machine learning predictions had been made, each synthetic RBD variant was individually displayed on the surface of yeast cells and tested for ACE2 binding and antibody escape. The consensus model correctly predicted ACE2 binding for 91.67% of the predicted binders and was 100% accurate for the predicted non-binders, for an overall prediction accuracy of 93.48%. Across all four therapeutic antibodies, the cumulative accuracy of escape predictions was 93.94% for the 33 variants whose predicted ACE2 binding was experimentally confirmed.
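The reported percentages can be reconciled with the variant counts given earlier (36 predicted binders, ten predicted non-binders, 33 confirmed binders). The short calculation below reproduces them. The antibody-escape figure assumes each of the 33 confirmed binders was scored against all four antibodies (132 predictions); 93.94% then corresponds to 124 correct calls, which is an inference from the arithmetic rather than a number stated in the text.

```python
def pct(correct: int, total: int) -> float:
    """Accuracy as a percentage, rounded to two decimals as reported."""
    return round(100 * correct / total, 2)


# Counts from the text; all ten predicted non-binders were confirmed (100%).
predicted_binders, confirmed_binders = 36, 33
predicted_nonbinders, confirmed_nonbinders = 10, 10

binder_acc = pct(confirmed_binders, predicted_binders)
nonbinder_acc = pct(confirmed_nonbinders, predicted_nonbinders)
overall_acc = pct(confirmed_binders + confirmed_nonbinders,
                  predicted_binders + predicted_nonbinders)

# Assumption: 33 variants x 4 antibodies; 124 correct is inferred, not reported.
escape_total = confirmed_binders * 4
escape_correct_inferred = 124
escape_acc = pct(escape_correct_inferred, escape_total)

if __name__ == "__main__":
    print(binder_acc, nonbinder_acc, overall_acc, escape_acc)
```

The overall 93.48% is therefore 43 correct calls out of 46 variants, not an average of the two class-wise accuracies.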
In addition, the consensus models predicted ACE2 binding and escape from all four therapeutic antibodies for three variants that were only ED3 (at both the nucleotide and amino acid levels) from the Wu-Hu-1 RBD. One of these variants carried mutations at positions 493, 498, and 501, all of which are also mutated in the omicron variant. Subsequent yeast display experiments confirmed the predicted escape from all four therapeutic antibodies, including REGN10987, whose binding is often resistant to mutations. AlphaFold2 was used to model the structures of eight synthetic RBD variants. According to these structural predictions, several variants that did not bind ACE2 did not differ substantially from the original Wu-Hu-1 RBD. By contrast, the ACE2-binding variants displayed a wide range of potential structural conformations.
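Being ED3 at both the nucleotide and amino acid levels means each of the three amino acid changes must arise from a single nucleotide substitution. The sketch below builds the standard genetic code and lists which amino acids are reachable from a given codon by one nucleotide change. The codons used in the example (AAT for asparagine, CAG for glutamine) are illustrative choices; they are not claimed to be the actual Wu-Hu-1 codons at positions 493, 498, or 501.

```python
from itertools import product
from typing import Set

BASES = "TCAG"
# Standard genetic code in TCAG order (first base varies slowest); '*' = stop.
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_TABLE)}


def single_nt_neighbors(codon: str) -> Set[str]:
    """Amino acids reachable from `codon` by exactly one nucleotide substitution.

    Synonymous changes (same amino acid) and stop codons are excluded.
    """
    original = CODON_TABLE[codon]
    reachable = set()
    for i in range(3):
        for base in BASES:
            if base == codon[i]:
                continue
            mutant = codon[:i] + base + codon[i + 1:]
            aa = CODON_TABLE[mutant]
            if aa not in ("*", original):
                reachable.add(aa)
    return reachable


if __name__ == "__main__":
    # Illustrative codons only: asparagine (AAT) and glutamine (CAG).
    print(sorted(single_nt_neighbors("AAT")))  # includes 'Y' (N -> Y)
    print(sorted(single_nt_neighbors("CAG")))  # includes 'R' (Q -> R)
```

Amino acids missing from a codon's neighbor set (for example, tryptophan from AAT) would require two or more nucleotide changes, so a variant carrying such a substitution could not be ED3 at both levels with only three amino acid changes.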
There is evidence that the receptor-binding domains of other endemic coronaviruses may be undergoing adaptive evolution to evade human antibody responses. Accordingly, combining DML with phylogenetic models of viral evolution to predict SARS-CoV-2 escape from the polyclonal antibodies present in the serum of vaccinated or convalescent individuals may help identify the future variants most likely to emerge and thus support vaccine development for coronavirus disease 2019 (COVID-19).
bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice/health-related behavior, or treated as established information.