In a recent study posted to the bioRxiv* preprint server, researchers used a convolutional neural network (CNN) regression model (CNN_seq) to investigate the effect of mutations on the binding affinity of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) receptor-binding domain (RBD)-angiotensin-converting enzyme 2 (ACE2) complex.
RBD-ACE2 binding is the first and crucial step in SARS-CoV-2 entry and subsequent infectivity, so a model that reliably predicts the binding affinity of this association could aid continuous monitoring of SARS-CoV-2 evolution in animal reservoirs. Such monitoring matters because spillover of SARS-CoV-2 into non-human species and back into humans could complicate future mitigation strategies.
About the study
In the present study, researchers trained the CNN_seq model on a curated dataset of 8,440 RBD variants for humans to analyze the combinations of single and multiple (up to four) amino acid mutations and identify candidates with the highest RBD-ACE2 binding affinity for monitoring.
A second dataset of over 220,000 RBD variants covered four animal hosts of SARS-CoV-2: white-tailed deer (Odocoileus virginianus), cattle (Bos indicus × Bos taurus), pigs (Sus scrofa), and chickens (Gallus gallus).
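To give a sense of scale for combinations of up to four mutations, the short Python sketch below counts how many distinct variants exist for a sequence of a given length. It is illustrative only: the function name `count_variants`, the toy length of 10 positions, and the assumption of 19 alternative residues per position are not taken from the preprint.

```python
from math import comb

def count_variants(seq_len, max_mutations=4, alternatives=19):
    """Count distinct variants carrying exactly k substitutions, for k = 1..max_mutations.

    Assumes each mutated position can change to any of `alternatives`
    other amino acids (19 for the standard 20-letter alphabet).
    """
    return {
        k: comb(seq_len, k) * alternatives ** k  # choose positions, then residues
        for k in range(1, max_mutations + 1)
    }

# Toy example: a hypothetical 10-residue stretch, not the real RBD length
counts = count_variants(seq_len=10)
for k, n in counts.items():
    print(f"{k} mutation(s): {n:,} variants")
```

Even for ten positions the count reaches tens of millions at four mutations, which illustrates why a sequence-based predictor is attractive for screening such combinations.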
For humans, they obtained the three-dimensional (3D) structure of the SARS-CoV-2 spike (S) RBD-human ACE2 (hACE2) complex and its sequences from the Protein Data Bank (PDB). Likewise, they obtained ACE2 protein sequences from UniProt for the four animal hosts of SARS-CoV-2. Because experimentally determined structures of the animal RBD-ACE2 complexes were unavailable, the team generated their 3D structures using SWISS-MODEL.
Since the RBD-ACE2 structure is complex, the researchers used the PISA tool to identify key interactions between residues in ACE2 and RBD, such as hydrogen bonds, salt bridges, and disulfide bonds between amino acid pairs. They also used protein contact maps to record the distances between all amino acid residue pairs in the RBD-ACE2 complex.
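As a rough illustration of what a contact map records, the Python sketch below computes pairwise distances between two sets of residue coordinates and lists the pairs that fall within a cutoff. The coordinates are toy values, and the one-point-per-residue representation, the 8 Å cutoff, and the function names are assumptions for the example, not details reported in the preprint.

```python
import math

def distance_matrix(coords_a, coords_b):
    """Euclidean distance between every point in coords_a and every point in coords_b."""
    return [[math.dist(p, q) for q in coords_b] for p in coords_a]

def contact_pairs(distances, cutoff=8.0):
    """Index pairs (i, j) whose distance is at or below the cutoff (assumed 8 angstroms)."""
    return [
        (i, j)
        for i, row in enumerate(distances)
        for j, d in enumerate(row)
        if d <= cutoff
    ]

# Toy coordinates in angstroms (hypothetical, one point per residue)
rbd_residues = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ace2_residues = [(3.0, 4.0, 0.0), (10.0, 0.0, 20.0), (30.0, 0.0, 0.0)]

distances = distance_matrix(rbd_residues, ace2_residues)
pairs = contact_pairs(distances)
print(pairs)
```

Passing the same coordinate list for both arguments would give the all-against-all map for a single structure.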
They evaluated the model with five-fold cross-validation during training and against a blind test set of 1,667 RBD variants. The team also compared the CNN_seq model's predictions with experimental values of the apparent dissociation constant (KD,app) ratio. The KD,app ratio quantifies the change in RBD binding affinity for hACE2: a ratio greater than one indicates stronger RBD-hACE2 binding, whereas a ratio less than one denotes weaker binding.
Lastly, the researchers computed the binding free energy (ΔG) of the RBD-hACE2 complex using the Rosetta force field, a molecular-mechanics-based empirical force field.
The percent recovery of correct variant classification (%VC) attained by the CNN_seq model in five-fold cross-validation tests was 83.28%, with a Pearson correlation coefficient (r) of 0.85. It performed equally well in the blind test, achieving a %VC of 83.47% and an r of 0.84, which reconfirmed the robustness of the CNN_seq model predictions.
The %VC is the percentage of variants whose change in binding affinity relative to the wild-type (WT) SARS-CoV-2 strain is classified correctly, whereas r measures the strength of the linear correlation between the predicted and experimental KD,app ratio values.
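The two metrics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: variants are split into two classes at a KD,app ratio of one (stronger versus weaker than WT), which may be simpler than the classification scheme used in the preprint, and the ratio values below are made-up toy numbers rather than study data.

```python
import math

def percent_vc(predicted, experimental, threshold=1.0):
    """Percentage of variants placed on the same side of the threshold
    (ratio > 1 taken as stronger binding than WT) by prediction and experiment."""
    matches = sum(
        (p > threshold) == (e > threshold)
        for p, e in zip(predicted, experimental)
    )
    return 100.0 * matches / len(experimental)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Toy KD,app ratios (hypothetical values for illustration only)
experimental = [1.5, 0.8, 1.2, 0.4]
predicted = [1.3, 0.9, 0.95, 0.5]

vc = percent_vc(predicted, experimental)  # three of four classes agree
r = pearson_r(predicted, experimental)
print(f"%VC = {vc:.1f}, r = {r:.2f}")
```

In this toy case the third variant is predicted just below one but measured above it, so it counts as a misclassification even though the numerical error is small. This shows why the two metrics can diverge.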
Interestingly, mutations did not substantially alter the structure, contact map, or interfacial residue information of any RBD-ACE2 complex examined by the study model.
Further, the CNN_seq model predicted improved binding affinity relative to the WT for most circulating variants (21 in total, of which 13 had World Health Organization (WHO) labels and the remaining eight had only PANGO classifications), similar to experimental findings. It achieved a %VC of 92.9% and an r of 0.60 on the blinded dataset of 15 variants containing multiple amino acid changes. In contrast, the neural network molecular mechanics-generalized Born surface area (NN_MM-GBSA) model performed relatively poorly, yielding a %VC of 75.7% and an r of 0.28 for the same dataset.
For Omicron, the CNN_seq model predicted improved binding affinity, with a KD,app ratio of 1.23 ± 0.58. However, the mean ΔG values of the Omicron- and WT-RBD-hACE2 complexes were -36.2 ± 2.4 and -44.3 ± 1.0 kcal/mol, respectively; the less negative value for Omicron points to weaker rather than stronger binding.
The inconsistencies between the results obtained from different methods for Omicron were most likely attributable to the more than 15 mutations in its RBD, which result in the formation or breakage of multiple amino acid pair interactions and consequent structural rearrangement.
The binding affinity of RBD variants to animal ACE2s followed the same pattern as for human ACE2. Accordingly, deer ACE2 bound to the SARS-CoV-2 RBD almost as strongly as human ACE2, while cattle, pig, and chicken ACE2s bound weakly.
Using an approximately 80-fold enlarged dataset, the novel CNN_seq model trained effectively and overcame the computational barriers associated with the NN_MM-GBSA procedure. It thus presents an opportunity to comprehensively assess binding affinity changes across the entire RBD and the ACE2 receptors of both humans and animals.
Assessing binding affinity changes in response to two or more amino acid changes is challenging experimentally. The CNN_seq model, however, assessed combinations of up to four mutations and the related epistatic and synergistic effects. Most importantly, for Omicron, the model delineated the roles of mutations as immune-evading and binding-enhancing.
bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.