Novel machine learning method detects animal coronaviruses that might infect humans


In a recent study posted to the bioRxiv* preprint server, researchers used machine learning (ML) tools to identify animal alpha and beta coronaviruses (CoVs) that are not yet known to infect humans but may be able to do so.

Study: Using machine learning to detect coronaviruses potentially infectious to humans. Image Credit: MAVV/Shutterstock

*Important notice: bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behavior, or treated as established information.

Background

It has remained challenging to predict which animal CoVs might infect humans because their full host ranges are unknown. For instance, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) originated in an animal host, most likely bats, and spilled over into humans after a host expansion event, an essential step in viral evolution. It is therefore crucial to survey all alpha and beta CoVs that infect animals living in close contact with humans (e.g., farm animals such as pigs), since such contact facilitates zoonotic transmission.

Both alignment-based and alignment-free approaches have shown promise for viral host prediction. However, alignment-based methods become inefficient as sequence lengths increase, whereas alignment-free methods do not account for the relative positions of amino acid (AA) residues along the sequence.
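To make the positional limitation concrete, the short Python sketch below builds a simple alignment-free profile (overlapping k-mer counts) for two toy peptide strings invented for illustration. The strings differ in order yet yield identical 2-mer counts, so a purely compositional profile cannot tell them apart. This is not the study's method; it only illustrates the drawback described above.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers: an alignment-free composition profile."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Two different toy peptides assembled from the same 2-mers in a different order.
a = "MMKKM"
b = "MKKMM"

print(a == b)                                  # False: the sequences differ
print(kmer_counts(a, 2) == kmer_counts(b, 2))  # True: identical 2-mer profiles
```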

About the study

In the present study, researchers developed a novel machine learning model to predict binding between the spike (S) protein of alpha and beta CoVs and a human receptor, such as human dipeptidyl peptidase 4 (hDPP4) or human angiotensin-converting enzyme 2 (hACE2).

To this end, they first downloaded 28,368 S protein sequences of all alpha and beta CoVs from the National Center for Biotechnology Information (NCBI) Virus database. They used a skip-gram model to convert these sequences into vectors encoding the associations between adjacent protein subsequences of length k, called k-mers. A classifier then used these vectors to score each protein sequence by its potential to bind a human receptor, a score referred to as the human-Binding Potential (h-BiP).
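The pipeline above can be sketched in a few lines of Python, under clearly labeled assumptions. The article does not state the value of k, the skip-gram context window, or the type of classifier, so `K = 3`, `WINDOW = 1`, mean pooling of k-mer vectors, and a single logistic layer are hypothetical choices, and the 2-D embeddings and weights are made-up numbers rather than learned values. The sketch shows how a sequence becomes k-mer tokens, which (target, context) pairs a skip-gram model would learn from, and how a classifier could map the resulting vectors to a 0-to-1 score resembling h-BiP.

```python
import math

# Hypothetical settings: the article does not give k, the context window,
# or the classifier type used in the study.
K = 3
WINDOW = 1


def to_kmers(seq: str, k: int = K) -> list[str]:
    """Split a protein sequence into overlapping k-mers."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def skipgram_pairs(tokens: list[str], window: int = WINDOW) -> list[tuple[str, str]]:
    """Pair each k-mer (target) with every k-mer within `window` positions of it
    (context). A skip-gram model learns embeddings by predicting context from target."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def binding_score(tokens, embeddings, weights, bias):
    """Toy stand-in for the classifier: average the k-mer vectors of a sequence,
    then apply a logistic layer so the score lies between 0 and 1."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        raise ValueError("no k-mer of this sequence has an embedding")
    dim = len(weights)
    pooled = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
    z = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-z))


kmers = to_kmers("MFVFLV")
print(kmers)      # ['MFV', 'FVF', 'VFL', 'FLV']
pairs = skipgram_pairs(kmers)
print(pairs[:3])  # [('MFV', 'FVF'), ('FVF', 'MFV'), ('FVF', 'VFL')]

# Made-up 2-D embeddings and weights; real ones would be learned from data.
toy_embeddings = {"MFV": [0.1, 0.2], "FVF": [0.3, -0.1], "VFL": [0.0, 0.4], "FLV": [0.2, 0.1]}
score = binding_score(kmers, toy_embeddings, weights=[1.0, 1.0], bias=0.0)
print(round(score, 3))
```

In practice, skip-gram embeddings are commonly trained with a word2vec implementation (for example gensim's `Word2Vec` with `sg=1`), treating k-mers as words; the article does not say which implementation the authors used.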

The final alpha and beta CoV dataset, spanning all clades and variants, comprised 2,534 AA sequences: 1,705 annotated as binding human receptors (positive) and 829 annotated as not binding (negative). The researchers split these sequences into a training set (85%) and a test set (15%).
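A minimal sketch of an 85/15 split over the reported class sizes is shown below. The article does not say whether the split was stratified by label or purely random; stratifying each class separately (so the test set keeps roughly the same positive/negative ratio) and the fixed random seed are assumptions made here for illustration.

```python
import random


def stratified_split(labels, test_frac=0.15, seed=0):
    """Split indices into (train, test), sampling `test_frac` of each class
    separately so both sets keep roughly the same positive/negative ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(test_frac * len(idx))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


# Class sizes reported in the study: 1,705 positive (1) and 829 negative (0).
labels = [1] * 1705 + [0] * 829
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```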

Further, the researchers used a subset of 424 sequences to build a phylogenetic tree of the S protein of alpha and beta CoVs. For molecular dynamics (MD) simulations, the team used starting receptor-binding domain (RBD) structures of LYRa3 and LYRa11 generated with AlphaFold. The MD package YASARA was used to simulate protein-protein interactions by substituting individual AA residues and searching for minimum-energy conformations of the final modified candidate structures. The team also ran an energy minimization (EM) routine on all modified candidate structures until the free energy stabilized to within 50 J/mol.

Consistent with the classifier's high accuracy, the h-BiP score correlated with percent sequence identity to human viruses. For each S protein sequence in the study dataset, the team computed the pairwise percent sequence identity against all seven human CoVs and kept the maximum. All viruses with ≥97% identity to a previously known human CoV had an h-BiP score >0.5.

Notably, the h-BiP score detected binding even at low sequence identity and discriminated between the binding potentials of viruses with nearly the same sequence identity.
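The identity comparison above can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes the sequences are already aligned to equal length (a real analysis would first run a pairwise aligner), it counts identity over columns where at least one sequence has a residue (other denominators are also in use), and the sequence names and fragments are invented. The `consistent` helper simply encodes the reported observation that ≥97% identity went with an h-BiP score above 0.5.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences ('-' = gap).
    Columns where both sequences have a gap are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in cols if x == y and x != "-")
    return 100.0 * matches / len(cols)


def max_identity_to_human(query: str, human_spikes: dict[str, str]) -> tuple[str, float]:
    """Return the human CoV whose (aligned) spike is most identical to `query`."""
    scores = {name: percent_identity(query, seq) for name, seq in human_spikes.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def consistent(identity: float, hbip: float) -> bool:
    """Reported pattern: every virus with >=97% identity had h-BiP > 0.5."""
    return identity < 97.0 or hbip > 0.5


# Toy aligned fragments with hypothetical names; not real spike sequences.
human_spikes = {"human-CoV-A": "MKVLA-", "human-CoV-B": "MRVIAS"}
name, pid = max_identity_to_human("MKVLAS", human_spikes)
print(name, round(pid, 1))  # human-CoV-A 83.3
```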

Results and conclusion

The researchers identified two viruses, LYRa3 and Bt133, whose human-binding properties are unknown but which had high h-BiP scores. Supporting this, phylogenetic analysis showed that both viruses are related to non-human CoVs previously known to bind human receptors. The receptor-binding motif (RBM) within the RBD of the S protein comes into direct contact with the host receptor. A multiple sequence alignment of the RBMs of Bt133 and LYRa3 with related viruses revealed that both conserve the contact residues that interact with the human receptor(s).

For instance, Bt133 conserved all eight contact residues that Tylonycteris bat CoV HKU4 (Ty-HKU4) uses to bind hDPP4, despite carrying 13 RBD mutations. Similarly, LYRa3, which is phylogenetically related to SARS-CoV Tor2, conserved 12 of its 17 contact residues that bind hACE2. Moreover, except for residue 441, it had identical sequences at the RBD. MD simulations of the RBD further supported this binding and identified the contact residues that bound human receptors.
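Counting conserved contact residues, as in the "eight of eight" and "12 of 17" comparisons above, amounts to checking aligned positions one by one. The sketch below assumes the two RBM sequences are already aligned and that contact positions are given as 1-based alignment columns; the sequences and positions are invented placeholders, not the real Bt133, LYRa3, Ty-HKU4, or Tor2 data.

```python
def conserved_contacts(reference: str, query: str, contact_positions: list[int]) -> list[int]:
    """Return the contact positions (1-based alignment columns) at which `query`
    has the same residue as `reference`."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return [p for p in contact_positions if reference[p - 1] == query[p - 1]]


# Toy aligned RBM fragments and contact positions, invented for illustration.
ref_rbm = "ACDEFGHIKLMN"
query_rbm = "ACDQFGHWKLMN"
contacts = [2, 4, 5, 8, 11]

kept = conserved_contacts(ref_rbm, query_rbm, contacts)
print(f"{len(kept)} of {len(contacts)} contact residues conserved: {kept}")
# → 3 of 5 contact residues conserved: [2, 5, 11]
```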

Finally, the researchers tested whether the model could anticipate host expansion events. They emulated conditions before the emergence of SARS-CoV-2 by removing all SARS-CoV-2 S protein sequences from the training set. The re-trained model still predicted binding between a human receptor and the wild-type SARS-CoV-2 S protein, with an h-BiP score of 0.96. Overall, the proposed ML-based method could prove valuable for identifying, from a vast pool of animal CoVs, which viruses could cross the species barrier to infect humans.
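The hold-out experiment hinges on filtering the training data before re-training. The sketch below shows only that filtering step under stated assumptions: the `SpikeRecord` structure, its field names, and the accession strings are hypothetical, and the re-training and scoring of the held-out sequences (which would reuse the same embedding and classifier pipeline) are not shown.

```python
from dataclasses import dataclass


@dataclass
class SpikeRecord:
    accession: str      # hypothetical identifier
    virus: str          # virus name used to decide what to hold out
    binds_human: bool   # positive/negative human-binding annotation


def pre_emergence_training_set(records, held_out_virus="SARS-CoV-2"):
    """Drop every record of the held-out virus, emulating a training set
    assembled before that virus emerged."""
    return [r for r in records if r.virus != held_out_virus]


records = [
    SpikeRecord("acc-001", "SARS-CoV-2", True),
    SpikeRecord("acc-002", "SARS-CoV Tor2", True),
    SpikeRecord("acc-003", "Ty-HKU4", True),
    SpikeRecord("acc-004", "SARS-CoV-2", True),
]
train = pre_emergence_training_set(records)
held_out = [r for r in records if r.virus == "SARS-CoV-2"]
print([r.accession for r in train])     # ['acc-002', 'acc-003']
print([r.accession for r in held_out])  # ['acc-001', 'acc-004']
```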


Journal reference:

Written by

Neha Mathur

Neha is a digital marketing professional based in Gurugram, India. She earned a Master’s degree from the University of Rajasthan in 2008, specializing in Biotechnology. She gained pre-clinical research experience through a research project in the Department of Toxicology at the prestigious Central Drug Research Institute (CDRI), Lucknow, India. She also holds a certification in C++ programming.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Mathur, Neha. (2022, December 15). Novel machine learning method detects animal coronaviruses that might infect humans. News-Medical. Retrieved on April 19, 2024 from https://www.news-medical.net/news/20221215/Novel-machine-learning-method-detects-animal-coronaviruses-that-might-infect-humans.aspx.

  • MLA

    Mathur, Neha. "Novel machine learning method detects animal coronaviruses that might infect humans". News-Medical. 19 April 2024. <https://www.news-medical.net/news/20221215/Novel-machine-learning-method-detects-animal-coronaviruses-that-might-infect-humans.aspx>.

  • Chicago

    Mathur, Neha. "Novel machine learning method detects animal coronaviruses that might infect humans". News-Medical. https://www.news-medical.net/news/20221215/Novel-machine-learning-method-detects-animal-coronaviruses-that-might-infect-humans.aspx. (accessed April 19, 2024).

  • Harvard

    Mathur, Neha. 2022. Novel machine learning method detects animal coronaviruses that might infect humans. News-Medical, viewed 19 April 2024, https://www.news-medical.net/news/20221215/Novel-machine-learning-method-detects-animal-coronaviruses-that-might-infect-humans.aspx.

The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of News Medical.