Researchers at the King Abdullah University of Science and Technology (KAUST) have developed an artificial intelligence (AI) tool that can accurately detect which sites on the surface of a protein bind to RNA molecules.
The tool, called NucleicNet, outperforms other computational tools and provides insights that could help in the design and development of drugs.
"RNA binding is a fundamental feature of many proteins. Our structure-based computational framework can reveal the detailed RNA-binding properties of these proteins, which is important for characterizing the pathology of many diseases."
Jordy Homing Lam, Co-First Author
Proteins routinely interact with RNA molecules to regulate the processing and transport of gene transcripts. When this interaction fails, cellular function is disrupted, which can lead to illnesses, including cancer and neurodegenerative disease.
Investigating the protein-RNA interaction using deep learning
To investigate which parts of an RNA molecule bind to different parts of a protein’s surface, the team employed an AI technique called deep learning. Lam and colleagues Xin Gao and Yu Li trained NucleicNet to automatically learn the structural features that enable proteins and RNA to interact.
To train the deep learning algorithm, the researchers used 3D structural data available for 158 different protein-RNA complexes held in a public database.
When the team compared NucleicNet against other predictive tools that relied on sequence data rather than structural data, they found that their software was the most accurate tool for determining which sites on the surface of a protein bind to RNA molecules.
Unlike the other models, NucleicNet could also predict which parts of the RNA were binding to the protein, whether a region of the sugar-phosphate backbone or one of the four nucleobases.
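As a rough illustration of what this kind of per-site prediction involves, the minimal Python sketch below turns a set of raw scores for one location on a protein surface into probabilities and picks the most likely RNA constituent. It is not NucleicNet's code: the label list (including the "non-binding" class), the function names and the example scores are all hypothetical, chosen only to mirror the backbone-versus-base distinction described above.

```python
import math

# Hypothetical label set (an assumption, not taken from NucleicNet):
# a non-binding class, two backbone parts, and the four RNA nucleobases.
LABELS = ["non-binding", "phosphate", "ribose", "A", "C", "G", "U"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_site(scores):
    """Return the most probable label and its probability for one surface location."""
    if len(scores) != len(LABELS):
        raise ValueError("expected one score per label")
    probs = softmax(scores)
    best = max(range(len(LABELS)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up scores for a single surface location (not real model output).
example_scores = [0.1, 0.3, 0.2, 0.4, 0.0, 2.5, 0.6]
label, prob = classify_site(example_scores)
print(label, round(prob, 2))
```

In a real tool the scores would come from a trained network that reads the 3D environment around each surface location. Here they are simply typed in to show the final classification step.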
Validating the software
Next, in collaboration with colleagues in China and the United States, the researchers validated their software on a diverse set of RNA-binding proteins, showing that the interactions predicted by NucleicNet closely matched those determined using experimental techniques.
"Structure-based features were little considered by other computational frameworks," says Lam. "We have harnessed the power of deep learning to infer those subtle interactions."
NucleicNet is openly accessible to researchers interested in predicting RNA-binding sites and binding preferences for any protein they are studying.
Lam, J. H., et al. (2019). A deep learning framework to predict binding preference of RNA constituents on protein surface. Nature Communications. DOI: 10.1038/s41467-019-12920-0.