Sorting huge amounts of data is a bottleneck in protein research, a field that is crucial to making use of the gene-editing technology CRISPR and to fully understanding diseases like cancer, Alzheimer's and Parkinson's. Now, researchers at the University of Copenhagen have become the first in the world to employ artificial intelligence to do the heavy lifting, and to do so in a way that can ensure common international standards while making advanced protein science more accessible.
Using artificial intelligence, UCPH researchers have solved a problem that until now has been the stumbling block for important protein research into the dynamics behind diseases such as cancer, Alzheimer's and Parkinson's, as well as in the development of sustainable chemistry and new gene-editing technologies.
Analyzing the huge datasets that researchers collect with microscopy and the smFRET technique, which show how proteins move and interact with their surroundings, has always been time-consuming and challenging. The task also required a high level of expertise, so unanalyzed data piled up on servers and hard drives. Now researchers at the Department of Chemistry, Nano-Science Center, Novo Nordisk Foundation Center for Protein Research and the Niels Bohr Institute, University of Copenhagen, have developed a machine-learning algorithm to do the heavy lifting.
"We used to sort data until we went loopy. Now our data is analyzed at the touch of a button. And the algorithm does it at least as well as, or better than, we can. This frees up resources for us to collect more data than ever before and get faster results."
Simon Bo Jensen, biophysicist and PhD student at the Department of Chemistry and the Nano-Science Center
The algorithm has learned to recognize protein movement patterns, allowing it to classify datasets in seconds -- a process that typically takes experts several days to accomplish.
"Until now, we sat with loads of raw data in the form of thousands of patterns. We used to check through it manually, one at a time. In doing so, we became the bottleneck of our own research. Even for experts, conducting consistent work and reaching the same conclusions time and time again is difficult. After all, we're humans who tire and are prone to error," says Simon Bo Jensen.
Just a second's work for the algorithm
The studies of the relationship between protein movements and functions conducted by the UCPH researchers are internationally recognized and essential for understanding how the human body functions. For example, diseases including cancer, Alzheimer's and Parkinson's are caused by proteins clumping up or changing their behaviour. The gene-editing technology CRISPR, which won the Nobel Prize in Chemistry this year, also relies on the ability of proteins to cut and splice specific DNA sequences. When UCPH researchers like Guillermo Montoya and Nikos Hatzakis study how these processes take place, they make use of microscopy data.
"Before we can treat serious diseases or take full advantage of CRISPR, we need to understand how proteins, the smallest building blocks, work. This is where protein movement and dynamics come into play. And this is where our tool is of tremendous help," says Guillermo Montoya, Professor at the Novo Nordisk Foundation Center for Protein Research.
Attention from around the world
It appears that protein researchers around the world have been missing just such a tool. Several international research groups have already come forward and shown an interest in using the algorithm.
"This AI tool is a huge bonus for the field as a whole because it provides common standards, ones that weren't there before, for when researchers across the world need to compare data. Previously, much of the analysis was based on subjective opinions about which patterns were useful. Those can vary from research group to research group. Now, we are equipped with a tool that can ensure we all reach the same conclusions," explains research director Nikos Hatzakis, Associate Professor at the Department of Chemistry and Affiliate Associate Professor at the Novo Nordisk Foundation Center for Protein Research.
He adds that the tool offers a different perspective as well:
"While analyzing the choreography of protein movement remains a niche, it has gained more and more ground as the advanced microscopes needed to do so have become cheaper. Still, analyzing data requires a high level of expertise. Our tool makes the method accessible to a greater number of researchers in biology and biophysics, even those without specific expertise, whether it's research into the coronavirus or the development of new drugs or green technologies."
Thomsen, J., et al. (2020) DeepFRET, a software for rapid and automated single-molecule FRET data classification using deep learning. eLife. doi.org/10.7554/eLife.60404.