Protein design dates back to the mid-1970s, and today’s scientists have access to a vast body of information and tools for creating proteins. The material is so abundant and diverse, in fact, that the data they need can be hard to find. Resources available to researchers range from protein sequences to databases of molecules and their individual properties.
The EU-funded EVO-COUPLINGS project is developing algorithms capable of exploiting the large amount of available data and quickly deducing the biological functionality of individual proteins. The four-year project ends in December 2018. It is funded by a grant from the EU’s Marie Skłodowska-Curie actions programme.
“Our research lets us better understand protein structure and function, which will ultimately lead to new methods for protein design,” says lead researcher Lucy Colwell of the University of Cambridge, UK. “Being able to better design proteins will have a profound impact on, for example, the development of drugs, and thus could very well be a key to our ability to treat and cure many diseases.”
Making sense of all the data
Proteins are large molecules consisting of at least one chain of amino acid residues. They perform a range of important functions, including replicating DNA, responding to stimuli and transporting small molecules.
Different amino acid sequences result in different proteins that serve different functions. By altering this sequence – or creating a new sequence from scratch – scientists design new proteins to serve specific functions.
In fact, scientists have been doing exactly this for several decades, generating a wealth of information and tools in the process. The ability to access these valuable resources opens up an exciting research frontier. However, while the sheer volume of data holds huge potential, new analysis methods are required to make sense of it.
According to Colwell, the goal of protein design is to find a protein sequence that will carry out a specific function. If the right methods can be developed, researchers will be able to exploit the knowledge held in databases as a shortcut to finding sequences that are likely to work.
EVO-COUPLINGS is developing powerful algorithms to solve this problem. The project’s algorithms are designed to exploit the available data and quickly provide leads as to the likely function of specific sequences. Having this information, says Colwell, is important because evolution produces many variants of the same protein, with small differences between the versions found in different organisms.
“For example, the haemoglobin that is found in humans is slightly different from the haemoglobin found in other species,” she explains. “By using statistical inference to model how these small differences occur, we can extract information from the data and make accurate predictions about protein structure and function.”
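The article does not describe the project’s algorithms in detail, so the following is only a minimal illustrative sketch of the general idea of learning from variation between versions of a protein. It assumes a toy, made-up alignment of related sequences (not real haemoglobin data) and uses mutual information between alignment positions as a simple stand-in for statistical inference. The names `ALIGNMENT`, `mutual_information` and `rank_coupled_positions` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Toy alignment: hypothetical sequences of equal length, one per "organism".
# Positions 1 and 4 were constructed to vary together (K with A, R with E).
ALIGNMENT = [
    "MKVLA",
    "MRVLE",
    "MKILA",
    "MRILE",
    "MKVLA",
    "MRVLE",
]


def column(alignment, i):
    """Return the residues observed at position i across all sequences."""
    return [seq[i] for seq in alignment]


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(alignment, i, j):
    """Mutual information between positions i and j: H(i) + H(j) - H(i, j)."""
    col_i, col_j = column(alignment, i), column(alignment, j)
    joint = list(zip(col_i, col_j))
    return entropy(col_i) + entropy(col_j) - entropy(joint)


def rank_coupled_positions(alignment):
    """Score every pair of positions and sort from most to least co-varying."""
    length = len(alignment[0])
    scores = {
        (i, j): mutual_information(alignment, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for (i, j), score in rank_coupled_positions(ALIGNMENT)[:3]:
        print(f"positions {i} and {j}: {score:.3f} bits")
```

In this toy data, the pair of positions that always change together receives the highest score, while fully conserved positions score zero. Pairs of positions that co-vary in this way are often interpreted as hints of structural or functional constraints. A measure this simple ignores effects such as the shared evolutionary history of the sequences, so it should be read as a teaching example rather than a usable predictor.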
Predicting protein structure
The project has already demonstrated that the function of a protein can be deduced using new algorithms, Colwell observes. This discovery, she says, “provides the basis for a new generation of approaches that can be used to predict protein structures, thus significantly improving the state of the art in the field.”