With the advent of extremely rapid and accurate DNA sequencing technology, genetic data is piling up, but much of it remains to be interpreted. To make sense of this data, researchers have used the DNA sequences of genes to detect pairs of co-evolved genes. In such a pair, one gene cannot change without a corresponding change in the other, which preserves the function of the first.
The study was published in the journal Science on July 12, 2019.
In a collaborative project between the University of Washington School of Medicine and Harvard University, scientists have discovered protein-pair interactions by looking for coevolution in some residues of aligned protein sequences.
The thousands of proteins in a cell require interaction on a physicochemical level in order to fulfill their roles. For instance, some proteins must cluster to initiate DNA copying, while others form assemblies of molecules that give rise to fibers like muscles. However, the identity of interacting proteins is often unknown, and it takes a lot of time and money to pinpoint the pairing proteins.
Instead, co-evolution can help identify which proteins pair up. In this phenomenon, complementary changes are found in two genes, which suggests a close linkage between them.
One such example of co-evolution is when one gene undergoes a mutation that results in a change of shape of the protein product. To ensure the altered protein can still interact with the protein produced by the second gene in the pair, the latter gene also undergoes a corresponding change. Such alterations have been found in sequenced genomes.
Qian Cong, the first author of the paper, said, "Co-evolution has been useful for understanding how specific proteins interact, but we can now use it as a tool for discovery."
Using a custom-made statistical tool, the team compared over 4000 genes from E. coli with over 40,000 DNA sequences from other bacteria. This helped them detect whether each of the 4000 genes had co-evolved with any of the others. In total, they compared 5.4 million protein pairs.
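To make the idea of a statistical co-evolution signal concrete, here is a minimal Python sketch. It is not the team's tool, whose details the article does not give. It uses mutual information, a generic measure of how strongly two aligned sequence positions vary together across organisms. The function name and the toy residue strings are invented for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a[i] and col_b[i] are the residues observed in the same organism i.
    Illustrative helper only; not the statistic used in the study.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same organisms")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


# Toy data (assumed): one residue position from protein A and one from
# protein B, read across eight hypothetical bacterial genomes.
protein_a_site = "DDDDEEEE"
protein_b_site = "KKKKRRRR"  # changes in step with protein A
unrelated_site = "KRKRKRKR"  # varies independently of protein A

print(mutual_information(protein_a_site, protein_b_site))
print(mutual_information(protein_a_site, unrelated_site))
```

In this toy case the paired positions score higher than the unrelated ones. A real analysis would also have to correct for shared ancestry and uneven sampling of genomes, which this sketch ignores.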
The comparison proceeded for several rounds to determine which pairs had the highest chance of co-evolution. After screening, they predicted over 1600 protein-pair interactions. Of these, over 680 were quite unexpected. The team then benchmarked the approach against a gold-standard collection of already identified protein-pair interactions.
This final step confirmed that the co-evolution approach surpassed any preceding screening technique in precision, including proteome-based mass spectrometry and two-hybrid screening. The strongest co-evolution was found for protein pairs that interact in metabolic pathways, and the weakest for those involved in processing genetic information.
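As a rough illustration of what "precision" means in a benchmark like this, the Python sketch below scores a set of predicted pairs against a gold-standard set. It is a simplification under stated assumptions: any predicted pair missing from the gold set is counted as wrong, which a real benchmark would not assume, and the protein names are placeholders rather than data from the study.

```python
def precision(predicted_pairs, gold_pairs):
    """Fraction of predicted pairs that also appear in the gold standard.

    Pairs are unordered, so ("A", "B") and ("B", "A") count as the same pair.
    Simplifying assumption: a pair absent from the gold set is treated as a
    false positive.
    """
    predicted = {frozenset(pair) for pair in predicted_pairs}
    gold = {frozenset(pair) for pair in gold_pairs}
    if not predicted:
        return 0.0
    return len(predicted & gold) / len(predicted)


# Placeholder protein names, invented for illustration.
gold_standard = [("P1", "P2"), ("P3", "P4"), ("P5", "P6")]
predictions = [("P2", "P1"), ("P3", "P4"), ("P1", "P6"), ("P5", "P6")]

print(precision(predictions, gold_standard))  # 3 of 4 predictions match: 0.75
```

Higher precision means fewer of the reported pairs are false positives, which is the sense in which the screening methods are being compared.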
In fact, some of these interactions are being identified for the first time, including several which may improve our understanding of biological reactions. For instance, one protein-antitoxin pair could perhaps help researchers understand why some types of E. coli are the predominant flora in their part of the environment. The PSstB protein, which is involved in a metabolic pathway, is thought to also play a role in the synthesis of proteins and the passage of minerals within the body.
They repeated their approach with 3.9 million pairs of proteins in the tuberculosis bacterium, Mycobacterium tuberculosis, which is only distantly related to E. coli. They were able to detect over 900 protein-protein interactions, of which 95% were new. At least 70 of these are likely to involve virulence-associated proteins. Such findings are helpful in developing newer ways to treat this often fatal disease.
“It is rare in biology for a software tool to make predictions that are promising enough to test, but that is exactly what's happening here. We are going to apply this tool to more pathogens, and the human genome. Our success will depend on how much work other scientists put into annotating which parts of the genome are genes and which parts are something else.”
Dr. Qian Cong, Researcher, Harvard University
The team expects that no more than 10% to 20% of these predictions will prove to be false positives. This will allow many other researchers to pick up on these pairs to learn more about their biological functions.
Cong, Q., et al. (2019). Protein interaction networks revealed by proteome coevolution. Science. DOI: 10.1126/science.aaw6718.