Scientists at Penn State and the National Institute of Genetics in Japan have demonstrated that several statistical methods commonly used by biologists to detect natural selection at the molecular level tend to produce incorrect results.
"Our finding means that hundreds of published studies on natural selection may have drawn incorrect conclusions," said Masatoshi Nei, Penn State Evan Pugh Professor of Biology and the team's leader. The team's results will be published in the Online Early Edition of the journal Proceedings of the National Academy of Sciences during the week ending Friday 3 April 2009 and also in the journal's print edition at a later date.
Nei said that many scientists who examine human evolution have used faulty statistical methods in their studies and, as a result, their conclusions could be wrong. For example, in one published study the scientists used a statistical method to demonstrate pervasive natural selection during human evolution. "This group documented adaptive evolution in many genes expressed in the brain, thyroid, and placenta, which are assumed to be important for human evolution," said Masafumi Nozawa, a postdoctoral fellow at Penn State and one of the paper's authors. "But if the statistical method that they used is not reliable, then their results also might not be reliable," added Nei. "Of course, we would never say that natural selection is not happening, but we are saying that these statistical methods can lead scientists to make erroneous inferences," he said.
The team examined the branch-site method and several types of site-prediction methods commonly used for statistical analyses of natural selection at the molecular level. The branch-site method is meant to determine whether or not natural selection has occurred within a particular gene, and the site-prediction methods are meant to predict the exact location within a gene at which natural selection has occurred.
"Both of these methods are very popular among biologists because they appear to give valuable results about which genes have undergone natural selection," said Nei. "But neither of the methods seems to give an accurate picture of what's really going on."
Nei said that for many years he has suspected that the statistical methods were faulty. "The methods assume that when natural selection occurs the number of nucleotide substitutions that lead to changes in amino acids is significantly higher than the number of nucleotide substitutions that do not result in amino acid changes," he said. "But this assumption may be wrong. Actually, the majority of amino acid substitutions do not lead to functional changes, and the adaptive change of a protein often occurs by a rare amino acid substitution. For this reason, statistical methods may give erroneous conclusions." Nei also believes that the methods are inaccurate when the number of nucleotide substitutions observed is small.
To demonstrate that the statistical methods are unreliable, Nei's team compiled data collected by their Emory University colleague, Shozo Yokoyama, on the genes that control the abilities of fish to see light at different water depths and on the genes that control color vision in a variety of animals. The team used these data to compare statistically predicted sites of natural selection with experimentally determined sites. They found that the statistical methods rarely predicted the actual sites of natural selection, which had been identified by Yokoyama through experiments. "In some cases, the statistical methods completely failed to identify the true sites where natural selection occurred," said Nei. "This particular exercise demonstrated the difficulty with which statistical methods are able to detect natural selection."
To demonstrate how small sample sizes can lead to incorrect results, the team used computer simulations to examine the evolution of genes in three primates: humans, chimpanzees, and macaques. The scientists mimicked the procedures used by the authors of a 2007 paper, which applied the branch-site method to 14,000 orthologous genes (genes in different species that descend from the same ancestral gene) and which found that the method predicted selection in 32 of the genes. Nei and his team also studied selection using Fisher's exact test, but this test did not detect any selection. "The results indicate that the number of nucleotide substitutions that occurred was too small to detect any selection; therefore, all of the 32 cases obtained by the branch-site method must be false positives," said Nozawa.
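The small-sample point can be illustrated with a minimal Python sketch of a one-sided Fisher's exact test. The counts below are hypothetical, chosen only for illustration, and the way the 2x2 table is set up (changed versus unchanged sites, split into nonsynonymous and synonymous) is an assumption rather than a description of the team's exact procedure.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability of seeing `a` or more counts in the top-left
    cell if rows and columns were independent (hypergeometric tail).
    """
    row1 = a + b          # e.g. all nonsynonymous sites
    col1 = a + c          # e.g. all sites that changed
    n = a + b + c + d     # all sites
    k_max = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, k_max + 1))
    return tail / comb(n, col1)


# Hypothetical gene with 700 nonsynonymous and 300 synonymous sites.
# Few substitutions: 3 nonsynonymous changes, 0 synonymous changes.
p_small = fisher_exact_greater(3, 697, 0, 300)

# Same pattern with ten times as many substitutions: 30 versus 0.
p_large = fisher_exact_greater(30, 670, 0, 300)

print(f"3 vs 0 substitutions:  p = {p_small:.3f}")
print(f"30 vs 0 substitutions: p = {p_large:.2e}")
```

With only three substitutions the test gives a p-value of about 0.34, even though every observed change is nonsynonymous, so no selection can be inferred. The same pattern with thirty substitutions is highly significant, which shows how strongly the outcome depends on the number of substitutions observed.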
"These statistical methods have led many scientists to believe that natural selection acted on many more genes in humans than it did in chimpanzees, and they conclude that this is the reason why humans have developed large brains and other morphological differences," said Nei. "But I believe that these scientists are wrong. The number of genes that have undergone selection should be nearly the same in humans and chimps. The differences that make us human are more likely due to mutations that were favorable to us in the particular environment into which we moved, and these mutations then accumulated through time."
Nei said that to obtain a more realistic picture of natural selection, biologists should pair experimental data with their statistical data whenever possible. Scientists usually do not use experimental data because such experiments can be difficult to conduct and because they are very time-consuming.