Study offers recommendations for improving statistical inference in population genomics

The second-century Alexandrian astronomer and mathematician Claudius Ptolemy had a grand ambition. Hoping to make sense of the motion of stars and the paths of planets, he published a magisterial treatise on the subject, known as the Almagest. Ptolemy created a complex mathematical model of the universe that seemed to recapitulate the movements of the celestial objects he observed.

Unfortunately, a fatal flaw lay at the heart of his cosmic scheme. Following the prejudices of his day, Ptolemy worked from the premise that the Earth was the center of the universe. The Ptolemaic universe, composed of complex "epicycles" to account for planet and star movements, has long since been consigned to the history books, though its conclusions remained the scientific dogma for over 1200 years.

The field of evolutionary biology is no less subject to misguided theoretical approaches, sometimes producing impressive models that nevertheless fail to convey the true workings of nature as it shapes the dizzying assortment of living forms on Earth.

A new study examines mathematical models designed to draw inferences about how evolution operates at the level of populations of organisms. The study concludes that such models must be constructed with the greatest care, avoiding unwarranted initial assumptions, weighing the quality of existing knowledge and remaining open to alternate explanations.

Failure to apply strict procedures in null model construction can lead to theories that seem to square with certain aspects of available data derived from DNA sequencing, yet fail to correctly elucidate underlying evolutionary processes, which are often highly complex and multifaceted.

Such theoretical frameworks may offer compelling but ultimately flawed pictures of how evolution actually acts on populations over time, be these populations of bacteria, shoals of fish, or human societies and their various migrations during prehistory.

In the new study, Jeffrey Jensen, a researcher in the Biodesign Center for Mechanisms of Evolution at Arizona State University and professor in the School of Life Sciences with the Center for Evolution & Medicine, leads a group of international luminaries in the field in providing guidance for future research. Together, they describe a range of criteria that can be used to better ensure the accuracy of models that produce statistical inferences in population genomics, a scientific discipline concerned with large-scale comparisons of DNA sequences within and across populations and species.

"One of our key messages is the importance of considering the contributions of evolutionary processes certain to be in constant operation (such as purifying selection and genetic drift), before simply relying on hypothesized or rare evolutionary processes as the primary drivers of observed population variation (such as positive selection)."

Jeffrey Jensen, Researcher, Biodesign Center for Mechanisms of Evolution, Arizona State University

The research findings appear in the current issue of the journal PLOS Biology.

A field comes of age

Population genomics has its roots in early efforts to reconcile Charles Darwin's notion of evolution by means of natural selection with the first inklings of the mechanisms of inheritance, uncovered by the Augustinian monk Gregor Mendel.

The synthesis culminated in the 1920s and early 1930s, largely thanks to the mathematical work of Fisher, Haldane and Wright, who were the first to explore how natural selection together with other evolutionary forces would modify the genetic composition of Mendelian populations over time.

Today, studies in population genomics involve the large-scale application of various genomic technologies to explore the genetic composition of biological populations, and how various factors, including natural selection and genetic drift, produce changes in genetic composition over time.

To accomplish this, population geneticists develop mathematical models quantifying the contributions of these evolutionary processes in shaping gene frequencies, use this theory to design statistical inference approaches for estimating the forces producing observed patterns of genetic variation in actual populations, and test their conclusions against accumulated data.

The spice of life

The study of genomic variation focuses on DNA sequence differences among individuals and populations. Some of these variants are critically important for biological function, including mutations responsible for genetic disease, while others have no detectable biological effects.

Such variation in the human genome can take several forms. One common source of variation is known as single nucleotide polymorphisms, or SNPs, where a single DNA letter in the genome is altered. But larger-scale variation in the genome, involving the simultaneous alteration of hundreds or even thousands of base pairs, is also possible. Again, some such alterations may play a role in disease risk and survival while many others have no effect.

Natural selection may occur when different variants segregating in a population have a fitness differential relative to one another. By designing and studying mathematical models governing the corresponding gene frequency change and applying those models to empirical data, population geneticists seek to understand the contributing evolutionary processes in a rigorous, quantitative way. Thus, population genetics is often regarded as the theoretical cornerstone of modern Darwinian evolution.
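As a minimal illustration of what such a model can look like, consider a textbook haploid, one-locus case (a standard sketch, not a formula taken from the study). The symbols are assumptions introduced here: p is the current frequency of a variant, s is its selective advantage (or disadvantage, if negative) so that its relative fitness is 1 + s against 1 for the alternative, and p' is its expected frequency in the next generation.

```latex
p' = \frac{p\,(1+s)}{1+s\,p},
\qquad
\Delta p = p' - p = \frac{s\,p\,(1-p)}{1+s\,p}
```

The expected change vanishes when s = 0 (no fitness differential) and when the variant is absent or fixed (p = 0 or p = 1). In any finite population, random sampling of offspring adds chance fluctuation on top of this deterministic expectation.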

Adrift through the genome

Although the importance of natural selection to the evolutionary process is undeniable, the role of positive selection in increasing the frequency of beneficial variants (the potential driver of adaptation) is certain to be comparatively rare relative even to other forms of natural selection. For example, purifying selection, the removal of deleterious variants from the population, is a constantly acting and far more pervasive form of selection.

In addition, there are multiple non-selective evolutionary processes of great importance. For example, genetic drift describes the many stochastic fluctuations inherent to evolution. In large populations, natural selection may act more efficiently in purging deleterious variation and potentially fixing beneficial variation, whereas as populations become smaller, genetic drift will be increasingly dominant.
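The interplay between selection and drift described above can be sketched with a small simulation. The following is a minimal haploid Wright-Fisher model, a standard textbook idealization rather than anything taken from the study; the function names, the selection coefficient, the population sizes and the replicate count are all hypothetical choices made for illustration.

```python
import random


def selection_step(p, s):
    """Expected variant frequency after selection (fitness 1 + s versus 1)."""
    return p * (1 + s) / (1 + s * p)


def wright_fisher(pop_size, s, p0=0.5, max_generations=2000, seed=0):
    """Simulate one variant's frequency until it is lost, fixed, or time runs out."""
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(max_generations):
        if p in (0.0, 1.0):
            break  # the variant has been lost or has fixed
        p_sel = selection_step(p, s)
        # Drift: each of pop_size offspring carries the variant with prob. p_sel.
        count = sum(1 for _ in range(pop_size) if rng.random() < p_sel)
        p = count / pop_size
        trajectory.append(p)
    return trajectory


def fixation_fraction(pop_size, s, replicates=50):
    """Fraction of independent replicate runs in which the variant reaches fixation."""
    fixed = 0
    for rep in range(replicates):
        if wright_fisher(pop_size, s, seed=rep)[-1] == 1.0:
            fixed += 1
    return fixed / replicates
```

Comparing, say, `fixation_fraction(10, s=-0.05)` with `fixation_fraction(500, s=-0.05)` gives a feel for the point: under these assumptions, a mildly deleterious variant would be expected to drift to fixation in a noticeable share of the small-population runs, while selection should remove it almost every time in the larger population.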

The distinction can be seen in dramatic form when comparing prokaryotic organisms like bacteria with organisms composed of eukaryotic cells, including humans. In the former case, the vast population sizes tend to result in more efficient selection. In contrast, a weaker selection pressure operating in eukaryotes is more permissive of genomic changes, provided that they are not strongly deleterious.

According to the Neutral Theory of Molecular Evolution, now a guiding principle of evolutionary theory proposed by the population geneticist Motoo Kimura over 50 years ago, most evolutionary changes at the molecular level in real populations are governed not by natural selection, but by genetic drift. The study emphasizes that this critical point is too often missed by evolutionary biologists. As co-author Michael Lynch, director of ASU's Biodesign Center for Mechanisms of Evolution, cogently observes, "natural selection is just one of several evolutionary mechanisms, and the failure to realize this is probably the most significant impediment to a fruitful integration of evolutionary theory with molecular, cellular, and developmental biology."

The new consensus study further stresses that a failure to consider these alternative evolutionary mechanisms which are certain to be operating, including genetic drift, and incorporate these into models of population genomics, is likely to lead researchers astray. The common overreliance on purely adaptive models to explain genomic variation has led to a raft of interpretations of dubious value, the authors assert.

The study presents a detailed flow chart that can help guide the development of more accurate models used to draw evolutionary inferences, based on genomic data. Biological parameters that vary among species include not only evolutionary variables like population size, mutation rates, recombination rates, and population structure and history, but the way the genome itself is structured and life history traits, including mating behavior. All of these factors play a vital role in dictating observed molecular variation and evolution.

"While these many considerations may sound daunting for some researchers, it is important to note that many excellent research groups at ASU and around the world are actively improving our understanding of these underlying evolutionary parameters, providing constantly improving inference, for example, of mutation and recombination rates," added co-author Susanne Pfeifer, an Assistant Professor in the Center for Evolution & Medicine and the Biodesign Center for Mechanisms of Evolution.

Where theoretical models in population genomics once proliferated alongside relatively scant genomic data, today an avalanche of data, enabled by rapid, low-cost DNA sequencing of organisms across the tree of life, has dramatically changed the field. The careful and judicious use of this gold mine of genomic data will help advance the most rigorous models to unlock evolution's many remaining mysteries.

Journal reference:

Johri, P., et al. (2022) Recommendations for improving statistical inference in population genomics. PLOS Biology. doi.org/10.1371/journal.pbio.3001669.
