Statisticians learn how to analyze huge quantities of genetic data


Jun 6 2005

In a conceptual breakthrough, a team of researchers led by statisticians at the Harvard School of Public Health (HSPH) has figured out how to analyze huge quantities of genetic data, surpassing the capabilities of traditional techniques and speeding the quest to understand the genetic basis of complex diseases such as asthma and diabetes. The technique also gives researchers a single framework for considering the multiple genes involved in complicated physical traits.

The team's paper, "On Genomic Screening in Family-Based Association Testing for Quantitative Traits: Screening and Replication in One Data Set," is published in the advance online issue of Nature Genetics. A hard copy version will be available in the journal's July 1 issue. The team includes researchers from HSPH, Boston University School of Medicine, Channing Laboratory, Children's Hospital, Boston, and Affymetrix, Inc., a California-based company that makes gene chips.

The Human Genome Project, completed in 2003, and ongoing related studies have made available an unparalleled amount of information about human genes. There are now 8 million genetic markers called SNPs (single nucleotide polymorphisms) available for analysis. These projects have been fueled by a technology revolution over the past two years. Today, a single study can use a gene chip that holds up to 100,000 SNPs, whereas previously a study with 10 to 15 SNPs would have been considered large.

"Traditional analysis techniques have worked well, but are now being tremendously outpaced by new technologies," explained Kristel Van Steen, lead author of the study and postdoctoral fellow in the Department of Biostatistics at HSPH. "We have developed a new methodology that copes much better with large amounts of data."

Typically, a biostatistical study would demand a two-step process: 1) culling the number of SNP candidates, and 2) testing the survivors for associations with specific traits, such as risk for high body mass index. The process requires two completely separate datasets.

This process also runs into what is called the "multiple comparison problem." Because every candidate SNP must be tested, the more SNPs a study includes, the greater the chance of false-positive signals, and the stricter the significance threshold must be to guard against them. As a result, fewer SNPs survive the initial culling, and some true SNP candidates may never make it to the second testing phase. When dealing with tens of thousands of SNPs, that culling could mean numerous viable candidates slip through the cracks.
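The arithmetic behind the multiple comparison problem can be sketched in a few lines of Python. This is an illustration only, assuming that every test is independent, that no tested SNP has a true effect, and that the study uses a 5% significance level with a Bonferroni correction (a standard correction the article does not name). Real SNP tests are often correlated, so the numbers show the trend rather than exact error rates.

```python
def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at or below alpha."""
    return alpha / m


def expected_false_positives(alpha: float, m: int) -> float:
    """Expected false positives if none of the m SNPs has a true effect and each is tested at level alpha."""
    return alpha * m


def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive among m independent null tests, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    alpha = 0.05
    # Study sizes from the article: an older 15-SNP study vs. a 100,000-SNP chip.
    for m in (15, 100_000):
        print(
            f"m={m:>7}: expected false positives={expected_false_positives(alpha, m):.2f}, "
            f"P(>=1 false positive)={familywise_error_rate(alpha, m):.3f}, "
            f"Bonferroni per-test threshold={bonferroni_threshold(alpha, m):.1e}"
        )
```

Moving from 15 to 100,000 SNPs shrinks the per-test threshold by several orders of magnitude, which is why only SNPs with very strong signals survive a conventional screen.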

The new HSPH analysis method, which uses just one dataset, bypasses the multiple comparison problem altogether by first estimating how much of a specific trait within a population can be explained by genetics, and then tracing the trait back to candidate SNPs that would explain that "genetic effect size." To test their methodology, the research team ran simulation studies using data from the Childhood Asthma Management Program (CAMP) Genetics Ancillary Study, based at Channing Laboratory, Brigham and Women's Hospital, in Boston, and data from a joint study conducted by the Mayo Clinic College of Medicine and Affymetrix. The results of the simulation studies suggested that the new approach outperformed the traditional approach by a factor of up to 100.
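The screen-then-test idea on a single data set can be illustrated with a simplified Python sketch. It is not the PBAT implementation: the function name `screen_then_test`, the data layout, and the choice to keep a fixed number of top-ranked SNPs and apply a Bonferroni correction over only those are hypothetical choices made for illustration. The key assumption is that each SNP's screening score (for example, an estimate of its genetic effect size) is computed from information statistically independent of its later association test, so that screening and testing can share one data set without inflating false positives.

```python
from typing import Dict, List, Tuple


def screen_then_test(
    snps: Dict[str, Tuple[float, float]],
    top_k: int,
    alpha: float = 0.05,
) -> List[str]:
    """Hypothetical two-stage procedure on one data set.

    snps maps a SNP id to (screening_score, test_p_value). The screening
    score is ASSUMED to be independent of the test statistic, so ranking
    on it does not bias the p-values used in stage 2.
    """
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Stage 1 (screening): rank SNPs by screening score, highest first,
    # and keep only the top_k most promising candidates.
    ranked = sorted(snps, key=lambda snp: snps[snp][0], reverse=True)
    candidates = ranked[:top_k]
    # Stage 2 (testing): correct only for the number of SNPs actually tested,
    # not for every SNP on the chip.
    threshold = alpha / len(candidates)
    return [snp for snp in candidates if snps[snp][1] <= threshold]


if __name__ == "__main__":
    # Invented toy values for illustration; not data from the study.
    toy = {
        "snp_A": (9.1, 0.004),
        "snp_B": (7.5, 0.03),
        "snp_C": (0.2, 0.001),
        "snp_D": (5.0, 0.2),
    }
    print(screen_then_test(toy, top_k=2))
    print(screen_then_test(toy, top_k=4))
```

With `top_k=2`, only the two highest-scoring SNPs are tested, each at 0.05 / 2; `snp_C` is never tested despite its small p-value because it ranked low in screening. The design trades a lighter correction for the risk that the screening step misranks a true signal.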

Besides doing away with the multiple comparison problem, the HSPH technique offers another feature that is highly attractive to geneticists: the methodology appears to be able to find multiple SNPs involved in a single disease or trait.

"Many biomedical scientists today are interested in complex phenotypes, such as risk for unhealthy levels of body mass index, blood pressure, or cholesterol," said HSPH Assistant Professor of Biostatistics Christoph Lange, who is senior author on the paper. "Yet until now, no statistical tool existed that would allow researchers to look at several thousand disease genes and successfully identify the small number of genes that influence such complex traits."

The HSPH methodology is part of an analysis software program called PBAT, freely available at http://www.biostat.harvard.edu/~clange/default.htm. The program was developed by Lange and HSPH Professor Nan Laird.

The CAMP Genetics Ancillary Study is supported by the National Heart, Lung, and Blood Institute. The joint study conducted by the Mayo Clinic College of Medicine and Affymetrix was supported by the Mayo Clinic Genomic Center and Comprehensive Cancer Center and by the National Institutes of Health (NIH). The NIH provided additional funding for the HSPH research.

Source:

http://www.hsph.harvard.edu
