In a conceptual breakthrough, a team of researchers led by statisticians at the Harvard School of Public Health (HSPH) has figured out how to analyze huge quantities of genetic data, surpassing the capabilities of traditional techniques and speeding the quest to understand the genetic basis of complex diseases such as asthma and diabetes. The technique also gives researchers a single way to consider the multiple genes involved in complicated physical traits.
The team's paper, "On Genomic Screening in Family-Based Association Testing for Quantitative Traits: Screening and Replication in One Data Set," is published in the advance online issue of Nature Genetics. A hard copy version will be available in the journal's July 1 issue. The team includes researchers from HSPH, Boston University School of Medicine, Channing Laboratory, Children's Hospital, Boston, and Affymetrix, Inc., a California-based company that makes gene chips.
The Human Genome Project, completed in 2003, and ongoing related studies have made available unparalleled amounts of information about human genes. There are now 8 million genetic markers called SNPs (single nucleotide polymorphisms) available for analysis. These projects have been fueled by a technology revolution within the past two years. A single study can now involve a gene chip that holds up to 100,000 SNPs, whereas previously a study that considered 10 to 15 SNPs would have been regarded as large.
"Traditional analysis techniques have worked well, but are now being tremendously outpaced by new technologies," explained Kristel Van Steen, lead author of the study and postdoctoral fellow in the Department of Biostatistics at HSPH. "We have developed a new methodology that copes much better with large amounts of data."
Typically, a biostatistical study demands a two-step process: 1) culling the number of SNP candidates, and 2) testing the survivors for associations with specific traits, such as risk for high body mass index. The process requires two completely separate datasets.
This process also runs into something called the "multiple comparison problem." Every candidate SNP must be tested. The more SNPs a study tests, the greater the likelihood of false-positive signals; to guard against them, the bar for significance has to be raised, so fewer SNPs survive the initial culling process. The result: some true SNP candidates may never make it to the second testing phase. When dealing with tens of thousands of SNPs, that culling could mean numerous viable candidates slip through the cracks.
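The arithmetic behind the multiple comparison problem can be sketched in a few lines of Python. This is only an illustration, not the method from the paper: it assumes independent tests, a conventional significance level of 0.05, and the widely used Bonferroni correction as a stand-in for whatever culling rule a given study applies. The function names are hypothetical, and the SNP counts (15 and 100,000) are the figures quoted above.

```python
# Illustration only: assumes independent tests and alpha = 0.05.
# Bonferroni is used here as a familiar example of a correction;
# it is not claimed to be the procedure used in the study.

def expected_false_positives(num_snps: int, alpha: float = 0.05) -> float:
    """Expected count of unassociated SNPs that pass an uncorrected test."""
    return num_snps * alpha


def familywise_error_rate(num_snps: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_snps


def bonferroni_threshold(num_snps: int, alpha: float = 0.05) -> float:
    """Per-SNP p-value cutoff that holds the family-wise error rate to alpha."""
    return alpha / num_snps


for num_snps in (15, 100_000):
    print(
        f"{num_snps:>7} SNPs: "
        f"expected false positives = {expected_false_positives(num_snps):.2f}, "
        f"family-wise error rate = {familywise_error_rate(num_snps):.3f}, "
        f"Bonferroni cutoff = {bonferroni_threshold(num_snps):.1e}"
    )
```

Under these assumptions, moving from 15 SNPs to 100,000 shrinks the per-SNP cutoff by a factor of several thousand, which is why a SNP with a real but modest effect can fail the initial screen.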