Statisticians learn how to analyze huge quantities of genetic data

Published on June 6, 2005 at 8:01 AM

In a conceptual breakthrough, a team of researchers led by statisticians at the Harvard School of Public Health (HSPH) has figured out how to analyze huge quantities of genetic data, surpassing the capabilities of traditional techniques and speeding the quest to understand the genetic basis of complex diseases such as asthma and diabetes. The technique also gives researchers a single way to consider multiple genes involved in complicated physical traits.

The team's paper, "On Genomic Screening in Family-Based Association Testing for Quantitative Traits: Screening and Replication in One Data Set," is published in the advance online issue of Nature Genetics. A hard copy version will be available in the journal's July 1 issue. The team includes researchers from HSPH, Boston University School of Medicine, Channing Laboratory, Children's Hospital, Boston, and Affymetrix, Inc., a California-based company that makes gene chips.

Completed in 2003, the Human Genome Project and ongoing related studies have made available unparalleled amounts of information regarding human genes. There are now 8 million genetic markers called SNPs (single nucleotide polymorphisms) available for analysis. These projects have been fueled by a technology revolution within the past two years. Now, a single study can involve a gene chip that holds up to 100,000 SNPs, whereas, previously, a study would have considered 10 to 15 SNPs to be a lot.

"Traditional analysis techniques have worked well, but are now being tremendously outpaced by new technologies," explained Kristel Van Steen, lead author of the study and postdoctoral fellow in the Department of Biostatistics at HSPH. "We have developed a new methodology that copes much better with large amounts of data."

Typically, a biostatistical study would demand a two-step process of 1) culling the number of SNP candidates, and 2) testing the survivors for associations with specific traits, such as risk for high body mass index. The process requires the use of two completely separate datasets.

This process also runs into something called the "multiple comparison problem." Every candidate SNP must be tested. The more SNPs a study has, the greater the likelihood of false-positive signals, so the bar for significance must be set higher and fewer SNPs survive the initial culling process. The result: some true SNP candidates may never make it to the second testing phase. When dealing with tens of thousands of SNPs, that culling could mean numerous viable candidates slip through the cracks.
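To give a rough sense of the scale of the multiple comparison problem, the short Python sketch below compares a study of 15 SNPs with one of 100,000 SNPs. It is an illustration only, not the method from the paper: the 0.05 significance level, the Bonferroni correction, the assumption that tests are independent, and all function names are choices made here for the example.

```python
# Illustrative sketch of the multiple comparison problem.
# Assumptions (not from the article): a per-test significance level of 0.05,
# independent tests, and a Bonferroni correction as the culling rule.

def expected_false_positives(num_tests, alpha=0.05):
    """Expected number of false positives if no SNP is truly associated."""
    return num_tests * alpha

def family_wise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests

def bonferroni_threshold(num_tests, family_alpha=0.05):
    """Per-test p-value cutoff that keeps the family-wise rate near family_alpha."""
    return family_alpha / num_tests

for num_snps in (15, 100_000):
    print(
        f"{num_snps:>7} SNPs: "
        f"expected false positives = {expected_false_positives(num_snps):.2f}, "
        f"chance of at least one = {family_wise_error_rate(num_snps):.3f}, "
        f"Bonferroni cutoff = {bonferroni_threshold(num_snps):.1e}"
    )
```

Under these assumptions, the per-test cutoff shrinks in proportion to the number of SNPs, which is why a true but modest association can fail to survive the first culling step when a chip carries tens of thousands of markers.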
