The field of structural genomics, the study of the three-dimensional geometric structures of proteins, is complicated by vast amounts of data, expensive experiments and cumbersome methods of analysis.
Computer Science Professor Bruce Randall Donald and his students are working to ease this burden by developing techniques that simultaneously minimize the number of experiments and accelerate the data analysis involved in determining the structure of proteins.
Learning about protein structure is especially relevant for treating illnesses that alter protein function, such as cancer.
In two papers published in consecutive months in the Journal of Biomolecular NMR (nuclear magnetic resonance), Donald and two colleagues, a graduate student and a post-doctoral fellow, present new algorithms that interpret NMR data to reveal a protein's shape and molecular architecture. NMR spectroscopy probes a protein's molecular structure, using what amount to tiny spectroscopic protractors and rulers to generate a network of geometric measurements.
"In these papers, we discuss a new framework for thinking about how to solve these problems, and our algorithms are highly accurate," says Donald, the Joan P. and Edward J. Foley Jr. 1933 Professor of Computer Science and an Adjunct Professor of Chemistry and of Biological Sciences.
The first paper, published in June 2004, explains the work of Christopher Langmead, a doctoral student in Donald's laboratory who is now an assistant professor of computer science at Carnegie Mellon University. Langmead's algorithm introduced new techniques for assigning NMR measurements to specific molecular bonds. Most NMR experiments on a protein report distances between atoms and the angles of chemical bonds, but the data do not indicate which atoms or bonds each measurement corresponds to. "It's a little like taking all the heights and weights of everyone at a cocktail party, but you don't know which height goes with which person," says Donald. Langmead and Donald's technique assigns the measurements to the correct nuclei, which helps to unveil the architecture of the protein.
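To make the cocktail-party analogy concrete, the sketch below treats assignment as a generic minimum-cost matching problem: given unlabeled measurements and values predicted for each bond by some candidate model, it tries every pairing and keeps the one with the smallest total squared error. This is a toy illustration under stated assumptions, not Langmead and Donald's algorithm; the function name `assign`, the squared-error cost and the numbers are all hypothetical.

```python
from itertools import permutations


def assign(measured, predicted):
    """Toy assignment by brute force (hypothetical illustration).

    Returns (cost, mapping), where mapping[i] is the index of the measurement
    assigned to predicted site i, chosen to minimize total squared error.
    """
    best_cost, best_perm = float("inf"), None
    # Try every one-to-one pairing of measurements to sites.
    for perm in permutations(range(len(measured))):
        cost = sum((measured[j] - predicted[i]) ** 2 for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, list(best_perm)


if __name__ == "__main__":
    # Hypothetical data: measurements arrive without labels.
    measured = [7.9, 2.1, 5.0]
    sites = ["bond A", "bond B", "bond C"]
    predicted = [2.0, 5.2, 8.0]  # values a candidate model predicts per bond
    cost, mapping = assign(measured, predicted)
    for site, j in zip(sites, mapping):
        print(site, "<-", measured[j])
```

Trying every pairing grows factorially with the number of measurements, so it only works for tiny examples; the Hungarian algorithm solves the same minimum-cost matching in polynomial time. Real NMR assignment is harder still, because the predicted values depend on the very structure being determined.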
A second paper by Donald and Lincong Wang, a Dartmouth post-doctoral fellow, will be published in July 2004. This paper describes a new protein structure determination algorithm, which solves complex algebraic equations that relate the experimental data to the protein's geometry. "Our algorithm requires less data and yet the resulting protein structures are incredibly accurate," says Donald. Unlike the equations used in previous techniques, Wang's equations can be solved exactly, much as the quadratic formula of high school algebra fame solves a quadratic equation. Wang and Donald hope that their work proves helpful both to structural genomics researchers and to those in the broader structural biology field.
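The quadratic-formula comparison can be illustrated with a toy contrast between a closed-form solution and an iterative search. The closed form returns every real solution at once, while an iterative method such as Newton's method returns only the one root nearest its starting guess. This is an analogy only; the functions `solve_quadratic` and `newton` are hypothetical helpers, and the equations in Wang and Donald's paper are not reproduced here.

```python
import math


def solve_quadratic(a, b, c):
    """Return all real roots of a*x**2 + b*x + c = 0 in closed form, sorted."""
    if a == 0:
        raise ValueError("not a quadratic: a must be nonzero")
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real solutions
    root = math.sqrt(disc)
    # A set collapses a repeated (double) root into a single entry.
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


def newton(f, df, x0, steps=50):
    """Iterative alternative: refines a single guess toward one nearby root."""
    x = x0
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x


if __name__ == "__main__":
    # x**2 - 5x + 6 = 0 has roots 2 and 3.
    print(solve_quadratic(1, -5, 6))
    f = lambda x: x * x - 5 * x + 6
    df = lambda x: 2 * x - 5
    print(newton(f, df, 0.0))  # finds only the root near the starting guess
```

The exact version enumerates both roots with no starting guess, whereas the iterative version must be restarted from different points and may still miss a solution.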