By intensely and systematically comparing the human X chromosome to genetic information from chimpanzees, rats and mice, a team of scientists from the United States and India has uncovered dozens of new genes, many of which are located in regions of the chromosome already tied to disease.
Regions of the X chromosome, one of the two sex chromosomes (Y is the other), have been linked to mental retardation and numerous other disorders, but finding the particular genetic abnormalities involved has been difficult.
The team's accomplishment, described in the April issue of Nature Genetics, should speed research into diseases associated with the X chromosome and encourage similar analyses of other chromosomes.
"To our knowledge, this is the first time critical analysis of an entire chromosome has been done by a group that wasn't involved in determining the chromosome's genetic sequence," says study leader Akhilesh Pandey, M.D., Ph.D., an assistant professor in the McKusick-Nathans Institute of Genetic Medicine at Johns Hopkins and chief scientific adviser to the Institute of Bioinformatics (IOB) in Bangalore, India, where the analyses took place. "We didn't start small. We wanted to prove that complete annotation can be done, and done in a way that lets you find new and unexpected things."
For 18 months, 26 Indian scientists pored over the publicly available sequence of the X chromosome (information generated by the Wellcome Trust Sanger Institute in England and others) to identify genes and other important parts of its DNA.
But unlike other efforts, the team didn't just "mine the data" by using computers to search for known patterns in the genetic sequence. Instead, Pandey decided they would look for similarities between the human X chromosome's protein-encoding instructions and corresponding regions in the mouse. Regions that were identical or nearly so were then examined carefully by IOB biologists.
"We didn't want to start out by saying that genes had to look a certain way," says Pandey. "So our only initial assumption was that if a genetic region is important and codes for a protein, the sequence will be conserved at the protein level. Thus, even if the genetic sequence is different here and there, the protein sequence could still be the same."
Essentially, the researchers took advantage of the redundancy inherent in the genetic code. DNA's four building blocks -- A, T, C and G -- act as instructions for proteins when read in sets of three. Each three-block set "codes" for just one of the 20 possible protein building blocks, or amino acids, but several different sets can code for the same amino acid. For example, the DNA sequences TTGAGGAGC and CTACGATCA are quite different, but both specify the same three amino acids -- leucine, arginine and serine, in that order.
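The example above can be checked with a few lines of Python. This is only an illustrative sketch, not the team's actual software: the lookup table holds just the six three-block sets that appear in the two example sequences (taken from the standard genetic code), and the names translate and count_differences are made up for this illustration.

```python
# Partial lookup table from the standard genetic code.
# Assumption: only the six codons used in the example are included.
CODON_TABLE = {
    "TTG": "Leu", "CTA": "Leu",   # leucine
    "AGG": "Arg", "CGA": "Arg",   # arginine
    "AGC": "Ser", "TCA": "Ser",   # serine
}


def translate(dna):
    """Split a DNA string into three-letter sets and look up each amino acid."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)]


def count_differences(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


seq_a = "TTGAGGAGC"
seq_b = "CTACGATCA"

protein_a = translate(seq_a)
protein_b = translate(seq_b)
dna_diffs = count_differences(seq_a, seq_b)

print("DNA positions that differ:", dna_diffs, "of", len(seq_a))
print("Protein A:", protein_a)
print("Protein B:", protein_b)
print("Same protein sequence:", protein_a == protein_b)
```

The two DNA strings differ at seven of their nine positions, yet both translate to the same three amino acids, which is why a comparison at the protein level can reveal a match that a letter-by-letter DNA comparison would miss.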