The National Human Genome Research Institute (NHGRI), part of the National Institutes of Health (NIH), has announced its latest round of sequencing targets, with an emphasis on understanding how human genes function and how genomic differences among individuals influence health and the risk of disease.
The National Advisory Council for Human Genome Research, a federally chartered committee that advises NHGRI on program priorities and goals, recently approved three plans that specify these targets as part of NHGRI's comprehensive strategy for its Large-Scale Sequencing Research Network.
"The goal of our sequencing program is to build the most powerful toolbox possible for advancing human health. By identifying and seeking to fill crucial gaps in our knowledge, these new sequencing plans represent yet another important step in that direction," said NHGRI Director Francis S. Collins, M.D., Ph.D.
The highest-priority plan is a project to characterize the most common types of structural variation in the human genome. The effort will use 48 human DNA samples donated for the recently completed International HapMap Project. That project produced a catalog of common human genetic variation, organized into haplotypes (sets of variants that tend to be inherited together), designed to speed the search for genes involved in common diseases. The HapMap identified neighborhoods of tiny changes in DNA - known as single nucleotide polymorphisms (SNPs) - that can be involved in human disease. The structural variation effort will seek to identify instances where larger segments of DNA have been deleted, duplicated or rearranged - all of which can cause disease by disrupting the structure and function of genes.
A recent analysis has shown that these large-scale structural variations are much more common than previously appreciated. In fact, the genomes of any two humans are thought to differ by several hundred insertions, deletions and inversions.
The second plan will extend the existing draft sequences of several primate species and add sequence information in regions of high biological interest within those genomes. The increased coverage, which yields a high-density genome sequence, will allow for an even better understanding of the factors that contributed to the evolution of the human genome. The primates chosen for this "index species" effort are rhesus macaque (Macaca mulatta), marmoset (Callithrix jacchus) and orangutan (Pongo pygmaeus). In the future, NHGRI intends to add other organisms to the list of index species for which high-density genome sequences are desirable.
The third plan calls for sequencing the genomes of eight new mammals to low-density draft quality, generated by sequencing each genome to two-fold coverage. That will bring to 24 the number of mammalian genomes sequenced at two-fold coverage. These join the human genome and seven other mammalian genomes, in draft or finished form, that have been sequenced by NHGRI-supported centers and made freely available in public databases. Scientists will use the combined data to look for features that are similar, or conserved, among the genomes of humans and other mammals.
The eight new mammals to be sequenced will be chosen from the following 10 species: dolphin (Tursiops truncatus); elephant shrew (Elephantulus species); flying lemur (Dermoptera species); mouse lemur (Microcebus murinus); horse (Equus caballus); llama (Lama species); mole-rat (Cryptomys species); pika (Ochotona species), a cousin of the rabbit; kangaroo rat (Dipodomys species); and tarsier (Tarsius species), a primate from an early-branching lineage and an evolutionary cousin to monkeys, apes and humans. NHGRI will base its choice of the eight mammals on the availability of high-quality DNA samples, the organisms' promise as biomedical models, and the presence of unique biological processes that may have contributed to the human genome over the course of evolution.
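Fold coverage is simple arithmetic: it is the total number of sequenced bases divided by the genome size. The Python sketch below illustrates what a two-fold draft implies. It assumes an idealized model in which reads land at random, independent positions. This is the Lander-Waterman (Poisson) approximation, a standard textbook model that is not described in this announcement. Under that assumption, a two-fold draft is expected to touch only about 86 percent of the genome's bases, which helps explain why such drafts are considered low-density. The read length and read count are hypothetical values chosen only so the arithmetic comes out to two-fold.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def expected_fraction_covered(c: float) -> float:
    """Idealized Lander-Waterman estimate of the fraction of bases hit by at
    least one read, assuming reads fall at random (Poisson) positions."""
    return 1.0 - math.exp(-c)


if __name__ == "__main__":
    # Hypothetical example: a 3-billion-base-pair genome and 800-base reads.
    genome_size = 3_000_000_000
    read_length = 800
    num_reads = 7_500_000
    c = coverage(num_reads, read_length, genome_size)
    print(f"coverage = {c:.1f}x, expected fraction covered = "
          f"{expected_fraction_covered(c):.3f}")
```

Under these assumptions the script reports two-fold coverage and an expected covered fraction of roughly 0.865. Real projects typically fall somewhat short of this idealized figure, because repeats and regions that are hard to clone or sequence are not sampled uniformly.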
Such comparisons between mammalian genomes represent one of the most effective ways to pinpoint the roughly 5 percent (about 150 million base pairs) of the 3-billion-base-pair human genome that is most obviously functional. Computer modeling suggests that comparisons among the 24 genome sequences will allow conserved sequences as small as six base pairs to be identified reliably. Six base pairs is roughly the size of a transcription factor binding site. Such a site is a short DNA sequence near a gene, and regulatory proteins bind there to switch the gene on or off.
Sequencing efforts will be carried out by the NHGRI-supported Large-Scale Sequencing Research Network, which consists of five centers: Agencourt Bioscience Corp., Beverly, Mass.; Baylor College of Medicine, Houston; the Broad Institute of MIT and Harvard, Cambridge, Mass.; the J. Craig Venter Institute, Rockville, Md.; and Washington University School of Medicine, St. Louis. Assignment of each organism to a specific center or centers will be determined at a later date.
NHGRI's process for selecting sequencing targets begins with three working groups composed of experts from across the research community. Each working group develops a proposal for a set of genomes to sequence that would advance knowledge in one of three important scientific areas. The first is identifying areas of genetic research where applying high-throughput sequencing resources would rapidly lead to significant medical advances. The second is understanding the human genome, and the third is understanding the evolutionary biology of genomes. A coordinating committee then reviews the working groups' proposals, helping to fine-tune the suggestions and integrate them into an overarching set of scientific priorities. The coordinating committee's recommendations are reviewed and approved by one of NHGRI's advisory groups, the National Advisory Council for Human Genome Research, which in turn forwards its recommendations to NHGRI leadership. For more on the selection process, go to: www.genome.gov/Sequencing/OrganismSelection.
A complete list of organisms and their sequencing status can be viewed at www.genome.gov/10002154. High-resolution photos of many of the organisms being sequenced in NHGRI's Large-Scale Sequencing Program are available at: www.genome.gov/10005141.