The National Human Genome Research Institute (NHGRI) has announced several new sequencing targets including the Northern white-cheeked gibbon (Nomascus leucogenys).
This sets the stage for completing a quest to sequence the genome of at least one non-human primate from each of the major positions along the primate evolutionary tree, making available an essential resource for researchers unraveling the genetic factors involved in human health and disease.
Comparing the genomes of other species with the human genome is an exceptionally powerful tool to help researchers understand the working parts of the human genome in both health and illness.
NHGRI's Large-Scale Sequencing Research Network and its international partners have already sequenced or been approved to sequence at high-density coverage the genomes of several non-human primates, including the chimpanzee (Pan troglodytes), rhesus macaque (Macaca mulatta), orangutan (Pongo pygmaeus), marmoset (Callithrix jacchus) and gorilla (Gorilla gorilla).
"The gibbon genome sequence will provide researchers with crucial information when comparing it to the human genome sequence and other primate genomes, shedding light on molecular mechanisms implicated in human health and disease - from infectious diseases and neurological disorders to mental illness and cancer," said NHGRI Director Francis S. Collins, M.D., Ph.D.
The gibbon genome is unique because it carries an extraordinarily high number of chromosome rearrangements, even when compared to other primates. These rearrangements occur when small or large segments of a chromosome become detached and reattach to the same chromosome or another chromosome. Such chromosomal rearrangements can wreak havoc on a cell, and can contribute to birth defects or cancer in humans. The gibbon genome will also help scientists better understand rearrangements called segmental duplications, which are large, almost identical copies of DNA present in at least two locations in the human genome. A number of diseases are known to be associated with mutations in segmentally duplicated regions, including a form of mental retardation and other neurological and birth defects.
Segmental duplications cover 5.3 percent of the human genome, significantly more than in the rat genome, which has about 3 percent, or the mouse genome, which has between 1 and 2 percent. Segmental duplications provide a window into understanding how the human genome evolved and how it may still be changing. The high proportion of segmental duplications in the human genome shows how human genes have undergone rapid functional innovation and structural change during the last 40 million years, presumably contributing to unique characteristics that separate humans from non-human primate ancestors.
With the sequencing of major primate genomes, researchers are able to study the differences between humans and other primates more precisely. For instance, an analysis of the chimpanzee genome sequence has revealed that three key genes involved in inflammation have been deleted in the chimpanzee genome, possibly explaining some of the known differences between the immune and inflammatory responses of chimps and humans. Identifying these genes gives researchers a more precise starting point for understanding molecular pathways and developing better diagnostics and therapies for immune and inflammatory diseases.
In addition, some primates are important biomedical models because of their genetic, physiologic and metabolic similarities with humans. For example, the rhesus macaque is an essential research model for drug development, neuroscience, behavioral biology, reproductive physiology, endocrinology, and cardiovascular studies. Because it can be infected with simian immunodeficiency virus, a close cousin to the human immunodeficiency virus (HIV), the rhesus is also widely recognized as the best animal model for research on Acquired Immune Deficiency Syndrome, or AIDS. It also serves as a valuable model for studying other human infectious diseases and for vaccine research, most recently for the virus causing Severe Acute Respiratory Syndrome, or SARS.
Comparing the human genome with the genomes of other non-human primates and other organisms has been shown to be an effective tool for identifying the function and structure of genes. Most sections of the human genome originated long before humans themselves. Consequently, scientists can use genome sequences of strategically selected organisms to learn more about how, when and why the genomes of humans and other mammals came to be composed of certain DNA sequences.
The latest sequencing plan, which includes the gibbon, was recently approved by the National Advisory Council for Human Genome Research, a federally chartered committee that advises NHGRI on program priorities and goals. The plan also includes a set of organisms whose genome sequences will add to the comprehensive strategic list of priority targets for genomic sequencing by NHGRI's Large-Scale Sequencing program.
Seven mammals that were previously approved for sequencing at low-density genome coverage have now been targeted for sequencing at high-density genome coverage. The refined genome sequences will improve the accuracy of comparisons between mammalian genomes, one of the most effective ways to pinpoint the roughly 5 percent of the 3-billion base pair human genome that is most obviously functional.
The seven mammals to be sequenced are: the nine-banded armadillo (Dasypus novemcinctus); domestic cat (Felis catus); guinea pig (Cavia porcellus); African savannah elephant (Loxodonta africana); tree shrew (Tupaia species); rabbit (Oryctolagus cuniculus); and a bat species that will be determined based on the availability of a high-quality DNA sample and the selected bat's promise as a biomedical model. NHGRI has recently approved the sequencing of the horse (Equus caballus) to high-density genome coverage.
A set of five fungi known as dermatophytes, which are the most common sources of human fungal disease, will also have their genomes sequenced. Dermatophyte fungi are highly communicable and infect millions of people worldwide, leading to costs of approximately $400 million a year for treatment alone. The dermatophytes to be sequenced are Trichophyton rubrum, Microsporum canis and Microsporum gypseum, all of which will be sequenced to a high-density genome coverage; and Trichophyton tonsurans and Trichophyton equinum, both of which will be sequenced to a medium-density genome coverage. Scientists then will be able to compare the genome sequence information from these organisms to determine which genes are responsible for the differences in infectivity. Those genes will be logical starting points for developing more effective diagnostic, prevention and treatment approaches to fungal infections in both humans and animals.
Also selected in the latest round is a project to sequence up to 50 strains of the yeast Saccharomyces cerevisiae. Saccharomyces cerevisiae, whose genome was first completed in 1996, is a primary model for studying variations in genomes that can contribute to health and disease. The genomic data provided by this effort will allow researchers to develop basic tools to better understand human variation, such as distinguishing functional from non-functional variations within genes.
A final set of sequencing targets was chosen to address the question: What genes and other genomic features were responsible for the origin of multi-celled organisms? More than 1 billion years ago, two of the major multi-cellular groups of organisms (fungi and animals) shared a single-celled ancestor. This project targets ten of the earliest branches of animals and fungi along with some of their single-celled relatives, providing, for the first time, comprehensive data to fill gaps in our understanding of animal and fungal evolution. Recent research has shown that some genes in the human genome that are responsible for early animal development arose much earlier than thought, in some cases in single-celled organisms. Therefore, this set of ten targets is likely to reveal the origins of other genes important for multi-cellularity in all such animals, including humans. The ten targets, all of which involve relatively small genomes, include six to be sequenced at high-density genome coverage: Capsaspora owczarzaki; Sphaeroforma arctica; an Amastigomonas species; a Salpingoeca or Codosiga species; Allomyces macrogynus; and Nuclearia simplex; and four to be sequenced at low-density genome coverage: Amoebidium parasiticum; Mortierella verticillata; Spizellomyces punctatus; and a Stephanoeca or Acanthocoepsis species.
NHGRI's Large-Scale Sequencing Research Network also includes a portfolio of medical sequencing projects. These projects are designed to use high-throughput sequencing resources to lead to significant medical advances. As more is learned from sequencing and other studies about the genomic contribution to disease, and as the cost of obtaining sequence information decreases, genomic sequence information will become ever more important both for medical research and for providing medically relevant information to individuals. When it becomes affordable for an individual's genome to be fully sequenced, genomic information will allow estimates of future disease risk for individuals, as well as improve prevention, diagnosis, and treatment.
Projects given the highest priority will use large-scale sequencing over the next few years to identify the genes responsible for dozens of relatively rare, single-gene (autosomal Mendelian) diseases; sequence all of the genes on the X chromosome from affected individuals to identify those involved in sex-linked diseases; and survey the range of variants in genes known to contribute to some common diseases.
An example of a medical sequencing project launched last year is The Cancer Genome Atlas (TCGA) pilot project, a groundbreaking effort between NHGRI and the National Cancer Institute that seeks to systematically characterize the genetic changes that occur in cancer. Information on TCGA is available at http://cancergenome.nih.gov.
Sequencing work on approved targets is carried out by the NHGRI-supported Large-Scale Sequencing Research Network, which consists of five centers: Agencourt Bioscience Corp., Beverly, Mass.; Baylor College of Medicine, Houston; the Broad Institute of MIT and Harvard, Cambridge, Mass.; the J. Craig Venter Institute, Rockville, Md.; and Washington University School of Medicine, St. Louis. Assignment of new organisms to a specific center or centers will be determined at a later date.
NHGRI's process for selecting sequencing targets begins with three working groups composed of experts from across the research community. Each of the working groups is responsible for developing a proposal for a set of genomes to sequence that would advance knowledge in one of three important scientific areas: identifying areas in genetic research where the application of high-throughput sequencing resources would rapidly lead to significant medical advances; understanding the human genome; and understanding the evolutionary biology of genomes. A coordinating committee then reviews the working groups' proposals, helping to fine-tune the suggestions and integrate them into an overarching set of scientific priorities. The recommendations of the coordinating committee are reviewed and approved by one of NHGRI's advisory groups, the National Advisory Council for Human Genome Research, which in turn forwards its recommendations to NHGRI leadership. For more on the selection process, go to: www.genome.gov/Sequencing/OrganismSelection.
A complete list of organisms and their sequencing status can be viewed at www.genome.gov/10002154. High-resolution photos of many of the organisms being sequenced in NHGRI's Large-Scale Sequencing Program are available at: www.genome.gov/10005141.