In a study published online today in Nature, a team of researchers from Penn State University and the University of New South Wales presents the complete genome sequences of an indigenous hunter-gatherer from the Kalahari Desert and a Bantu from southern Africa, along with three additional whole exome sequences of Kalahari hunter-gatherers. The four hunter-gatherer participants were chosen for their linguistic group, geographical location and Y-chromosome haplotype, while the Bantu individual is the revered Archbishop Desmond Tutu of South Africa. The landmark study is the first to analyze whole genome and exome sequence data from this genetically distinct population, which is thought to be the oldest known lineage of modern-day humans. The findings, which include over 13,000 novel SNPs, provide new insights into human population diversity and may enable the future development of drugs that benefit this ethnic group.
Interestingly, the approach for generating and analyzing the Kalahari hunter-gatherer whole genome sequence presented in the paper differs from that used for other recently published Asian, Yoruban, and European individual genomes. “We recognized that the genomes of the southern African participants in this study would diverge more from the human reference genome than other publicly available genome sequences,” explained Stephan C. Schuster, lead author and Professor of Biochemistry and Molecular Biology at Penn State University. “As a result, the goal was to generate data of sufficient quality for de novo genome assembly rather than simply mapping against the human reference.”
To generate the massive amounts of high-quality data required for de novo assembly of a human genome, the researchers turned to the GS FLX System with long-read GS FLX Titanium Series chemistry. “The long reads were critical to identifying the full range of genetic variation in this unique population,” explained Schuster. “In the end, we were able to generate the complete sequence of one Kalahari Bushman genome at 10-fold coverage, using both shotgun and 17 Kb span paired-end reads, as well as the protein-coding regions of all five participants’ genomes at 16-fold coverage using target enrichment with NimbleGen Sequence Capture arrays.”
The study results were consistent with the belief that southern Africans are among the most divergent of all human populations. The researchers identified more SNPs in their genomes than in other individual human genomes sequenced to date, as well as thousands of novel SNPs. “These results will be a rich resource for future work, providing many new candidate functional sites that have not been included in whole-genome association studies,” said Vanessa Hayes, a project co-leader and Group Leader of Cancer Genetics at the Children’s Cancer Institute Australia for Medical Research at the University of New South Wales. “Ultimately, we hope that these sequences will serve as an important cultural and genetic archive of this indigenous population, one of the last remaining hunter-gatherer societies.”
“This study exemplifies the power of the 454 Sequencing System to comprehensively analyze whole genomes or targeted regions to identify both known and novel variants. We applaud the research team for recognizing the importance of de novo assembly, particularly for this genetically distinct human population,” said Michael Egholm, Vice President of R&D and Chief Technology Officer of 454 Life Sciences. “In this study, the combination of long GS FLX Titanium reads and NimbleGen Sequence Capture Exome arrays also allowed the researchers to obtain a high-resolution picture of the protein-coding regions of all five study participants, offering an economical alternative whole genome sequencing method.”