In a study published online today in Nature, a team of researchers from Penn State University and the University of New South Wales presents the complete genome sequences of an indigenous hunter-gatherer from the Kalahari Desert and a Bantu from southern Africa, along with three additional whole-exome sequences of Kalahari hunter-gatherers. The four hunter-gatherer participants were chosen for their linguistic group, geographical location and Y-chromosome haplotype, while the Bantu individual is the revered Archbishop Desmond Tutu of South Africa. The landmark study is the first to analyze whole-genome and exome sequence data from this genetically distinct population, which is thought to be the oldest known lineage of modern-day humans. The findings, which include over 13,000 novel SNPs, provide new insights into human population diversity and may enable the future development of drugs that benefit this ethnic group.
Interestingly, the approach used to generate and analyze the Kalahari hunter-gatherer whole-genome sequence presented in the paper differs from the one used for other recently published Asian, Yoruban, and European individual genomes. “We recognized that the genomes of the southern African participants in this study would diverge more from the human reference genome than other publicly available genome sequences,” explained Stephan C. Schuster, lead author and Professor of Biochemistry and Molecular Biology at Penn State University. “As a result, the goal was to generate data of sufficient quality for de novo genome assembly rather than simply mapping against the human reference.”
To generate the massive amounts of high-quality data required for de novo assembly of a human genome, the researchers turned to the GS FLX System with long-read GS FLX Titanium Series chemistry. “The long reads were critical to identifying the full range of genetic variation in this unique population,” explained Schuster. “In the end, we were able to generate the complete sequence of one Kalahari Bushman genome at 10-fold coverage, using both shotgun and 17 Kb span paired-end reads, as well as the protein-coding regions of all five participants’ genomes at 16-fold coverage using target enrichment with NimbleGen Sequence Capture arrays.”