A new study reveals how ancient population mixing, unique founder events, and centuries of endogamy have shaped India’s distinct genetic landscape, influencing everything from ancestry to risk of rare diseases.
Study: 50,000 years of evolutionary history of India: Impact on health and disease variation.
India is a vast country with a remarkably diverse population, comprising approximately 5,000 groups defined by their religious, ethnic, and linguistic characteristics. Yet, historically, it has been underrepresented in genomic surveys. A recent paper in the journal Cell analyzed whole-genome sequences from 2,762 individuals across India to characterize ancestry and genetic variation across the population, with an emphasis on how both shape variation in disease genes.
The data were collected from the Longitudinal Aging Study in India-Diagnostic Assessment of Dementia (LASI-DAD), which covers individuals aged 60 years or older. This is the most comprehensive survey of India's genetic variation to date, covering most geographic regions and all major language groups, as well as historically underrepresented groups such as tribal communities. The researchers compared these genomes with those of ancient and modern groups worldwide.
Three ancestral groups
The results show that most Indians can trace their ancestry to three groups: South Asian hunter-gatherers, Eurasian Steppe pastoralists, and ancient farmers related to 4th-millennium BCE individuals from Tajikistan (Sarazm_EN). South Asian hunter-gatherer ancestry is higher in samples from the South than from the North of India, in Dravidian than in Indo-European language groups, and in tribal groups than in others.
The researchers identified a common source of Iranian farmer-related ancestry, represented by two individuals dubbed Sarazm_EN (farmers who lived in Tajikistan around 3600–3500 BCE), in the ancestors of four groups in India: the Ancestral South Indian (ASI), Ancestral North Indian (ANI), Austroasiatic-related, and East Asian-related groups. ASI is a hypothetical group formed by the admixture of Indigenous South Asians with ancient Iranian farmers. ANI is another hypothetical group formed by the admixture of ASI with ancient Eurasian Steppe pastoralists. Both groups are inferred from recent analyses of ancient DNA.
Sarazm_EN had trade connections with South Asia, including the early Indus Valley civilization. One of the two individuals wore shell bangles identical to those found at Indus Valley sites in Pakistan and India.
Most of the genetic variation is derived from a population that is thought to have migrated out of Africa approximately 50,000 years ago. Smaller contributions (1%–2%) from Neanderthal and Denisovan ancestors were also found.
Founder events drive homozygosity
The analysis revealed extensive homozygosity (carrying two identical copies of a gene) and identity-by-descent sharing (of chromosomal regions) among individuals. In this sample of roughly 2,700 individuals, every individual has at least one relative of the fourth degree or closer. The level of homozygosity is two to nine times higher than in East Asians or Europeans.
Founder events (in which a gene pool forms from a small number of founder individuals through endogamy or consanguinity) reduce genetic variation, increase the chances of inheriting a disease-causing gene variant, and increase the risk of recessively inherited disease. Founder events account for 90% of homozygosity by descent in India.
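The link between excess homozygosity and recessive disease risk can be illustrated with a standard population-genetics relationship, which is not taken from the study itself: for a recessive allele at frequency q and an inbreeding coefficient F, the expected frequency of homozygotes is q² + Fq(1 − q). The Python sketch below is only a rough illustration under that textbook assumption. The function name and the values of q and F are hypothetical and are not estimates from the paper.

```python
def homozygote_frequency(q: float, f: float = 0.0) -> float:
    """Expected frequency of individuals carrying two copies of an allele.

    q: allele frequency in the population (0..1)
    f: inbreeding coefficient F (0 = random mating, 1 = complete inbreeding)

    Textbook formula: q^2 + F * q * (1 - q). This is an assumption used
    for illustration, not the method used in the study.
    """
    if not 0.0 <= q <= 1.0 or not 0.0 <= f <= 1.0:
        raise ValueError("q and f must both lie between 0 and 1")
    return q * q + f * q * (1.0 - q)


if __name__ == "__main__":
    # Hypothetical values chosen only to show the trend.
    q = 0.01
    baseline = homozygote_frequency(q)
    for f in (0.0, 0.01, 0.05):
        freq = homozygote_frequency(q, f)
        print(f"F={f:.2f}: homozygote frequency {freq:.6f}, "
              f"ratio to random mating {freq / baseline:.2f}")
```

The rarer the variant, the larger the relative effect of F, which is consistent with the point that even very rare deleterious variants can surface as recessive disease in endogamous groups.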
"These findings underscore the extensive familial connections among Indians, reflecting historical, cultural, or social patterns such as endogamy."
Homozygosity and disease
Of the more than 400,000 missense variants and putative loss-of-function variants catalogued in this study, over 40% (mostly extremely rare ones) are being described for the first time in a genomic survey. Approximately 214 are present in ClinVar, which annotates disease-causing variants associated with a variety of inborn and acquired diseases. These are present only in India. Homozygous deleterious mutations are more common with greater South Asian hunter-gatherer ancestry and correlate with DNA markers of consanguinity and recent founder events.
Despite the overall low frequency of these variants, some are more frequent in particular groups. For instance, the pathogenic variant L307P causes butyrylcholinesterase deficiency, which increases the risk of muscle paralysis in people given muscle relaxants commonly used during anesthesia. It occurred in 15 individuals in the current study, eight of whom were from Telangana; the variant is fairly common among the Vysyas of Andhra Pradesh and Telangana in southern India.
Identifying such variants and their distribution helps in understanding how the disease arises and in preventing its occurrence.
Neanderthal and Denisovan sequences
Indians retain the largest diversity of Neanderthal sequences (85% of globally identified variants), though East Asians have higher per-individual Neanderthal ancestry. Likewise, Oceanians have the highest proportion of Denisovan ancestry (about 2%), while Indians have about 0.1%, yet over half of the Denisovan sequences are seen only in Indians.
Many of these gene variants are implicated in adaptation and immunity. They include a cluster of genes on chromosome 3 that increases the risk of severe disease following severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection.
This knowledge could help customize innovative therapies for autoimmune and infectious disorders in Indian populations.
At the same time, there were six and 13 regions with no trace of Neanderthal and Denisovan ancestry, respectively, including four without either. One of the latter houses the FOXP2 gene, implicated in human language development. Understanding the functions of these regions could help identify new gene variants associated with human-specific characteristics and diseases.
Conclusion
This fascinating study provides an overview of the genetic variation within India's populations and the contributions of different ancestral groups, from ancient to more recent times. It also underlines the effect of endogamy in increasing the frequency of homozygosity and pathogenic gene variants.
"The unique genetic structure of Indians underscores the importance of incorporating ancestry and homozygosity in future medical and functional genomics research."
These results depend on reference individuals sampled across time and space, and some reference groups are represented by only one or two individuals. As more ancient samples become available, these relationships may need to be revised, as with the Iranian farmer ancestry, which relies on only two samples.
Being able to identify the genomic regions derived from each of the three major ancestral sources could reveal how adaptations to Indian conditions evolved and the origin of these adaptive genes, as well as identify disease susceptibility patterns.