Researchers have discovered new regions of the human genome particularly vulnerable to mutations. These altered stretches of DNA can be passed down to future generations and are important for how we study genetics and disease.
The regions are located at the starting point of genes, also known as transcription start sites. These are sequences where cellular machinery starts to copy DNA into RNA. The first 100 base pairs after a gene's starting point are 35% more prone to mutations compared with what you'd expect by chance, according to the study published today in Nature Communications.
"These sequences are extremely prone to mutations and rank among the most functionally important regions in the entire human genome, together with protein-coding sequences."
Dr. Donate Weghorn, corresponding author of the study and researcher, Centre for Genomic Regulation, Barcelona
The study found that many of the excess mutations appear immediately after conception, during the first few rounds of cell division in the human embryo. Also known as mosaic mutations, these changes to the DNA sequence end up in some cells but not others and are part of the reason the mutational hotspot has gone undiscovered until now.
A parent can carry disease-contributing mosaic mutations without symptoms because the change ends up in some cells or tissues only. However, they can still pass the mutation on through their eggs or sperm. The child then carries the mutation in all their cells, which could cause disease.
The researchers made the discovery by looking at transcription start sites across 150,000 human genomes from the UK Biobank and 75,000 from the Genome Aggregation Database (gnomAD). They compared the results with data that included information about mosaic mutations from eleven separate family studies.
They found that many gene starting sites across the human genome experienced excess mutations. When the researchers looked more closely, they found the most affected regions were the starting points of sets of genes linked to cancer, brain function and defective limb development.
The mutations are likely to be harmful. The study found a strong excess of mutations near start sites when looking at extremely rare variants, which are normally very recent mutations. That excess shrank when looking at older, more common variants, suggesting natural selection is filtering the mutations out. In other words, families with mutations in gene starting sites, particularly those linked to cancer and brain function, are less likely to pass them on. Over generations, the mutations do not stick around.
Avoiding false conclusions and finding missed clues
The study can help avoid false conclusions from mutational models. These are tools which help geneticists determine how many mutations are to be expected in specific regions of the genome if nothing special is going on. Clinically, that baseline is used to determine which variants should be paid attention to and which deprioritised.
Knowing that gene starting points are natural mutational hotspots means the true baseline in these regions is higher than previously thought and models need to be recalibrated to take that into account.
"If a model doesn't know this region is naturally mutation-rich, it might expect, say, 10 mutations but observe 50. If the correct baseline is 80, then 50 means fewer than expected and is a sign harmful changes are being removed by natural selection. You would completely miss the importance of that gene," explains Dr. Weghorn.
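The arithmetic in Dr. Weghorn's example can be written out as a short sketch. The numbers (10, 50 and 80) come straight from the quote and are illustrative, not measurements from the study. The function name `observed_to_expected` is a hypothetical helper introduced here, and a simple observed-to-expected ratio is an assumption used for illustration, not necessarily the statistic the authors used.

```python
def observed_to_expected(observed: int, expected: float) -> float:
    """Return observed / expected mutation counts.

    A ratio above 1 suggests more mutations than the baseline predicts;
    a ratio below 1 suggests fewer, which can hint at selection removing them.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected


# Illustrative numbers taken from the quote, not from the study's data.
observed = 50
naive_baseline = 10       # model unaware that the region is a hotspot
corrected_baseline = 80   # baseline that accounts for the hotspot

naive_ratio = observed_to_expected(observed, naive_baseline)
corrected_ratio = observed_to_expected(observed, corrected_baseline)

print(f"naive model:     {naive_ratio:.3f}x the expected count")
print(f"corrected model: {corrected_ratio:.3f}x the expected count")
```

With the uncorrected baseline the region looks five times more mutated than expected. With the corrected baseline the same 50 mutations amount to about 0.6 times the expected count, which is the depletion signal the quote describes.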
The study also has implications for genetic studies that only look for mutations present in the child and completely absent in the parents. This works well for mutations that are present in every cell, but not for mosaic mutations, which end up in a patchwork of different tissues. These studies are filtering out mosaic mutations and inadvertently losing important information about potential contributors to disease.
"There is a blind spot in these studies. To get around this, one could look at the co-occurrence patterns of mutations to help detect the presence of mosaic mutations. Or look at the data again and revisit discarded mutations that occur near the transcription starts of genes most strongly affected by the hotspot," says Dr. Weghorn.
A new source of mutations
The process of transcribing DNA into RNA is hectic. The study explains the mutational hotspot exists because the molecular machinery involved often pauses and restarts near the start line. It can even fire in both directions. At the same time, short-lived structures can form that briefly leave one strand of DNA exposed to possible damage.
All of this, the authors argue, makes transcription start sites more prone to mutations during the rapid cell divisions that follow conception. Cells can usually repair these alterations, but under the pressure of needing to grow fast, they leave some mutations unpatched, like scars on the human genome.
The discovery adds a previously missing piece on how mutations arise in the first place. The obvious culprits, like errors during DNA replication or damage from ultraviolet rays, have been known about for decades. "Finding a new source of mutations, particularly those affecting the human germline, doesn't happen often," concludes Dr. Weghorn.
Journal reference:
Cortés Guzmán, M., et al. (2025). Transcription start sites experience a high influx of heritable variants fueled by early development. Nature Communications. doi: 10.1038/s41467-025-66201-0. https://www.nature.com/articles/s41467-025-66201-0