Groundbreaking 'Gnocchi' map reveals hidden secrets of the human genome

In a recent study published in Nature, researchers in the United States aggregated and processed 76,156 human genomes to construct a genomic constraint map named "genomic non-coding constraint of haploinsufficient variation" (Gnocchi) for the whole genome. They found that non-coding constrained regions in the genome were rich in known regulatory elements and variants linked to human traits and diseases. The map could help improve our understanding of functional genetic variation in the human genome.

Study: A genomic mutational constraint map using variation in 76,156 human genomes. Image Credit: Gio.tto / Shutterstock


Advancements in human genomic sequencing provide insights into variation patterns in genes, allowing the direct assessment of negative selection on missense and loss-of-function (LOF) variation through constraint modeling. Here, constraint is defined as the reduction of variation in a gene relative to an expectation based on the gene's mutability. Previous efforts focused on coding regions that represent less than 2% of the genome. As a result, the extensive non-coding genome remains less explored despite its recognized significance in complex human diseases. Applying the gene constraint model to non-coding regions faces challenges due to limited whole-genome data, lack of nucleotide-specific models, overrepresentation of coding regions in mutation analyses, and the complex, heterogeneous mutation rate influenced by local and larger-scale genomic features.

Current methods for evaluating constraint in non-coding regions include context-dependent mutational models, machine learning classifiers, and phylogenetic conservation scores. However, they have limitations: they overlook regional genomic features, depend on well-characterized mutations, and have reduced power to detect recently selected regions with functional effects on human-specific diseases or traits. To address these limitations, researchers in the present study developed a genome-wide constraint map to identify functional genomic elements (especially in the non-coding space) that are depleted of variation, that is, under constraint, and have potential clinical implications. The map also offers insights into the impact of natural selection on human genetic variation.

About the study

The present study aggregated and reprocessed 153,030 whole genomes from the Genome Aggregation Database (gnomAD) and aligned them to the human genome reference build GRCh38. Ultimately, 76,156 high-quality samples were retained from healthy, unrelated individuals with diverse ancestries. The study identified and used 390,393,900 low-frequency, high-quality single nucleotide variants to construct the genome-wide constraint map.

The genome was segmented into continuous, non-overlapping windows of 1 kb. Constraint was quantified for each window by comparing the observed and the expected variation. A refined mutational model was used, which combined trinucleotide sequence context, regional genomic features, and base-level methylation to predict expected variation levels under neutrality. The deviation between the expected and observed variation was quantified using a "Gnocchi score."

The correlation between the Gnocchi metric and various annotations of functional non-coding sequences was determined for validation. The ability of the Gnocchi score to prioritize non-coding variants was compared with other population genetics-based metrics, including Orion, CDTS (short for context-dependent tolerance score), gwRVIS (short for genome-wide residual variation intolerance score), and depletion rank, by measuring the area under the curve statistic. Further, the constraint for enhancers linked to specific genes was analyzed.
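As a rough illustration of the windowing and observed-versus-expected comparison described above, the Python sketch below tiles a stretch of sequence into non-overlapping 1 kb windows and computes a signed deviation score for each one. The score formula used here, (expected − observed) / sqrt(expected), is an assumption chosen for illustration, since this article does not state how the Gnocchi score is calculated. The function names and toy numbers are hypothetical, and in the actual study the expected counts would come from the mutational model rather than being supplied by hand.

```python
import math
from collections import Counter

WINDOW_SIZE = 1000  # 1 kb windows, as described in the article


def count_variants_per_window(positions, window_size=WINDOW_SIZE):
    """Count variants per non-overlapping window, keyed by window index.

    Positions are 0-based coordinates on a single chromosome.
    """
    return Counter(pos // window_size for pos in positions)


def signed_deviation(observed, expected):
    """Illustrative constraint score (assumed form, not taken from the article).

    Positive values mean fewer variants were observed than expected,
    i.e. the window looks constrained.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (expected - observed) / math.sqrt(expected)


def score_windows(positions, expected_by_window):
    """Score every window that has an expected count."""
    observed = count_variants_per_window(positions)
    return {
        window: signed_deviation(observed.get(window, 0), expected)
        for window, expected in expected_by_window.items()
    }


if __name__ == "__main__":
    # Toy data: eight variant positions spread over three 1 kb windows.
    variant_positions = [12, 450, 999, 1500, 2100, 2200, 2300, 2999]
    # Toy expected counts; a real analysis would derive these from a model.
    expected_counts = {0: 4.0, 1: 4.0, 2: 4.0}
    scores = score_windows(variant_positions, expected_counts)
    for window, score in sorted(scores.items()):
        print(window, score)
```

In this toy example, the second window has the fewest variants relative to expectation and therefore receives the highest score, which is the pattern a constraint metric is meant to flag.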

Results and discussion

The Gnocchi score was found to be close to zero for non-coding regions and significantly higher for windows containing coding sequences. About 3.12% and 0.05% of the non-coding windows showed constraint as strong as the 50th and 90th percentile of exonic regions, respectively. A significant positive correlation was found between constraint and functional non-coding annotations, demonstrating the utility of the Gnocchi score in characterizing non-coding regions and providing additional insights. The Gnocchi score was found to perform well against other non-coding metrics, effectively identifying functional variants in the non-coding genome. However, the researchers suggest a combination of metrics would be ideal for prioritizing functional variation. The Gnocchi metric was also found to be useful in prioritizing copy-number variants (CNVs), aiding the interpretation of non-coding risk factors in studies that associate CNVs with diseases. As per the study, enhancers linked to constrained genes were found to be significantly more constrained than those linked to presumably less constrained genes. Further, the study emphasizes the value of non-coding constraint as a complementary metric to gene constraint for identifying functionally important genes.
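The comparison of non-coding windows against exonic percentiles can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the scores are made-up toy values, the function names are hypothetical, and the nearest-rank percentile used here is one simple convention that may differ from the procedure used in the study.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, with pct in (0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def fraction_at_least(noncoding_scores, exonic_scores, pct):
    """Fraction of non-coding windows scoring at or above the given
    percentile of the exonic score distribution."""
    if not noncoding_scores:
        raise ValueError("noncoding_scores must not be empty")
    threshold = percentile(exonic_scores, pct)
    hits = sum(1 for score in noncoding_scores if score >= threshold)
    return hits / len(noncoding_scores)


if __name__ == "__main__":
    # Toy constraint scores, not values from the study.
    exonic = [0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    noncoding = [-1.0, -0.5, 0.0, 0.2, 0.3, 1.0, 4.5, 8.5]
    print(fraction_at_least(noncoding, exonic, 50))
    print(fraction_at_least(noncoding, exonic, 90))
```

Applied to real window scores, the same calculation would yield figures of the kind reported above, such as the share of non-coding windows that are at least as constrained as the median exonic window.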

Although the biological impact of mutations in enhancers is less understood, the researchers suggest that there is potential for an extended model to provide biologically informed insights into non-coding variation and molecular mechanisms of selection. While the study utilizes one of the most extensive datasets of human genomes for the analysis of non-coding constraint, the power and resolution of the approach may significantly improve with an increase in sample size.


In summary, the present study highlights the significance of the genome-wide constraint map in analyzing non-coding regions and protein-coding genes. It marks a crucial advancement towards developing an inclusive catalog of functional elements in the human genome, prompting further research in the area.

Journal reference:
A genomic mutational constraint map using variation in 76,156 human genomes. Nature.

Written by

Dr. Sushama R. Chaphalkar

Dr. Sushama R. Chaphalkar is a senior researcher and academician based in Pune, India. She holds a PhD in Microbiology and has extensive experience in research and education in Biotechnology. In a career spanning three and a half decades, she has held prominent leadership positions in academia and industry. As the Founder-Director of a renowned Biotechnology institute, she worked extensively on high-end research projects of industrial significance, fostering a stronger bond between industry and academia.


Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Chaphalkar, Sushama R. (2023, December 07). Groundbreaking 'Gnocchi' map reveals hidden secrets of the human genome. News-Medical. Retrieved on February 21, 2024.



