AI model ranks genetic variants from severe to mild disease mutations

By combining deep evolutionary signals with human population data, the popEVE model offers a new way to identify the most damaging genetic mutations. It highlights previously hidden disease genes and gives clinicians a powerful way to prioritize variants in unsolved cases.

3D illustration. DNA helix with damaged segment highlighted in red. Study: Proteome-wide model for human disease genetics. Image credit: Rost9/Shutterstock.com

In a recent study published in Nature Genetics, a group of researchers advanced variant effect prediction across the human proteome by integrating deep evolutionary signals with human population constraints, enabling severity-aware ranking of missense variants for clinical genomics.

Why current variant scoring fails rare disease patients

Only about one in four people with a rare disease receives a genetic diagnosis, even after whole-exome sequencing (WES), leaving families without answers or treatment direction. Clinicians must sift through millions of variants in each genome. Yet most computational tools only compare changes within a single gene, rather than across proteins, making it difficult to judge how severe a variant is relative to variants in other genes.

Deep evolution preserves features essential to fitness, while human population variation reveals gene-specific constraints. Integrating both could rank never-before-seen missense changes by organism-level impact, supporting the analysis of singleton cases (patients sequenced without their parents), variant triage, and more accurate counseling.

Further research is needed to develop calibrated, proteome-wide scoring that distinguishes benign from truly harmful variants, thereby accelerating rare-disease diagnosis worldwide in both clinical and research settings.

Training popEVE to score mutations across all proteins

The investigators built the population-calibrated Evolutionary Variational model Ensemble (popEVE), a proteome-wide scoring model that integrates deep evolutionary information and human population constraint to rank missense variants across genes.

Evolutionary evidence was derived from two unsupervised protein models: the Evolutionary Model of Variant Effect (EVE), a Bayesian variational autoencoder (VAE) trained on multiple sequence alignments, and the Evolutionary Scale Modeling 1 variant (ESM-1v), a large language model (LLM) trained on protein sequences.

A population constraint was introduced via a latent Gaussian process that learned the relationship between evolutionary scores and missense intolerance from the United Kingdom Biobank (UKBB) and the Genome Aggregation Database (gnomAD). To minimize ancestry bias, the model used a coarse indicator of whether a variant was present or absent in these cohorts rather than its allele frequency.
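The presence/absence idea can be illustrated with a minimal Python sketch. The variant names and allele counts below are hypothetical, and the function is an illustration of the general idea, not the study's code.

```python
def seen_in_population(allele_counts):
    """Collapse allele counts into a presence/absence flag per variant."""
    return {variant: count > 0 for variant, count in allele_counts.items()}


# Hypothetical variants and allele counts, for illustration only.
counts = {"GENE_A:p.R10C": 0, "GENE_A:p.A25V": 3, "GENE_B:p.G7S": 1500}
flags = seen_in_population(counts)
print(flags)
```

A variant seen three times and one seen 1,500 times receive the same flag, so differences in how often a variant occurs in well-sampled versus under-sampled ancestry groups do not enter the signal.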

Performance was benchmarked against leading predictors (AlphaMissense, Bayesian Deleteriousness (BayesDel), Rare Exome Variant Ensemble Learner (REVEL)) using ClinVar labels and deep mutational scans (DMS), then evaluated in rare-disease cohorts. De novo missense (DNM) calls from a severe developmental disorder (SDD) metacohort of ~31,000 trios were contrasted with unaffected sibling controls, and WES data from a Deciphering Developmental Disorders (DDD) subset were used to assess how well scores separate cases from controls.

A two-component Gaussian mixture fitted to variant scores defined severity thresholds, with a severe cutoff at −5.056 (99.99 % likelihood of belonging to the deleterious component). Structural proximity to interaction partners was quantified using Protein Data Bank (PDB) entries to contextualize top substitutions.
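To show how a cutoff of this kind can be read off a two-component Gaussian mixture, here is a minimal Python sketch. The mixture weight, means, and standard deviations are hypothetical, not the fitted values from the study, so the resulting cutoff will differ from −5.056. The bisection also assumes that the posterior falls steadily as the score rises, which holds here because both components share the same standard deviation.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_deleterious(x, w_del, mean_del, sd_del, mean_ben, sd_ben):
    """Posterior probability that score x belongs to the deleterious component."""
    p_del = w_del * normal_pdf(x, mean_del, sd_del)
    p_ben = (1.0 - w_del) * normal_pdf(x, mean_ben, sd_ben)
    return p_del / (p_del + p_ben)


def severity_cutoff(target, params, lo=-20.0, hi=0.0, iters=100):
    """Bisect for the score at which the deleterious posterior equals target.

    Assumes the posterior is >= target at lo, < target at hi, and
    decreases monotonically in between.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if posterior_deleterious(mid, **params) >= target:
            lo = mid  # still in the severe range, move toward milder scores
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical mixture parameters (more negative = more deleterious).
params = {"w_del": 0.3, "mean_del": -4.0, "sd_del": 1.0,
          "mean_ben": -1.0, "sd_ben": 1.0}
cutoff = severity_cutoff(0.9999, params)
print(round(cutoff, 3))
```

Any score below `cutoff` has at least a 99.99 % posterior probability of belonging to the deleterious component under these assumed parameters.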

Evidence shows popEVE outperforms top predictors in clinics

popEVE captured disease severity better than the leading predictors. Pathogenic variants associated with childhood death had more deleterious scores than those associated with death in adulthood, and scores separated age of onset more clearly than those of AlphaMissense, BayesDel, or REVEL. In the SDD metacohort, DNM scores shifted toward higher deleteriousness versus controls, with enrichment increasing at more stringent thresholds. Variants below the severe cutoff of −5.056 were enriched ~15× in cases, while moderate scores were enriched ~5×; benign-range scores matched expectation.
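A fold enrichment of this kind can be sketched as a simple rate ratio in Python. The scores below are hypothetical, and the calculation compares the share of case and control variants below the cutoff, which may differ from the statistic used in the study.

```python
def fold_enrichment(case_scores, control_scores, cutoff):
    """Ratio of the share of variants scoring below cutoff in cases vs controls."""
    case_rate = sum(s < cutoff for s in case_scores) / len(case_scores)
    control_rate = sum(s < cutoff for s in control_scores) / len(control_scores)
    if control_rate == 0:
        return float("inf")  # no control variant falls below the cutoff
    return case_rate / control_rate


# Hypothetical scores (more negative = more deleterious).
case_scores = [-6.1, -5.5, -3.0, -1.2]
control_scores = [-5.2, -2.0, -1.5, -0.8, -0.3, -1.0, -2.2, -0.1]
enrichment = fold_enrichment(case_scores, control_scores, cutoff=-5.056)
print(enrichment)
```

In this toy data, half of the case variants but only one in eight control variants fall below the cutoff, giving a fourfold enrichment.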

In the UKBB, 96 % of individuals carried no severely pathogenic missense variants, and most people had zero to five moderate variants, indicating that popEVE does not overpredict severity in the general population. Against diagnosed SDD cases, popEVE achieved the best average precision and recalled more cases at any given false-positive rate than comparators. In WES data, the model separated cases from controls without inflating the pathogenic burden in UKBB, whereas alternative methods flagged many UKBB participants as carrying equally severe variants.

The framework also prioritized likely causal variants without parental genomes. Among 513 individuals with a severe DNM, 98 % had that variant ranked as the most deleterious in their exome. Selecting the top variant per person still recovered 95 % of the genes identified by DNM thresholding alone. When a causal DNM existed, popEVE ranked it above all rare inherited variants more often than AlphaMissense, BayesDel, or REVEL did.
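The per-person ranking can be sketched in a few lines of Python. The individuals, genes, amino acid changes, and scores below are hypothetical, and the de novo flag is used only to check the ranking afterwards, not to produce it.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    gene: str
    change: str
    score: float   # more negative = more deleterious
    de_novo: bool  # known from trio data; not used for ranking


def top_variant(variants):
    """Return the most deleterious (lowest-scoring) variant in one exome."""
    return min(variants, key=lambda v: v.score)


def fraction_dnm_ranked_first(exomes):
    """Share of individuals whose top-ranked variant is a de novo variant."""
    hits = sum(top_variant(vs).de_novo for vs in exomes.values())
    return hits / len(exomes)


# Hypothetical exomes, for illustration only.
exomes = {
    "individual_1": [Variant("GENE_A", "R10C", -6.2, True),
                     Variant("GENE_B", "A25V", -2.1, False)],
    "individual_2": [Variant("GENE_C", "G7S", -5.4, True),
                     Variant("GENE_D", "L40P", -5.9, False)],
}
fraction = fraction_dnm_ranked_first(exomes)
print(top_variant(exomes["individual_1"]).gene, fraction)
```

Because the ranking needs only the scores within one exome, it can be applied when parental genomes are unavailable. This works only if scores are comparable across genes, which is the property the proteome-wide calibration is meant to provide.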

For discovery, popEVE identified 410 candidate genes in the SDD cohort using two complementary approaches (variant thresholding and gene collapsing), recovering 94 % of previously reported missense-identified genes and adding 123 novel candidates.

None of the novel variants appeared in UKBB or gnomAD. Functional and network analyses supported the candidates: novel genes showed physical interactions with known genes associated with developmental disorders and exhibited similar enrichment in Gene Ontology (GO) processes and fetal brain expression. Structure mapping added plausibility: 91 % of severe substitutions lie within 8 Å of an interaction partner.
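A proximity check of this kind can be sketched in Python as a minimum atom-to-atom distance. The coordinates below are hypothetical, and measuring the closest atom pair is an assumption, since the article does not say exactly how distance was defined.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between any atom in A and any atom in B."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def near_partner(residue_atoms, partner_atoms, cutoff=8.0):
    """True if the residue lies within cutoff angstroms of the partner."""
    return min_distance(residue_atoms, partner_atoms) <= cutoff


# Hypothetical (x, y, z) coordinates in angstroms.
residue = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
close_partner = [(6.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
far_partner = [(20.0, 0.0, 0.0)]
print(near_partner(residue, close_partner), near_partner(residue, far_partner))
```

In practice, the atom coordinates would come from a PDB entry of the protein in complex with its binding partner or ligand.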

Examples included eukaryotic translation termination factor 1 (ETF1) (R68L and R192C near Asparagine-Isoleucine-Lysine-Serine (NIKS) and Glycine-Glycine-Glutamine (GGQ) motifs in ribosomal complexes), eukaryotic translation initiation factor 4A isoform 2 (EIF4A2; Q60K contacting the ATP analog AMP-PNP, PDB ligand code ANP), and Nucleosome Remodeling and Deacetylase (NuRD) complex members histone deacetylase 2 (HDAC2; M31R in the foot pocket) and histone-binding protein retinoblastoma-binding protein 4 (RBBP4; H373R at the metastasis-associated protein 1 (MTA1) interface).

Another example was the calcium-gated potassium channel complex formed by potassium calcium-activated channel subfamily N member 2 (KCNN2; I637F in the threonine-valine-glycine-tyrosine-glycine (TVGYG) pore) and calmodulin 1 (CALM1; D24Y disrupting calcium ion (Ca2+) binding). False positives in controls were low: gene collapsing found no significant hits, and only 0.5 % of control individuals harbored a severe DNM.

A new path to faster, clearer rare disease answers

popEVE demonstrates that integrating deep evolution with human population constraint enables a calibrated, proteome-wide ranking of missense variant severity suited to clinical genetics. The approach distinguishes between childhood-lethal and adult-onset pathogenicity, enriches truly damaging DNM calls in severe developmental disorder cohorts, and avoids overcalling burden in population datasets.

It also recalls diagnosed cases from whole-exome data and prioritizes likely causal variants without parental genomes, while surfacing credible novel genes supported by structure and network context. As sequencing expands globally, severity-aware, minimally biased scoring can guide diagnosis, counseling, and research triage, providing faster answers to families worldwide and enabling scalable discovery of rare diseases.


Journal reference:
  • Orenbuch, R., Shearer, C. A., Kollasch, A. W., Spinner, A. D., Hopf, T., van Niekerk, L., Franceschi, D., Dias, M., Frazer, J., & Marks, D. S. (2025). Proteome-wide model for human disease genetics. Nat Genet. DOI: 10.1038/s41588-025-02400-1. https://www.nature.com/articles/s41588-025-02400-1

Written by

Vijay Kumar Malesu


Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Kumar Malesu, Vijay. (2025, December 03). AI model ranks genetic variants from severe to mild disease mutations. News-Medical. Retrieved on December 03, 2025 from https://www.news-medical.net/news/20251203/AI-model-ranks-genetic-variants-from-severe-to-mild-disease-mutations.aspx.
