Study: resLens: genomic language models to enhance antibiotic resistance gene detection.
A recent study published in npj Antimicrobials and Resistance introduces resLens, a family of genomic language models (gLMs) designed to improve the detection of antibiotic resistance genes (ARGs).
The rise in antibiotic resistance in pathogenic microbes warrants the development of more advanced tools to study ARGs and their evolution. Most available alignment-based tools, such as k-mer approaches, best-hit algorithms, and hidden Markov model (HMM) methods, have several limitations, including poor performance when variants and reference ARGs do not match closely.
Moreover, databases represent only a fraction of the resistome and may not keep up with the scale and pace of resistance evolution. Deep learning methods are more dynamic than alignment-based tools and have sought to address these limitations. However, many earlier approaches must learn their ARG and protein function representations from scratch, whereas resLens uses transfer learning from a pre-trained DNA language model.
ARG Dataset and resLens Model Design
In the present study, researchers presented resLens to enhance ARG detection and analysis. The study sourced ARGs from the National Center for Biotechnology Information (NCBI) Pathogen Detection RefGene and ResFinder databases. These datasets were merged, and genes that were perfect duplicates or perfect sub-sequences of other genes conferring resistance to the same antibiotic class were excluded.
Subsequently, antibiotic resistance classes with ≥ 20 instances in the dataset were retained and passed through the Prodigal tool to ensure only open reading frames (ORFs) were present. This pre-processing yielded over 7,600 ARGs across 12 antibiotic classes. Further, GenBank was queried for bacterial non-resistance genes of comparable length to ARGs, excluding those with > 90% sequence identity to any ARG sequence.
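The merging and filtering steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `dedupe_and_filter`, the `(sequence, resistance_class)` record layout, and the decision to ignore reverse complements are assumptions made for illustration, and the Prodigal ORF check is left out.

```python
def dedupe_and_filter(records, min_per_class=20):
    """records: iterable of (sequence, resistance_class) tuples (hypothetical layout)."""
    by_class = {}
    for seq, cls in records:
        # A set per class drops perfect duplicates within that class.
        by_class.setdefault(cls, set()).add(seq.upper())

    kept = []
    for cls, seqs in by_class.items():
        # Longest first, so any containing sequence is seen before its sub-sequences.
        ordered = sorted(seqs, key=lambda s: (-len(s), s))
        survivors = []
        for seq in ordered:
            # Drop perfect sub-sequences of a gene in the same class.
            if not any(seq in longer for longer in survivors):
                survivors.append(seq)
        # Keep only classes that still have enough instances.
        if len(survivors) >= min_per_class:
            kept.extend((seq, cls) for seq in survivors)
    return kept
```

The pairwise containment check is quadratic within each class, which is acceptable for a few thousand genes but would need indexing at a larger scale.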
The ARG dataset was merged with an equal number of randomly selected non-resistance genes, and the combined dataset was used to fine-tune the long-read (LR) models. For the short-read (SR) dataset, whole-gene sequences were split into 150-base-pair (bp) reads. Datasets were split into 80% training and 20% testing sets. Overall, four models were fine-tuned: two for SR data and two for LR data. For each dataset, one model performed binary classification of ARG versus non-ARG sequences.
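As a rough illustration of the read splitting and the 80/20 split, consider the sketch below. The article does not say whether reads overlapped, how a trailing fragment shorter than 150 bp was handled, or whether the split was made before or after read generation, so the non-overlapping windows, the retained final fragment, and the helper names are all assumptions.

```python
import random

def split_into_reads(sequence, read_length=150):
    # Non-overlapping windows; a shorter final fragment is kept (assumption).
    return [sequence[i:i + read_length] for i in range(0, len(sequence), read_length)]

def train_test_split(items, test_fraction=0.2, seed=0):
    # Shuffle a copy with a fixed seed so the split is reproducible.
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Splitting at the gene level before generating reads would keep reads from one gene out of both sets at once. Whether the study did this is not stated here.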
The second model then assigned predicted ARGs to specific antibiotic resistance classes. The team evaluated the resLens models against five alignment-based tools (AMR++, k-mer-based antibiotic gene resistance analyzer [KARGA], ResFinder, Meta-MARC, and resistance gene identifier [RGI]) and two deep learning models (DeepARG and ARGNet). The researchers noted that resLens outperformed other models on the LR dataset.
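The two-model arrangement amounts to a cascade: a sequence is first screened as ARG or non-ARG, and only predicted ARGs reach the class model. The sketch below shows that control flow only. `toy_is_arg` and `toy_predict_class` are placeholder rules standing in for the fine-tuned models and have no biological meaning.

```python
def classify_sequence(sequence, is_arg, predict_class):
    """Two-stage cascade: detect first, then assign a resistance class."""
    if not is_arg(sequence):
        return "non-ARG"
    return predict_class(sequence)


# Placeholder stand-ins for the two fine-tuned models (arbitrary rules, not real biology).
def toy_is_arg(sequence):
    return sequence.startswith("ATG")

def toy_predict_class(sequence):
    return "beta-lactam" if len(sequence) % 2 == 0 else "aminoglycoside"

print(classify_sequence("ATGAAATAA", toy_is_arg, toy_predict_class))
print(classify_sequence("CCCGGG", toy_is_arg, toy_predict_class))
```

One consequence of this design is that errors in the first stage propagate: a resistance gene missed by the binary model never reaches the class model.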
resLens Benchmarking And Performance Results
The margin over KARGA and RGI on the LR dataset was modest, however, and both tools outperformed resLens on the SR dataset. resLens models also replicated the class distribution of the LR test set more closely than other models did. Wall-clock inference times on the test set were competitive: only ARGNet was faster on the LR test set, and only DeepARG and KARGA were faster on the SR test set.
Further, the team aimed to assess model performance on novel ARGs. To this end, they identified two gene families with low sequence similarity to other families conferring resistance to the same antibiotics: aminoglycoside nucleotidyltransferase (ANT), which confers resistance to aminoglycosides, and blaADC, which confers resistance to beta-lactams. The team then created an LR test set containing only ANT and blaADC family genes and an LR training set comprising the remaining genes.
The model was fine-tuned and evaluated on the new training and test sets. It accurately classified genes withheld from the training set, although performance varied by gene family and was stronger for blaADC than for ANT. For comparison with an alignment-based method, the ResFinder database was recreated without ANT and blaADC genes, and ResFinder was evaluated on the same test set of withheld sequences. ResFinder's results were uneven: it identified 86% of ANT genes but none of the blaADC genes.
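The held-out-family design can be sketched as follows. The record layout, with hypothetical `family` and `sequence` keys, and both helper names are assumptions. The per-family detection rate is one simple way to see the kind of family-to-family variation the study reports, not the metric the authors used.

```python
from collections import defaultdict

def split_by_family(records, held_out_families=("ANT", "blaADC")):
    # Held-out families go entirely to the test set; everything else trains.
    held = set(held_out_families)
    train = [r for r in records if r["family"] not in held]
    test = [r for r in records if r["family"] in held]
    return train, test

def detection_rate_by_family(test_records, is_arg):
    # Fraction of withheld genes in each family that the detector flags as ARGs.
    found, total = defaultdict(int), defaultdict(int)
    for record in test_records:
        total[record["family"]] += 1
        found[record["family"]] += bool(is_arg(record["sequence"]))
    return {family: found[family] / total[family] for family in total}
```

Holding out whole families, rather than random genes, prevents near-identical relatives of a test gene from appearing in training.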
The researchers also performed a stricter clustered-split analysis to test more dissimilar sequences. Performance declined, especially for binary ARG detection, indicating that resLens could generalize beyond close database matches but still lost accuracy under stronger distribution shifts.
Whole-Genome Testing and Screening Limits
Finally, the team used LR models to analyze whole-genome sequencing (WGS) data of organisms with validated resistance phenotypes. RGI and ResFinder were similarly tested for comparison. Filtering and mapping antibiotic classes to resLens-predicted ones yielded 79 genomes with validated resistance phenotypes, with one to three classes of antibiotics per organism. RGI and resLens identified at least one gene corresponding to a given genome’s labeled phenotype more often than ResFinder.
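The genome-level comparison can be read as a hit rate: for each labeled resistance phenotype, did the tool report at least one gene of the matching antibiotic class? The sketch below counts each genome-phenotype pair once, which is an assumption about how the tally was made. The field names `phenotypes` and `predicted` are likewise hypothetical.

```python
def phenotype_hit_rate(genomes):
    """genomes: list of dicts with 'phenotypes' (labeled classes) and
    'predicted' (classes of the ARGs a tool reported), both as sets."""
    hits = 0
    total = 0
    for genome in genomes:
        for resistance_class in genome["phenotypes"]:
            total += 1
            # A hit means at least one predicted gene matches this phenotype's class.
            hits += resistance_class in genome["predicted"]
    return hits / total if total else 0.0
```

A measure like this rewards any matching gene and does not penalize extra predictions, so it says nothing about false positives.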
However, the authors emphasized that this WGS analysis was exploratory rather than a definitive benchmark because the dataset had a limited sample size, non-exhaustive laboratory testing, and lacked gene-level annotation of the mechanisms underlying each resistance phenotype. Manual validation of resLens predictions identified many true positives, but also false positives and ambiguous or incorrect classifications, underscoring the need to use such tools for screening and hypothesis generation rather than for final conclusions.
Genomic Language Models Improve ARG Screening
The findings illustrate that gLMs can classify ARGs with high fidelity and speed and are less dependent on databases than other deep learning or alignment-based tools. resLens models outperformed the other deep learning tools and performed competitively with the top alignment-based tools. Overall, the results highlight the potential of gLMs to improve ARG detection, including for ARGs with limited representation in reference databases, while reducing reliance on curated reference datasets without eliminating it.
Journal reference:
- Mollerus M, Dittmar K, Crandall KA, Rahnavard A (2026). resLens: genomic language models to enhance antibiotic resistance gene detection. npj Antimicrobials and Resistance. DOI: 10.1038/s44259-026-00219-2, https://www.nature.com/articles/s44259-026-00219-2