From deep-sea vents to other harsh habitats, researchers uncovered a vast microbial resource rich in novel genes and biosynthetic clusters, then used AI-guided screening to identify peptide candidates that could help power the next wave of antibiotic discovery.

Study: The Extreme Environment Microbiome Catalog (EEMC): a global resource for microbial diversity and antimicrobial discovery.
In a recent study published in the journal Nature Communications, researchers introduced the Extreme Environment Microbiome Catalog (EEMC), a large-scale resource designed to unlock the hidden microbial diversity of Earth’s most extreme habitats.
By reconstructing over 78,000 genomes from thousands of metagenomes and isolates, the team revealed vast, previously uncharacterized genetic and biosynthetic potential. Notably, the catalog enabled the identification of thousands of candidate antimicrobial peptides (cAMPs). Most of the subset synthesized for testing showed in vitro antibacterial activity, including against hard-to-treat Gram-negative pathogens, highlighting the EEMC’s promise as a platform for next-generation antimicrobial discovery.
Microorganisms that inhabit extreme environments offer a rich yet largely untapped source of novel metabolites with potential biomedical value. Although advances in sequencing and metagenomics have improved access to uncultivated microbes, most studies remain small in scale and limited to specific habitats, leaving global diversity and biosynthetic capacity insufficiently characterized. This gap is particularly critical amid the growing threat of antimicrobial resistance and the slowdown in antibiotic discovery. While genome mining and artificial intelligence have accelerated the search for antimicrobial peptides, challenges such as limited datasets, overlooked toxicity, and incomplete accounting for post-translational modification persist, underscoring the need for more comprehensive and integrative approaches.
Extreme Environment Metagenome Study Design
In the present study, researchers systematically compiled and reanalyzed metagenomic data from extreme environments worldwide to build a comprehensive genomic resource. The dataset spanned diverse habitats, including deep-sea, cryospheric, hypersaline, geothermal, subsurface, and hyperarid systems, capturing broad environmental variability.
The team assembled and curated more than 2,200 publicly available metagenomes alongside over 3,000 isolate genomes, including newly generated samples from cold seep sediments. Using established quality criteria, they reconstructed and refined over 78,000 bacterial and archaeal genomes. The investigators then clustered these genomes into species-level operational taxonomic units and performed taxonomic annotation and phylogenetic analyses to map diversity and distribution across environments.
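To make the species-level clustering step concrete, the following is a minimal sketch of greedy clustering by average nucleotide identity (ANI). The article does not say which tool or threshold the authors used. The 95% ANI cutoff is a widely used convention for species boundaries and is an assumption here, as are the genome names, the ANI values, and the function cluster_species.

```python
from typing import Dict, FrozenSet, List

# Hypothetical pairwise ANI values in percent; genome names and numbers
# are invented for illustration and do not come from the study.
ANI: Dict[FrozenSet[str], float] = {
    frozenset({"g1", "g2"}): 98.7,
    frozenset({"g1", "g3"}): 82.1,
    frozenset({"g2", "g3"}): 81.5,
    frozenset({"g3", "g4"}): 96.2,
    frozenset({"g1", "g4"}): 80.9,
    frozenset({"g2", "g4"}): 81.0,
}


def cluster_species(
    genomes: List[str],
    ani: Dict[FrozenSet[str], float],
    threshold: float = 95.0,  # assumed species-level cutoff, not from the article
) -> Dict[str, List[str]]:
    """Greedy clustering: a genome joins the first representative it matches
    at or above the threshold; otherwise it founds a new cluster."""
    clusters: Dict[str, List[str]] = {}
    for genome in genomes:  # assumed to be pre-sorted, best-quality genome first
        for representative in clusters:
            pair = frozenset({genome, representative})
            if ani.get(pair, 0.0) >= threshold:
                clusters[representative].append(genome)
                break
        else:
            # No existing representative was similar enough.
            clusters[genome] = [genome]
    return clusters


clusters = cluster_species(["g1", "g2", "g3", "g4"], ANI)
print(clusters)
```

In this toy input, four genomes collapse into two species-level groups. Real pipelines compute ANI with dedicated tools and pick each group's representative by genome quality, which this sketch only approximates through the input order.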
Next, the team predicted open reading frames and constructed a large non-redundant gene catalog, followed by extensive functional annotation using multiple public databases. To evaluate biosynthetic capacity, they identified over 160,000 biosynthetic gene clusters (BGCs) and assessed their novelty through comparative and clustering approaches. They paid particular attention to ribosomally synthesized and post-translationally modified peptide (RiPP) clusters, given their relevance for antimicrobial discovery.
To identify promising therapeutic candidates, the researchers integrated machine learning tools with protein-based large language models (LLMs) to predict antimicrobial activity and toxicity. After screening thousands of candidates, they synthesized selected core peptides for experimental validation. The team assessed antibacterial activity, minimum inhibitory concentrations (MICs), and cytotoxicity, and further investigated peptide structure and mechanisms using imaging, membrane integrity assays, and other biophysical techniques.
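The screening logic described above can be sketched as a two-criterion filter: keep peptides that a model predicts to be active and that a second model predicts to be non-toxic. This is an illustration only, not the authors' pipeline. The Candidate class, the score fields, the thresholds, and the example peptides are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    sequence: str
    activity_score: float  # hypothetical predicted probability of antimicrobial activity
    toxicity_score: float  # hypothetical predicted probability of toxicity


def screen(
    candidates: List[Candidate],
    min_activity: float = 0.9,  # assumed cutoff, not taken from the study
    max_toxicity: float = 0.1,  # assumed cutoff, not taken from the study
) -> List[Candidate]:
    """Keep candidates passing both cutoffs, ranked by predicted activity."""
    kept = [
        c for c in candidates
        if c.activity_score >= min_activity and c.toxicity_score <= max_toxicity
    ]
    return sorted(kept, key=lambda c: c.activity_score, reverse=True)


# Invented example peptides and scores.
pool = [
    Candidate("pep_A", "KWKLFKKIGAVLKVL", 0.97, 0.03),
    Candidate("pep_B", "GLFDIVKKVVGALGSL", 0.95, 0.40),  # active but predicted toxic
    Candidate("pep_C", "AAGSTPEELNQ", 0.55, 0.02),       # safe but predicted inactive
    Candidate("pep_D", "RRWKIVVIRWRR", 0.92, 0.08),
]

shortlist = screen(pool)
print([c.name for c in shortlist])
```

Applying both filters before synthesis narrows the pool to peptides worth testing at the bench, where activity and cytotoxicity are then measured directly.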
Extreme Environment Microbial Diversity Results
The researchers established the EEMC as a large-scale, genome-resolved resource by reconstructing 78,213 microbial genomes from diverse extreme habitats and clustering them into 32,715 species-level groups. Strikingly, over 86% of these species had no match in the reference genome sets used for comparison, revealing more than 20,000 potentially novel species and substantially expanding known global microbial diversity.
The catalog also captured nearly four billion unique genes, 19.21% of which remained unannotated in the referenced databases. In addition, the team identified more than 163,000 BGCs and assessed their novelty through gene cluster family and clan analyses, highlighting immense and largely unexplored biosynthetic potential.
The team observed strong habitat-specific patterns. Environments such as the deep sea and cryosphere were major contributors to novelty, with the deep sea contributing the largest absolute number of novel genes and gene clusters. Many identified genes were linked to stress adaptation, transport, and metabolic regulation. The findings reflect ways in which microbes survive under extreme conditions while producing diverse secondary metabolites.
Candidate Antimicrobial Peptide Discovery Results
Using protein LLMs, the researchers identified 3,032 cAMPs predicted to be non-toxic. Of 100 peptides synthesized for experimental testing, 84% inhibited bacterial growth, and the 50 candidates tested in mammalian cells showed low cytotoxicity. Several peptides demonstrated potent activity against hard-to-treat Gram-negative bacteria, with some showing low MICs.
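For readers unfamiliar with the metric, an MIC is the lowest concentration of a compound that prevents visible bacterial growth in a dilution series. The sketch below shows how that value is read from such a series. The function name, the concentrations (assumed to be in µg/mL), and the growth calls are invented for illustration and are not data from the study.

```python
from typing import Dict, Optional


def minimum_inhibitory_concentration(growth: Dict[float, bool]) -> Optional[float]:
    """Return the lowest concentration with no visible growth, provided every
    higher concentration also shows no growth. Returns None if growth occurs
    even at the highest concentration tested."""
    mic = None
    for concentration in sorted(growth, reverse=True):  # highest first
        if growth[concentration]:
            break  # growth seen: stop lowering the candidate MIC
        mic = concentration
    return mic


# Hypothetical two-fold dilution series: concentration -> visible growth?
series = {64.0: False, 32.0: False, 16.0: False, 8.0: False,
          4.0: True, 2.0: True, 1.0: True}

mic_value = minimum_inhibitory_concentration(series)
print(mic_value)
```

A lower MIC means less peptide is needed to stop growth, which is why low MICs against Gram-negative bacteria are treated as a sign of potency.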
Structural and mechanistic analyses revealed that many active peptides adopt α-helical conformations and act by disrupting bacterial membranes. Importantly, one lead candidate, cAMP_81, showed a reduced tendency to induce resistance over time. The findings underscore the promise of these early-stage, unmodified peptide scaffolds as next-generation antimicrobials derived from untapped extreme-environment microbiomes.
Extreme Environment Biotechnology Implications
The study positions the Extreme Environment Microbiome Catalog as a global reference resource for exploring microbial diversity and biosynthetic potential across Earth’s most extreme habitats. By uncovering vast taxonomic novelty and demonstrating the discovery of antimicrobial peptides with low cytotoxicity in cell-based assays, the work highlights the untapped promise of extremophiles for drug development. Importantly, the findings suggest that current sampling has only begun to capture this diversity, pointing to a much larger reservoir yet to be explored.
Looking ahead, integrating advanced sequencing technologies, artificial intelligence, and targeted cultivation strategies will be key to unlocking this potential. Expanding functional validation to include mature, post-translationally modified RiPPs, structural confirmation, and in vivo testing could further accelerate the discovery of novel therapeutics, enzymes, and bioactive compounds, positioning the EEMC as a critical platform for future innovations in biotechnology and biomedicine.
Journal reference:
- Jiang, P. et al. (2026). The Extreme Environment Microbiome Catalog (EEMC): A global resource for microbial diversity and antimicrobial discovery. Nature Communications. DOI: 10.1038/s41467-026-71145-0, https://www.nature.com/articles/s41467-026-71145-0