Evo model set to transform synthetic biology and disease diagnosis

A new study presents "Evo" – a machine learning model capable of decoding and designing DNA, RNA, and protein sequences, from molecular to genome scale, with unparalleled accuracy. Evo's ability to predict, generate, and engineer entire genomic sequences could change the way synthetic biology is done. "The ability to predict the effects of mutations across all layers of regulation in the cell and to design DNA sequences to manipulate cell function would have tremendous diagnostic and therapeutic implications for disease," writes Christina Theodoris in a related Perspective.

With a vocabulary of just four nucleotides, DNA encodes all the genetic information essential for life. Variations in the genomic sequence reflect adaptations selected for specific biological functions. These variations drive evolution by enabling organisms to adapt to new or changing environments. Advances in DNA sequencing technologies have made it possible to map genomic variations at the whole-genome scale. These data, combined with novel machine learning algorithms, could enable the creation of a comprehensive model that captures DNA, RNA, and protein functions and their interactions. However, although some researchers, inspired by the success of large language models (LLMs), have attempted to model DNA as a "language" using similar techniques, current generative models tend to focus narrowly on individual molecules or DNA segments. Together with computational limitations, this narrow focus has limited the models' ability to capture the broader genomic interactions needed to understand complex biological processes.

Here, Eric Nguyen and colleagues present Evo – a large-scale genomic foundation model with 7 billion parameters, designed to generate DNA sequences up to whole-genome scale. Built on the StripedHyena architecture, Evo was trained on a dataset of 2.7 million evolutionarily diverse microbial genomes. According to Nguyen et al., Evo excels in both predictive and generative biological tasks, achieving high accuracy in zero-shot evaluations for predicting mutation impacts on bacterial proteins and RNA, as well as in modeling gene regulation. Evo also captures the intricate coevolution between coding and noncoding sequences, supporting the design of complex biological systems like CRISPR-Cas complexes and transposable elements.

At the genomic scale, Evo can generate sequences over 1 megabase in length, a capability vastly surpassing prior models. "Future models may learn from diverse human and other eukaryotic genomes, using larger context lengths to capture distant genomic interactions over larger genomic scales," writes Theodoris in the Perspective.

Journal reference:

Nguyen, E., et al. (2024). Sequence modeling and design from molecular to genome scale with Evo. Science. doi.org/10.1126/science.ado9336.
