Georgia Tech researchers unveil new version of genome annotation system


Georgia Tech researchers, working with colleagues in the National Center for Biotechnology Information (NCBI), have released a new version of a genome annotation system capable of analyzing more than 2,000 prokaryotic genomes per day, helping researchers accelerate prokaryotic genomics-based studies worldwide.

In biology, the term prokaryote generally describes a microorganism that lacks a distinct membrane-bound nucleus and has its genetic material contained in a single molecule of DNA. Prokaryotes include bacteria and archaea.

The NCBI operates the Prokaryotic Genome Annotation Pipeline (PGAP), a high-performance software system designed to analyze gene sequences of these microorganisms. As more high-quality genomes become available, and as the cost of sequencing continues to fall, the need for high-throughput analysis and annotation pipelines keeps growing.

The latest advance comes as the NCBI incorporates Georgia Tech's GeneMarkS+ into the PGAP system. Developed by Mark Borodovsky's team at Georgia Tech, GeneMarkS+ is a self-training machine learning tool for novel gene identification that can combine intrinsic evidence revealed by genomic sequence patterns with extrinsic evidence derived from already annotated genomes.

"The new system enables researchers to get critically important analysis that consistently integrates information of all sources of evidence nearly in real time instead of days and weeks," said Borodovsky, a Regents' professor with a joint appointment in the School of Computational Science and Engineering and the Coulter Department of Biomedical Engineering. "Our group is excited to be a part of the whole team working on this project with high international visibility."

Before GeneMarkS+ was integrated into the pipeline, the system could handle only 20 annotations daily.

"Dr. Borodovsky worked closely with Tatiana Tatusova's team at NCBI to incorporate and refine GeneMarkS+ in the context of the NCBI annotation pipeline," said Jim Ostell, chief of NCBI's Information Engineering Branch. "It provides a critical core infrastructure to NCBI and to users of NCBI resources."

PGAP uses GeneMarkS+ in conjunction with proteomic evidence obtained from large groups of orthologous gene clusters representing the core protein complement for well-annotated species. As new organisms are sequenced, PGAP adjusts by mining the existing protein information to build new core protein clusters, iteratively improving its annotation based on the ever-increasing wealth of available evidence from submitted bacterial genomes.

The new system offers a modular structure, permitting easy extension with new algorithms. PGAP also provides extensive tracking of execution and decision making, and thus permits an easy trace-back to understand the evidence behind key algorithmic decisions. The PGAP process is described at

PGAP produces high-quality annotation designed to meet INSDC standards for sequence submission and follows UniProt naming guidelines. PGAP is available at NCBI for bacterial genomes as part of GenBank sequence submission, making it a valuable resource to researchers worldwide.

