RECoVERY: New open-source software developed for analyzing SARS-CoV-2 genomes

By Lakshmi Supriya, PhD. Jan 19 2021

A team of researchers from the Istituto Superiore di Sanità (ISS), Italy, has reported an open-source, platform-independent tool for building severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes from raw sequencing reads. The tool needs no extra hardware or software and runs in any browser on a desktop or mobile device.

Study: SARS-CoV-2 RECoVERY: a multi-platform open-source bioinformatic pipeline for the automatic construction and analysis of SARS-CoV-2 genomes from NGS sequencing data. Image Credit: vchal / Shutterstock

*Important notice: bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behavior, or treated as established information.

SARS-CoV-2, the causative pathogen of coronavirus disease 2019 (COVID-19), has spread rapidly across the globe, resulting in more than two million deaths. Next-generation sequencing (NGS) technologies have allowed complete genome sequencing of the different virus strains, providing estimates of how the virus spreads over time and across geographies.

NGS technologies can produce a large number of sequences. However, processing and manipulating the data is a challenge, both because the datasets are large and because many users lack bioinformatics skills.

Many companies have developed platforms to support different sequencing standards and have made them available to users to a limited extent. However, most analysis of sequencing data is done either with commercial software that requires licenses or with in-house command-line pipelines that require bioinformatics skills.

Researchers from the ISS in Rome developed an all-in-one, platform-independent pipeline for reconstructing and analyzing the complete SARS-CoV-2 genome. They collected common command-line tools for SARS-CoV-2 genome reconstruction and analysis into a pipeline and implemented it on the open-source Galaxy instance ARIES.

Open-source tool for SARS-CoV-2 genome analysis

The pipeline, called REconstruction of COronaVirus gEnomes & Rapid analysis (RECoVERY), works through the following steps: analyzing read quality and trimming, subtracting human sequences, aligning and mapping reads against a reference SARS-CoV-2 sequence, calling variants, calling the consensus sequence, de novo assembly, identifying open reading frames (ORFs), and annotating variants.

The authors used the genome sequence of the Wuhan-Hu-1 isolate as the reference to build two databases, one containing the complete virus genome and the other containing the ORF annotations. Then, they removed low-quality bases from the imported reads and excluded reads shorter than 30 base pairs.
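The trimming and length-filtering step can be pictured with a minimal Python sketch. It is an illustration only, not the pipeline's actual code: the 30-base-pair minimum comes from the description above, while the Phred quality cutoff of 20, the end-trimming strategy, and the function names `trim_read` and `filter_reads` are assumptions made for the example.

```python
from typing import List, Tuple

Read = Tuple[str, List[int]]  # (bases, per-base Phred quality scores)

MIN_READ_LENGTH = 30   # from the article: reads shorter than 30 bp are excluded
MIN_BASE_QUALITY = 20  # assumed cutoff; the article does not state one


def trim_read(seq: str, quals: List[int],
              min_quality: int = MIN_BASE_QUALITY) -> Read:
    """Trim low-quality bases from both ends of a read (hypothetical helper)."""
    start, end = 0, len(seq)
    # Move the left edge inward past low-quality bases.
    while start < end and quals[start] < min_quality:
        start += 1
    # Move the right edge inward past low-quality bases.
    while end > start and quals[end - 1] < min_quality:
        end -= 1
    return seq[start:end], quals[start:end]


def filter_reads(reads: List[Read],
                 min_length: int = MIN_READ_LENGTH,
                 min_quality: int = MIN_BASE_QUALITY) -> List[Read]:
    """Trim every read, then keep only those still at least min_length long."""
    kept = []
    for seq, quals in reads:
        trimmed_seq, trimmed_quals = trim_read(seq, quals, min_quality)
        if len(trimmed_seq) >= min_length:
            kept.append((trimmed_seq, trimmed_quals))
    return kept
```

A read that drops below 30 bases after trimming is discarded entirely, which is why the length check runs after the quality trim rather than before it.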

After removing human genomic sequences, the team mapped the remaining unaligned reads to the reference SARS-CoV-2 sequence and reconstructed the complete genome sequence using tools developed in-house. When a nucleotide position is not covered by sequencing, or is covered fewer than 30 times, the tool inserts an “N.” They performed coverage analysis using Qualimap 2, annotated ORFs using BLASTn, and annotated variants using SnpEff.

The team obtained raw data from the Sequence Read Archive (SRA) that had been generated on the Illumina, Nanopore, and Ion Torrent platforms. They then built genomes from the raw data using the pipeline developed in this study and compared the results with those obtained from the CLC Genomics Workbench 9.5 and the Genome Detective Virus Tool.
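The coverage rule for the consensus sequence can be sketched as follows. This is a simplified illustration and not the authors' in-house tool: the threshold of 30 comes from the description above, but the majority-vote base choice, the per-position `Counter` input format, and the name `call_consensus` are assumptions made for the example.

```python
from collections import Counter
from typing import List

MIN_DEPTH = 30  # from the article: positions covered fewer than 30 times become "N"


def call_consensus(pileup: List[Counter], min_depth: int = MIN_DEPTH) -> str:
    """Build a consensus string from per-position base counts.

    pileup[i] maps each observed base at position i to the number of
    reads supporting it. Majority vote is an assumed rule for this sketch.
    """
    consensus = []
    for counts in pileup:
        depth = sum(counts.values())  # 0 for an uncovered position
        if depth < min_depth:
            consensus.append("N")
        else:
            base, _ = counts.most_common(1)[0]
            consensus.append(base)
    return "".join(consensus)
```

For instance, a position supported by 10 reads with A and 40 reads with G would be called G, while a position seen in only 12 reads would be masked as N regardless of which base those reads carry.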

Tool performs better than commercial software

The researchers found that the genomes built using the pipeline were about 54 nucleotides longer on average than those built using CLC and Genome Detective. These genomes also showed fewer nucleotide differences than the genomes built using the other software. This is noteworthy because incorrect or missing nucleotide assignments would make it difficult to study the evolution and distribution of the virus, as most SARS-CoV-2 mutations are single-point mutations. Thus, the developed pipeline performs as well as or better than available genome reconstruction software.

The pipeline reported in this study is freely accessible using the Galaxy instance ARIES. It provides a user-friendly interface and is fast, providing complete genome reconstruction of the SARS-CoV-2 genome in less than an hour for data up to 6 million reads. There is no need for separate hardware or software, and the analysis can be run using any desktop or mobile browser after registration on the ARIES homepage. Furthermore, ARIES does not access users’ data.

“The simplicity of use and the production of a comprehensive report with all the variations characterized, make this pipeline a valuable tool particularly for scientists with little or no skill in bioinformatics.”

The analysis is completely automated, and the user interface is designed to require little input from the user. According to the authors, offering the software as an open-source pipeline will help scientists work collaboratively toward crowdsourcing-based advances in understanding the virus.

Journal reference:

Preliminary scientific report. Sabato, L. D. et al. (2021). SARS-CoV-2 RECoVERY: a multi-platform open-source bioinformatic pipeline for the automatic construction and analysis of SARS-CoV-2 genomes from NGS sequencing data. bioRxiv. https://doi.org/10.1101/2021.01.16.425365, https://www.biorxiv.org/content/10.1101/2021.01.16.425365v1

Written by

Lakshmi Supriya

Lakshmi Supriya got her BSc in Industrial Chemistry from IIT Kharagpur (India) and a Ph.D. in Polymer Science and Engineering from Virginia Tech (USA).
