RECoVERY: New open-source software developed for analyzing SARS-CoV-2 genomes

A team of researchers from the Istituto Superiore di Sanità (ISS), Italy, reports an open-source, platform-independent tool for building severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes from raw sequencing reads. The tool needs no extra hardware or software and can be run from any desktop or mobile browser.

SARS-CoV-2, the causative pathogen of coronavirus disease 2019 (COVID-19), has rapidly spread across the globe, resulting in more than two million deaths. Next-generation sequencing (NGS) technologies have allowed complete genome sequencing of the different virus strains, providing estimates of how the virus spreads over time and across geographies.

NGS technologies can produce a large number of sequences. However, processing and manipulating the data is a challenge, both because of the size of the datasets and because many users lack bioinformatics skills.

Many companies have developed platforms to support different sequencing standards and have made them available to users to a limited extent. However, most sequencing data are analyzed using either commercial software that requires licenses or in-house command-line pipelines that require bioinformatics skills.

Researchers from the ISS in Rome developed an all-in-one, platform-independent pipeline for reconstructing and analyzing the complete SARS-CoV-2 genome. They gathered common command-line tools for SARS-CoV-2 genome reconstruction and analysis into a single pipeline and implemented it on the open-source Galaxy instance ARIES.

Open-source tool for SARS-CoV-2 genome analysis

The pipeline, called REconstruction of COronaVirus gEnomes & Rapid analYsis (RECoVERY), runs the following steps: read quality analysis and trimming, subtraction of human sequences, read alignment and mapping against a reference SARS-CoV-2 sequence, variant calling, consensus sequence calling, de novo assembly, identification of open reading frames (ORFs), and variant annotation.

The authors used the genome sequence of the Wuhan-Hu-1 isolate as the reference to build two databases, one containing the complete virus genome and the other containing the ORF annotations. They then removed low-quality bases from the imported reads and excluded reads shorter than 30 base pairs.
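The read-filtering step can be illustrated with a minimal Python sketch. This is not the pipeline's actual code: the function names, the choice to trim only from the 3' end, and the Phred cutoff of 20 are assumptions made for illustration. Only the 30-base minimum length comes from the description above.

```python
from typing import List, Optional, Tuple

MIN_READ_LENGTH = 30   # from the article: shorter reads are excluded
MIN_BASE_QUALITY = 20  # assumed Phred cutoff; the article does not state one


def trim_low_quality_tail(
    seq: str, quals: List[int], min_q: int = MIN_BASE_QUALITY
) -> Tuple[str, List[int]]:
    """Drop bases from the 3' end until the last base meets the cutoff."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def filter_read(
    seq: str, quals: List[int], min_len: int = MIN_READ_LENGTH
) -> Optional[Tuple[str, List[int]]]:
    """Trim a read, then discard it (return None) if it is too short."""
    trimmed_seq, trimmed_quals = trim_low_quality_tail(seq, quals)
    if len(trimmed_seq) < min_len:
        return None
    return trimmed_seq, trimmed_quals
```

In this sketch, a 35-base read whose last three bases fall below the cutoff is kept as a 32-base read, while a read trimmed to 25 bases is dropped.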

After removing human genomic sequences, the team mapped the remaining unaligned reads to the reference SARS-CoV-2 sequence and reconstructed the complete genome sequence using tools developed in-house. When a nucleotide position is not covered by sequencing, or is covered fewer than 30 times, the tool inserts an “N.” They performed coverage analysis using Qualimap 2, annotated ORFs using BLASTn, and annotated variants using SnpEff.
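The masking rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' in-house tool: it assumes the threshold of 30 refers to read depth at a position, it represents the alignment as a hypothetical per-position list of observed bases, and it picks the most frequent base as the consensus call.

```python
from collections import Counter
from typing import List

MIN_DEPTH = 30  # from the article: positions covered fewer than 30 times become "N"


def call_consensus(pileup: List[str], min_depth: int = MIN_DEPTH) -> str:
    """Build a consensus from a per-position pileup.

    pileup[i] is a string with one character per read covering
    reference position i (a hypothetical, simplified input format).
    """
    consensus = []
    for bases in pileup:
        if len(bases) < min_depth:
            # Uncovered or under-covered position: mask it.
            consensus.append("N")
        else:
            # Assumed rule: take the most frequent base.
            consensus.append(Counter(bases).most_common(1)[0][0])
    return "".join(consensus)
```

For example, a position covered by 40 reads is called normally, whereas positions with 10 reads or no reads at all are both reported as “N”.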

Raw sequencing data generated on the Illumina, Nanopore, and Ion Torrent platforms were obtained from the Sequence Read Archive (SRA). The team then reconstructed genomes from these raw data using the pipeline developed in this study and compared the results with those obtained from CLC Genomics Workbench 9.5 and the Genome Detective Virus Tool.

Tool performs better than commercial software

The researchers found that the genomes built using the pipeline were about 54 nucleotides longer on average than those built using CLC and Genome Detective. These genomes also showed fewer nucleotide differences than the genomes built using the other software. This is noteworthy because shorter or divergent reconstructions may contain incorrect or missing nucleotide assignments, which would make it difficult to study the evolution and distribution of the virus, as most SARS-CoV-2 mutations are single-point mutations. Thus, the developed pipeline performs as well as or better than available genome reconstruction software.
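A length comparison of this kind can be expressed as a short Python sketch. The function name and the dictionary-of-sequences input are hypothetical, and the article does not describe how the authors actually computed their average.

```python
from statistics import mean
from typing import Dict


def mean_length_gain(ours: Dict[str, str], other: Dict[str, str]) -> float:
    """Average length difference (ours minus other) over shared samples.

    Both arguments map a sample identifier to its reconstructed genome
    sequence; samples present in only one set are ignored.
    """
    shared = sorted(set(ours) & set(other))
    if not shared:
        raise ValueError("no samples in common")
    return mean(len(ours[s]) - len(other[s]) for s in shared)
```

A positive result means the first tool produced longer genomes on average for the samples both tools reconstructed.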

The pipeline reported in this study is freely accessible through the Galaxy instance ARIES. It provides a user-friendly interface and is fast, reconstructing the complete SARS-CoV-2 genome in less than an hour for datasets of up to 6 million reads. There is no need for separate hardware or software, and the analysis can be run using any desktop or mobile browser after registration on the ARIES homepage. Furthermore, ARIES does not access users’ data.

“The simplicity of use and the production of a comprehensive report with all the variations characterized, make this pipeline a valuable tool particularly for scientists with little or no skill in bioinformatics.”

The analysis is completely automated, and the user interface is designed to require little input from the user. According to the authors, making the software available as an open-source pipeline will help scientists work collaboratively toward crowdsourced advances in understanding the virus.

*Important Notice

bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.


Written by

Lakshmi Supriya

Lakshmi Supriya got her BSc in Industrial Chemistry from IIT Kharagpur (India) and a Ph.D. in Polymer Science and Engineering from Virginia Tech (USA).

Citations

Supriya, Lakshmi. (2021, January 19). RECoVERY: New open-source software developed for analyzing SARS-CoV-2 genomes. News-Medical. Retrieved on May 12, 2021 from https://www.news-medical.net/news/20210119/RECoVERY-New-open-source-software-developed-for-analyzing-SARS-CoV-2-genomes.aspx.
