A study recently posted to the bioRxiv* preprint server presents a new analytical pipeline for assessing clades of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) through molecular sequence analysis.
Study: RASCL: Rapid Assessment Of SARS-CoV-2 Clades Through Molecular Sequence Analysis.
The continuous evolution of SARS-CoV-2 and the emergence of new variants of concern (VOCs) and variants of interest (VOIs) have challenged health care systems and regulatory bodies worldwide. Understanding the key features of the virus at the molecular level is vital for designing effective strategies to manage and mitigate the ongoing coronavirus disease 2019 (COVID-19) pandemic.
Researchers have been working to unravel the clade-specific features of SARS-CoV-2 lineages in order to surveil and control the pandemic. Rapidly identifying and characterizing the natural selection forces that drive SARS-CoV-2 evolution and the emergence of new variants can contribute substantially to managing the COVID-19 pandemic efficiently.
The RASCL analytical pipeline
In the current work, researchers presented RASCL (Rapid Assessment of SARS-CoV-2 Clades), a new analytical pipeline that uses comparative phylogenetic analysis to investigate the nature and extent of the selective forces acting on SARS-CoV-2 genes.
The RASCL pipeline takes two inputs. The first is a query dataset: a FASTA file of unaligned complete or partial SARS-CoV-2 genomes from a clade of interest. The second is a generic background, or reference, dataset: a set of SARS-CoV-2 sequences assembled from the Virus Pathogen Database and Analysis Resource (ViPR). The choice of query and reference datasets can be analysis-specific. The pipeline can automatically remove sequences in the query dataset if they are present in the reference dataset.
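The de-duplication step between the two inputs can be illustrated with a minimal Python sketch. This is not RASCL's own code: the function names `read_fasta` and `drop_shared`, the toy sequences, and the rule of matching on either header or sequence are all assumptions made for illustration.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text from a file-like object into {header: sequence}."""
    records = {}
    header = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


def drop_shared(records, other):
    """Keep only records whose header and sequence are both absent from `other`.

    Assumption: a record counts as "present" in the other dataset if either
    its header or its exact sequence occurs there.
    """
    other_ids = set(other)
    other_seqs = set(other.values())
    return {
        h: s
        for h, s in records.items()
        if h not in other_ids and s not in other_seqs
    }


# Toy inputs (hypothetical, far shorter than real genomes).
query_fasta = ">q1\nACGT\n>q2\nACGA\n>shared\nTTTT\n"
reference_fasta = ">r1\nGGGG\n>shared\nTTTT\n"

query = read_fasta(StringIO(query_fasta))
reference = read_fasta(StringIO(reference_fasta))
filtered_query = drop_shared(query, reference)
print(sorted(filtered_query))
```

The same helper works in either direction, so it can filter whichever dataset a given analysis needs cleaned.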
RASCL subsamples the available sequences using complete-linkage distance clustering. It then builds a combined alignment of the query and background datasets, retaining from the reference dataset only those sequences that are divergent enough to be useful for the subsequent selection analyses. A maximum-likelihood phylogeny is estimated from this combined dataset, and selection analyses are performed using molecular evolution models implemented in the HyPhy (Hypothesis Testing using Phylogenies) package.
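To make the subsampling idea concrete, here is a small pure-Python sketch of complete-linkage clustering followed by picking one representative per cluster. It is an illustration only, not the pipeline's implementation: the Hamming-fraction distance, the `threshold` value, the choice of the first cluster member as representative, and all function names are assumptions.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of differing sites between two aligned, non-empty sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_linkage_clusters(seqs, threshold):
    """Agglomerative clustering with complete linkage.

    Two clusters are merged only if the LARGEST pairwise distance between
    their members is at most `threshold`.
    """
    names = list(seqs)
    dist = {
        frozenset((a, b)): hamming_fraction(seqs[a], seqs[b])
        for a, b in combinations(names, 2)
    }
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters under complete linkage.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def subsample(seqs, threshold):
    """Keep the first member of each cluster as its representative."""
    clusters = complete_linkage_clusters(seqs, threshold)
    return {c[0]: seqs[c[0]] for c in clusters}


# Toy aligned sequences (hypothetical).
seqs = {
    "a": "AAAAAAAAAA",
    "b": "AAAAAAAAAT",
    "c": "AAAAAAAATT",
    "d": "CCCCCCCCCC",
}
clusters = complete_linkage_clusters(seqs, threshold=0.1)
kept = subsample(seqs, threshold=0.1)
print(clusters, sorted(kept))
```

In this toy case, `c` is within the threshold of `b` but not of `a`, so complete linkage keeps it out of the `a`/`b` cluster. This property is what guarantees that every sequence in a cluster is close to every other member, and therefore to the representative.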
Potential applications of RASCL
The RASCL pipeline performs HyPhy analyses only on the internal branches of the phylogenetic tree, which reduces the influence of sequencing errors and potential within-host evolution. Other researchers have described the use of the pipeline in separate works, in which they applied it to the SARS-CoV-2 Beta, Gamma, and Omicron VOCs to characterize the role of natural selection in the emergence of these variants. Another study used RASCL to identify patterns of convergent evolution in N501Y SARS-CoV-2 lineages.
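The distinction between internal and terminal branches can be shown with a short Python sketch. It does not reflect how HyPhy or RASCL select branches internally. The nested-tuple tree encoding, the function name `partition_branches`, and the tip names are hypothetical and serve only to show which branches would be analyzed and which excluded.

```python
def partition_branches(tree):
    """Split the branches of a rooted tree into internal and terminal ones.

    `tree` is a nested tuple in which strings are tips. A terminal branch is
    identified by its tip name; an internal branch is identified by the
    sorted tuple of tip names that descend from it.
    """
    internal, terminal = [], []

    def tips_below(node, is_root):
        if isinstance(node, str):
            terminal.append(node)  # branch leading to a sampled sequence
            return [node]
        tips = []
        for child in node:
            tips.extend(tips_below(child, False))
        if not is_root:  # the root has no branch above it
            internal.append(tuple(sorted(tips)))
        return tips

    tips_below(tree, True)
    return internal, terminal


# Toy rooted tree with three query tips and two reference tips (hypothetical).
tree = ((("q1", "q2"), "q3"), ("r1", "r2"))
internal, terminal = partition_branches(tree)
print(internal)
print(terminal)
```

Mutations that appear only on a terminal branch are seen in a single sampled genome and so are harder to tell apart from sequencing errors, whereas a change on an internal branch is shared by every tip below it.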
To summarize, the present work described the implementation of the novel RASCL pipeline and its prospective use in future genomic surveillance of new SARS-CoV-2 lineages. According to the authors, RASCL can be adapted for other viruses with minimal modifications to the pipeline.
*bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.