A recent study published on the preprint server medRxiv* proposes a simple and specific method for the enhanced detection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants of concern (VOCs) in wastewater samples. The method introduces the concept of quasi-unique mutations that correspond to a given PANGO lineage. The researchers define a quasi-unique mutation for a lineage A as a mutation that is found in more than 50% of all available SARS-CoV-2 genomes belonging to lineage A and in less than 50% of the genomes belonging to any other lineage B.
Study: Enhanced Detection of Recently Emerged SARS-CoV-2 Variants of Concern in Wastewater.
The method detects these quasi-unique mutations for target lineages and provides a more specific view than the routine lists of characteristic mutations for the corresponding lineages. The approach is data-driven and results in earlier detection and higher resolution of VOC emergence patterns in wastewater genomic data.
“The combination of rules allows to extract mutational signatures for each of the lineages from the clinical data, and therefore screen for potential presence/absence of the VOCs/VOIs in the environmental samples.”
Wastewater epidemiology for virus surveillance
Wastewater surveillance efforts to track SARS-CoV-2 and screen for the presence of VOCs and variants of interest (VOIs) have become popular and useful. Techniques used for this purpose include direct metagenomic sequencing of wastewater samples, targeted amplification and sequencing of the SARS-CoV-2 genetic material in the samples (using the ARTIC protocol), and direct reverse-transcriptase quantitative polymerase chain reaction (RT-qPCR) detection of specific regions of the SARS-CoV-2 genome. Of these, targeted amplification is the most commonly adopted method.
Despite these advanced technologies, SARS-CoV-2 wastewater variant surveillance is complicated by several challenges, including partial ribonucleic acid (RNA) degradation, low abundance of genetic material in environmental samples, and incomplete haplotype phasing.
Leveraging clinical data to characterize VOCs
Genomes obtained from clinical data globally are deposited into the GISAID database and are assigned a PANGO lineage via the Pangolin software. Because this software requires assemblies rather than sequence reads as input, the data from a wastewater sample would have to be represented as a single genome, which is an inadequate representation of the genetic diversity within the sample. This also rules out any examination of the potential presence of multiple VOCs/VOIs in a single sample.
As an alternative to the direct use of Pangolin for lineage assignment, individual mutations and their relative abundances, otherwise known as allele frequencies, can be identified within a sample. However, this information alone cannot reliably reconstruct the genomic mixture that gives rise to a sample within a reasonable computational time.
The direct phylogenetic placement of sequencing reads onto the SARS-CoV-2 global phylogenetic tree is possible. However, some sub-sampling is required for the placement to be computationally efficient among more than 1.5 million SARS-CoV-2 sequences. The researchers noted that the Nextstrain sub-sampled version of the global SARS-CoV-2 phylogeny contains clades that conflict with the PANGO lineage designation.
The authors of the current study used a rule-based system that leverages data from GISAID, extracting the corresponding mutations from the multiple sequence alignment of the SARS-CoV-2 genomes. From this analysis, the authors found that no mutations in the VOCs/VOIs uniquely determine the corresponding lineages among all the observed ones.
About the study
Based on these observations, the researchers introduced the concept of quasi-unique mutations. They compared the resulting screening process with one based on the list of characteristic mutations for the VOCs/VOIs maintained by the United States Centers for Disease Control and Prevention (CDC), and they integrated coverage information for the quasi-unique positions.
They distinguished between cases of ‘no detection’ and ‘no coverage.’ This was done because, in some cases, the sample may have degraded or the amplification may have failed, so that the relevant genome positions were not sequenced at all. Such cases are better reported as ‘no coverage’ than as the absence of the variant from the sample.
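The distinction between ‘no detection’ and ‘no coverage’ can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `screen_lineage`, the `min_depth` threshold of 10 reads, and the genome positions 100 and 200 are all hypothetical and chosen only for illustration.

```python
def screen_lineage(quasi_unique_positions, depth, alt_reads, min_depth=10):
    """Classify one sample for one lineage.

    depth:     {position: total reads covering that position}
    alt_reads: {position: reads carrying the quasi-unique mutation}
    min_depth: assumed minimum read depth for a position to count as covered
    """
    covered = [p for p in quasi_unique_positions if depth.get(p, 0) >= min_depth]
    if not covered:
        # Nothing was sequenced at the informative positions, so the sample
        # says nothing about the presence or absence of the lineage.
        return "no coverage"
    if any(alt_reads.get(p, 0) > 0 for p in covered):
        return "detected"
    return "no detection"


positions = [100, 200]  # hypothetical quasi-unique positions

failed_sample = screen_lineage(positions, depth={100: 0, 200: 3}, alt_reads={})
clean_sample = screen_lineage(positions, depth={100: 150, 200: 0}, alt_reads={})
positive_sample = screen_lineage(positions, depth={100: 150}, alt_reads={100: 12})
```

In this toy example, the first sample is reported as ‘no coverage’ rather than as a negative result, because none of the informative positions reached the assumed depth threshold.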
Using this approach, the researchers tracked the emergence of the Delta (B.1.617.2) variant across 39 wastewater treatment plants in Houston, Texas, serving 2.3 million people. The researchers collected the samples, pretreated them, extracted the RNA, and used it for whole genome sequencing (WGS) library preparation.
To define the quasi-unique mutation sets, the researchers downloaded the multiple sequence alignments (MSA) and the metadata files from the GISAID website (https://www.gisaid.org/). Using vdb v2.0 (13), they extracted nucleotide changes and grouped them according to the lineage to which the corresponding genomes belong.
To build the quasi-unique mutation set for each lineage, the researchers first formed a consensus mutation set per lineage, which included all the nucleotide changes present in more than 50% of the genomes in that lineage. From each of these sets, they then subtracted the consensus mutation sets of all other lineages. They defined the resulting mutation sets as the quasi-unique mutations for the specific lineage.
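The two steps described here, forming per-lineage consensus sets and then subtracting the consensus sets of all other lineages, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' pipeline: the function names, the lineage labels "A" and "B", and the mutation labels are made up, and real input would come from the GISAID alignment and metadata rather than a hand-written list.

```python
from collections import Counter, defaultdict


def consensus_sets(genomes, threshold=0.5):
    """genomes: iterable of (lineage, set_of_mutations) pairs.

    Returns {lineage: mutations present in more than `threshold`
    of that lineage's genomes}.
    """
    counts = defaultdict(Counter)
    totals = Counter()
    for lineage, mutations in genomes:
        totals[lineage] += 1
        counts[lineage].update(set(mutations))
    return {
        lineage: {m for m, n in counts[lineage].items()
                  if n / totals[lineage] > threshold}
        for lineage in totals
    }


def quasi_unique_sets(consensus):
    """Subtract from each consensus set the consensus sets of all other lineages."""
    result = {}
    for lineage, own in consensus.items():
        others = set().union(
            *(s for other, s in consensus.items() if other != lineage)
        )
        result[lineage] = own - others
    return result


# Toy data with made-up mutation labels, for illustration only.
genomes = [
    ("A", {"C1T", "G2A", "A3G"}),
    ("A", {"C1T", "G2A"}),
    ("A", {"C1T"}),
    ("B", {"C1T", "T4C"}),
    ("B", {"C1T", "T4C", "G2A"}),
    ("B", {"T4C"}),
]

consensus = consensus_sets(genomes)
quasi_unique = quasi_unique_sets(consensus)
```

In the toy data, "C1T" is in the consensus set of both lineages and is therefore dropped from both, while "G2A" remains quasi-unique for lineage A even though it appears in one of the three lineage B genomes.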
“Finally, for each wastewater sample and each lineage of concern/interest, the sum of allele frequencies of quasi-unique mutations has been computed. The results were reported both per wastewater treatment plant and an aggregate for the city.”
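As a rough illustration of this last step, the sketch below sums the allele frequencies of a lineage's quasi-unique mutations in each sample. The plant names, mutation labels, and frequencies are invented, and the city-wide figure is computed as an unweighted mean across plants purely as an assumption, since the article does not say how the aggregate was formed.

```python
def lineage_signal(allele_freqs, quasi_unique):
    """Sum the allele frequencies of a lineage's quasi-unique mutations.

    allele_freqs: {mutation: allele frequency observed in the sample}
    quasi_unique: set of quasi-unique mutations for the lineage
    Mutations not observed in the sample contribute 0.
    """
    return sum(allele_freqs.get(m, 0.0) for m in quasi_unique)


# Toy allele frequencies per treatment plant (hypothetical values).
plants = {
    "plant_1": {"G2A": 0.25, "T4C": 0.5},
    "plant_2": {"G2A": 0.75},
}
quasi_unique_A = {"G2A", "A9C"}  # hypothetical quasi-unique set for lineage A

per_plant = {
    name: lineage_signal(freqs, quasi_unique_A)
    for name, freqs in plants.items()
}

# Assumption: aggregate the city as an unweighted mean across plants.
city_mean = sum(per_plant.values()) / len(per_plant)
```

Here "T4C" is ignored for lineage A because it is not in that lineage's quasi-unique set, so it cannot inflate the signal.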
The authors observed that when there was a strong signal for a VOC, as during the last two weeks of June, the quasi-unique mutations and the characteristic mutations agreed well.
“However, we also note that as we aim to track the early emergence, co-occurrence of certain characteristic mutations within other lineages can confound the picture, while quasi-unique mutations indicate a clear trend.”
In this study, the researchers proposed a simple method, built on the concept of quasi-unique mutations, for screening wastewater-derived SARS-CoV-2 sequencing samples for the potential emergence of VOC/VOI lineages.
While future improvements can be made to the method, it is important to apply these ideas in SARS-CoV-2 wastewater screening early on.
medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.