In a recent study posted to the medRxiv* preprint server, a team of researchers introduced a novel method to estimate the proportions of different severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains in wastewater samples.
The coronavirus disease 2019 (COVID-19) pandemic has brought to the fore the urgent need to identify the onset of outbreaks, track the transmission of SARS-CoV-2, and follow real-time trends in the spread of COVID-19.
An effective strategy for the early detection of SARS-CoV-2 in different population groups is wastewater-based epidemiology (WBE). Because WBE can detect SARS-CoV-2 excreted by both symptomatic and asymptomatic individuals, it is also well suited to characterizing the genetic makeup of the virus circulating in a community.
About the study
The present study investigated a method to provide proportional estimates of SARS-CoV-2 variants in wastewater sampled from a community and to correlate the estimates with the number of COVID-19 cases in that community.
The researchers used an imputation approach to allow a like-for-like comparison of sequencing reads against the reference strains of the SARS-CoV-2 variants. Their approach to determining SARS-CoV-2 sequence composition was based on the Tree imputation method. To assess accuracy, they compared Tree imputation with Common allele imputation: sequenced nucleotides were removed from SARS-CoV-2 sequences and then re-imputed using each method.
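The mask-and-re-impute evaluation described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: it covers the simpler Common allele baseline (Tree imputation needs a phylogeny and is not shown), and the function names, the use of "N" for missing bases, and the random masking scheme are all assumptions made for the example.

```python
import random
from collections import Counter


def common_allele_impute(alignment, row, col):
    """Guess alignment[row][col] as the most common allele seen at that
    column in all *other* sequences (missing bases, "N", are ignored)."""
    counts = Counter(
        seq[col]
        for i, seq in enumerate(alignment)
        if i != row and seq[col] != "N"
    )
    return counts.most_common(1)[0][0] if counts else "N"


def masked_error_rate(alignment, n_masks=1000, seed=0):
    """Hide one known base at a time, re-impute it, and report the
    fraction of hidden bases that were imputed incorrectly."""
    rng = random.Random(seed)
    n_rows, n_cols = len(alignment), len(alignment[0])
    tested = 0
    errors = 0
    for _ in range(n_masks):
        r = rng.randrange(n_rows)
        c = rng.randrange(n_cols)
        truth = alignment[r][c]
        if truth == "N":
            continue  # nothing to check against at a missing site
        tested += 1
        if common_allele_impute(alignment, r, c) != truth:
            errors += 1
    return errors / tested if tested else 0.0
```

A baseline like this always predicts the majority allele, so it is wrong whenever the hidden base belongs to a minority lineage, which is the kind of error a tree-aware method is meant to avoid.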
To accurately determine the strain composition of SARS-CoV-2 variants, the researchers developed a new phylogenetic method that allowed data imputation for SARS-CoV-2 sequences. For each sequencing data set, the authors removed genomes carrying SNP alleles whose allele frequency fell below a frequency threshold. This filtering reduced the alignment to fewer than 1,000 relevant genomes. The relevant genomes were then used to calculate the number of mismatches between each genome and each sequencing read, and from these counts the probability of each sequencing read arising from each strain was calculated.
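As a rough illustration of the mismatch-counting step, the sketch below scores one read against one candidate genome. The article does not give the exact probability model, so the formula here is an assumption: each base is read correctly with probability 1 - e and miscalled as any one of the three other bases with probability e/3. The function names, the `start` alignment offset, and the default error rate are likewise hypothetical.

```python
def count_mismatches(read, genome, start):
    """Number of positions where the read differs from the genome,
    with the read aligned so its first base sits at genome[start]."""
    return sum(
        1 for offset, base in enumerate(read)
        if genome[start + offset] != base
    )


def read_likelihood(read, genome, start, error_rate=0.005):
    """Probability of observing the read if it came from this genome,
    under the assumed independent per-base error model."""
    mismatches = count_mismatches(read, genome, start)
    matches = len(read) - mismatches
    return (1.0 - error_rate) ** matches * (error_rate / 3.0) ** mismatches
```

Evaluating `read_likelihood` for every read against every retained genome yields a reads-by-strains table of probabilities, which is the natural input for estimating strain proportions.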
The researchers used the expectation-maximization (EM) algorithm to estimate the proportions of different SARS-CoV-2 variants. The performance of the algorithm was evaluated by simulating several sets of sequencing reads containing varying numbers of SARS-CoV-2 strains. The estimated proportion of each strain was then compared with the true value.
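For readers unfamiliar with this kind of estimation, the following is a minimal sketch of a standard EM update for mixture proportions, assuming a precomputed table `likelihoods[r][k]` holding the probability of read r under strain k. It is a generic textbook formulation rather than the authors' implementation, and the function name, iteration cap, and tolerance are illustrative choices.

```python
def em_proportions(likelihoods, n_iter=200, tol=1e-9):
    """Estimate strain proportions from per-read, per-strain likelihoods."""
    n_strains = len(likelihoods[0])
    props = [1.0 / n_strains] * n_strains  # start from equal proportions
    for _ in range(n_iter):
        # E-step: share each read among strains in proportion to
        # (current proportion) x (likelihood of the read under that strain).
        totals = [0.0] * n_strains
        for row in likelihoods:
            weighted = [p * lik for p, lik in zip(props, row)]
            norm = sum(weighted)
            if norm == 0.0:
                continue  # read is impossible under every strain; skip it
            for k, w in enumerate(weighted):
                totals[k] += w / norm
        used = sum(totals)
        if used == 0.0:
            break  # no informative reads at all
        # M-step: new proportion = average share each strain received.
        new_props = [t / used for t in totals]
        delta = max(abs(a - b) for a, b in zip(new_props, props))
        props = new_props
        if delta < tol:
            break
    return props
```

In a toy case where three reads match only the first strain and one read matches only the second, `em_proportions([[1.0, 0.0]] * 3 + [[0.0, 1.0]])` returns `[0.75, 0.25]`. With real data, many reads are compatible with several strains, and the iterations gradually reassign those ambiguous reads.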
In the present study, the Tree imputation method was found to have error rates of more than 5 × 10⁻⁴ while the Common allele imputation method showed error rates of more than 0.02. Hence, the Tree imputation method was more accurate at reporting sequencing data. The error rates in the Common allele imputation method were due to heterozygosity. The Tree imputation method exhibited errors at sites that switch allelic states often, suggesting a high degree of homoplasy in these sites. It was also noted that the site with the greatest number of imputation errors also had a high proportion of sequencing errors.
According to the researchers, imputation is more accurate for SARS-CoV-2 than for diploid organisms because of the virus’s strong phylogenetic structure. The proportions of SARS-CoV-2 variants estimated using the phylogenetic imputation method were similar to the true proportions. However, the proportions of the true strains were overestimated when sequencing coverage was low. At a higher depth of coverage, the estimated proportions were more accurate.
The phylogenetic method developed by the researchers showed high accuracy, with error rates comparable to those of typical sequencing methods. The results obtained with the EM algorithm further demonstrated that the method can effectively estimate the proportions of SARS-CoV-2 variants in wastewater samples.
The phylogenetic method enabled effective estimation of SARS-CoV-2 strain proportions in wastewater samples and also strengthened reference databases. This method of quantifying SARS-CoV-2 could also be used to correct sequencing errors by integrating it with algorithms for imputation-informed sequencing. The authors recommend an average sequencing depth of 1000X to achieve higher accuracy with this method.
Given the high transmissibility of COVID-19, tools for monitoring the spread of SARS-CoV-2 within a community should be implemented effectively. However, the presence of a wide variety of SARS-CoV-2 strains makes this process highly expensive. To address this, wastewater sequencing can be used as a cost-effective and accurate approach to monitor the spread of SARS-CoV-2 variants and curb the further spread of the virus.
*medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.