As new variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the pathogen causing the current coronavirus disease 2019 (COVID-19) pandemic, emerge and continue to circulate, early identification and sequencing have become a necessary part of research in this area. The virus is known to be shed from the nose and lungs, and in saliva, urine, and feces.
A new study, published on the medRxiv* preprint server, reports the use of metagenomics to detect the virus in wastewater samples, revealing variants circulating at the community level that are not yet identified, or are present only in very low proportions, in clinical databases.
Earlier studies showed that viral RNA can be detected in the feces and urine of COVID-19 patients as well as of individuals with asymptomatic infection. This knowledge led to the use of wastewater in SARS-CoV-2 surveillance. The application is not novel: Wastewater-Based Epidemiology (WBE) is already a valuable technique in epidemiological surveillance carried out on a broader scale.
Not only is wastewater a cheap and non-invasive source of samples containing the different viral variants in a community, but it also yields real-time data on the strains of SARS-CoV-2 in circulation at the time of the study. This offers an important advantage in vaccine and antiviral development.
Massive sequencing helps identify new mutations
Getting the most out of this approach depends on high-throughput sequencing, in which multiple viral genomes, originating from infections across a spectrum of clinical presentations, are analyzed together. By detecting variants present at low frequencies in the population, as well as the number and type of variants currently in circulation, researchers can rapidly detect the emergence of a new variant or its importation into a population, and can also detect polymorphic sites.
The current study includes 40 samples from 14 wastewater treatment plants (WWTPs) serving three different areas in Spain. Viral RNA was detected by reverse transcription-polymerase chain reaction (RT-PCR) targeting the nucleocapsid (N) gene, the envelope (E) gene, and a region called IP4 on the RNA-dependent RNA polymerase (RdRp) gene.
Subsequently, the researchers sequenced the viral genomes recovered from the samples.
All samples belonged to clade 20A, characterized by the mutations C241T, C3037T, C14408T, and A23403G. However, variation was also found at nucleotide positions whose substitutions define the virus clades. Two of the samples showed a mix of clade 20A and clade 20C sequences at position 25563.
In these two samples, the second mutation that defines clade 20C (C1059T) could not be verified: the position was not sequenced in one sample and had low coverage in the other. This made it difficult to confirm the presence of the expected mutation, despite the mixed sequences at position 25563.
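The marker check described above can be sketched in a few lines of Python. This is an illustration only, not the pipeline used in the study: the function names, the `MIN_DEPTH` threshold, and the example read depths are hypothetical, and the G-to-T change at position 25563 is taken from the commonly used clade definition rather than from this report, which gives only the position.

```python
import re

# Clade-defining substitutions named in the article. The base change at
# position 25563 (G to T) is an assumption based on the usual clade
# definition; the article itself only gives the position.
CLADE_MARKERS = {
    "20A": ["C241T", "C3037T", "C14408T", "A23403G"],
    "20C": ["C1059T", "G25563T"],
}

MIN_DEPTH = 10  # hypothetical coverage threshold, not taken from the study


def marker_status(marker, detected, depth):
    """Return 'present', 'absent', or 'unverified' for one marker."""
    position = int(re.fullmatch(r"[ACGT](\d+)[ACGT]", marker).group(1))
    if depth.get(position, 0) < MIN_DEPTH:
        return "unverified"  # position not sequenced, or too few reads
    return "present" if marker in detected else "absent"


def clade_report(detected, depth):
    """Map each clade to the status of each of its defining markers."""
    return {
        clade: {m: marker_status(m, detected, depth) for m in markers}
        for clade, markers in CLADE_MARKERS.items()
    }


# Made-up example: position 1059 has only 3 reads, so C1059T is unverified.
example_detected = {"C241T", "C3037T", "C14408T", "A23403G", "G25563T"}
example_depth = {241: 120, 3037: 95, 14408: 60, 23403: 200, 25563: 45, 1059: 3}
example_report = clade_report(example_detected, example_depth)
```

Reporting "unverified" separately from "absent" keeps a poorly covered position from being read as evidence that the mutation is missing.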
Substitutions and deletions detected by sequencing
After adjusting for the depth and quality of the reads, the researchers found almost 240 nucleotide substitutions and six deletions relative to the genome of the reference strain, SARS-CoV-2 isolate Wuhan-Hu-1.
There were over a hundred variant nucleotides in the ORF1a (open reading frame 1a) polyprotein, 67 in ORF1b, 21 in the spike protein, 13 in the membrane protein, 10 in the nucleocapsid gene, and others in other ORFs.
When a sample contained sequences carrying one of these mutations alongside sequences carrying the nucleotide found in the reference genome at the same position, the sample was termed mixed.
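As a rough illustration of what calling a sample "mixed" at a given position could look like, the sketch below labels a site from its per-base read counts. The function name and the 5% `min_fraction` noise threshold are assumptions made for this example; the report does not state the thresholds the researchers applied.

```python
def classify_site(base_counts, ref_base, min_fraction=0.05):
    """Label one genome position from its per-base read counts.

    Returns 'no coverage', 'reference', 'variant', or 'mixed'.
    min_fraction is a hypothetical cutoff (at most 0.5) below which
    reads are treated as sequencing noise.
    """
    total = sum(base_counts.values())
    if total == 0:
        return "no coverage"
    ref_reads = base_counts.get(ref_base, 0)
    ref_fraction = ref_reads / total
    alt_fraction = (total - ref_reads) / total
    has_ref = ref_fraction >= min_fraction
    has_alt = alt_fraction >= min_fraction
    if has_ref and has_alt:
        return "mixed"  # reference and mutated sequences in the same sample
    return "variant" if has_alt else "reference"


# Made-up counts: 60 reads with the reference base G, 40 reads with T.
example_label = classify_site({"G": 60, "T": 40}, ref_base="G")
```

In practice the cutoff would need to reflect sequencing error rates and read depth, which is why it is exposed as a parameter here.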
In the membrane (M) protein, almost half of all mutations were non-synonymous substitutions, whereas in ORF7a and ORF10, all the substitutions were non-synonymous.
Of the 21 spike substitutions, 13 were non-synonymous. Ten of these corresponded to already known mutations, while three reflected novel amino acid substitutions. Seven of the 13 have not so far been reported in genomes from Spanish patients, though three of the amino acid changes have been found at low frequencies among genomes isolated from Spanish specimens.
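The distinction between synonymous and non-synonymous substitutions comes down to whether the altered codon still encodes the same amino acid. A minimal sketch, assuming the standard genetic code and a hypothetical helper named `substitution_effect`, is shown below; it is meant to illustrate the concept, not the annotation tools used in the study.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def substitution_effect(codon, offset, new_base):
    """Classify a single-base change inside one codon.

    offset is the 0-based position of the changed base within the codon.
    Returns (kind, amino_acid_before, amino_acid_after).
    """
    mutated = codon[:offset] + new_base + codon[offset + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    kind = "synonymous" if before == after else "non-synonymous"
    return kind, before, after


# A23403G changes spike codon 614 from GAT to GGT (the D614G substitution).
d614g = substitution_effect("GAT", 1, "G")
# A third-position change that leaves the amino acid unchanged.
silent = substitution_effect("TTC", 2, "T")
```

Here `d614g` evaluates to a non-synonymous change from D to G, while `silent` leaves phenylalanine (F) unchanged.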
Six nucleotide deletions were also reported, of which four were in the ORF1a, and one each in the spike and the ORF3a proteins.
What are the implications?
The study shows the potential importance of sequencing SARS-CoV-2 in sewage, both for identifying new mutations and clades of the virus and for detecting, in real time, the viral clades in circulation.
The genomic sequencing of such strains could provide complementary information to the results of clinical laboratory testing. For instance, in the current study, three novel nucleotide substitutions were found in the spike gene in wastewater samples.
Another example is the set of mutations that appear only at low frequency in genomic reads from clinical specimens, but whose presence was confirmed by the wastewater sequencing results.
A limitation of the study is the variation in the coverage of the important genomic regions between different samples. This suggests that high-throughput sequencing efforts targeting a specific region, such as the spike protein, or the nucleotide positions that define a clade, would be more valuable in identifying and annotating the genomic variants, either unknown or detected at only low frequencies, within a region or community.
medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.