Since severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) began its deadly march across the world at the end of 2019, triggering the coronavirus disease 2019 (COVID-19) pandemic, viral containment has been the primary focus, along with the treatment of symptomatic illness.
Genomic surveillance has played a starring role in the early identification of emerging variants of concern (VOCs) that threaten to provoke resurgences of infection in areas where transmission had been brought under some degree of control. These VOCs carry mutations that may enable the virus to escape neutralization by antibodies raised against earlier strains or otherwise evade the immune response. Other mutations may increase viral infectivity or transmissibility.
This work depends on pooling whole-genome sequences from all over the world. A recent study by researchers in Canada shows how mutations in newly emerging lineages of SARS-CoV-2 may disrupt the sequencing process itself and thus interfere with this important area of research. The team has released its findings as a preprint on the medRxiv* server.
Genomic sequencing relies on amplifying the viral genetic material, or ribonucleic acid (RNA), from clinical specimens. Amplification is necessary because viral RNA is present at very low concentrations in such samples.
The most widely used approach, amplicon sequencing, first copies the viral RNA into complementary DNA (cDNA) and then amplifies it using multiple polymerase chain reactions (PCR) run in parallel. Each reaction targets a different part of the genome with its own primers: short, synthetic pieces of single-stranded DNA that initiate DNA synthesis.
Each primer must bind to its complementary region on the template, rather like the two sides of a zipper meshing together. The polymerase enzyme then extends the primer from its 3’ end, synthesizing a new strand complementary to the template.
The primers are used in pairs, staggered across the genome so that the amplified segments (amplicons) together cover the whole viral genome. However, mutations in newer strains may destabilize the bond between a primer and its template, impairing amplification.
The resulting gaps in genome coverage reduce the number of high-quality complete SARS-CoV-2 genomes.
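To make the tiling idea concrete, here is a minimal Python sketch of a toy tiled scheme. It is not the Freed or ARTIC design: the genome length, amplicon length, and overlap are hypothetical parameters, and the helper names `tile_genome` and `uncovered` are illustrative. It lays out overlapping amplicons, alternates neighbours between two pools (as tiled schemes commonly do), and reports which stretch of the genome goes unsequenced when one amplicon drops out.

```python
from dataclasses import dataclass


@dataclass
class Amplicon:
    number: int  # 1-based amplicon number
    start: int   # 0-based start on the genome (inclusive)
    end: int     # 0-based end on the genome (exclusive)
    pool: int    # 1 or 2


def tile_genome(genome_length: int, amplicon_length: int, overlap: int) -> list[Amplicon]:
    """Hypothetical helper: tile overlapping amplicons across a toy genome."""
    step = amplicon_length - overlap
    if step <= 0:
        raise ValueError("overlap must be shorter than the amplicon")
    amplicons = []
    start, number = 0, 1
    while True:
        end = min(start + amplicon_length, genome_length)
        # Neighbouring amplicons overlap, so alternate them between two pools
        # to keep overlapping primer pairs out of the same reaction.
        pool = 1 if number % 2 else 2
        amplicons.append(Amplicon(number, start, end, pool))
        if end == genome_length:
            return amplicons
        start += step
        number += 1


def uncovered(genome_length: int, amplicons: list[Amplicon], failed: set[int]) -> list[tuple[int, int]]:
    """Genome intervals [start, end) left uncovered when the amplicons
    numbered in `failed` produce no reads."""
    covered = [False] * genome_length
    for amp in amplicons:
        if amp.number not in failed:
            for pos in range(amp.start, amp.end):
                covered[pos] = True
    gaps = []
    pos = 0
    while pos < genome_length:
        if covered[pos]:
            pos += 1
            continue
        gap_start = pos
        while pos < genome_length and not covered[pos]:
            pos += 1
        gaps.append((gap_start, pos))
    return gaps


amplicons = tile_genome(genome_length=1000, amplicon_length=200, overlap=50)
print(len(amplicons))                   # 7
print(uncovered(1000, amplicons, {3}))  # [(350, 450)]
```

Because neighbouring amplicons overlap, losing amplicon 3 in this toy layout leaves only its non-overlapping middle (100 bases) unsequenced; the rest of its span is still read by amplicons 2 and 4.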
What stimulated this research?
The British Columbia Centre for Disease Control’s Public Health Laboratory (BCCDC PHL) noticed gaps in coverage across three amplicons, spanning parts of the viral open reading frame 1a (ORF1a) and the spike gene, when sequencing local P.1 SARS-CoV-2 specimens with the Freed amplicon scheme.
In the current study, investigators in British Columbia (BC), Canada, explored the reasons for these gaps, focusing on the performance of the Freed scheme, which uses amplicons of 1,200 base pairs (bp) to sequence the viral genome. They hypothesized that the gaps were due to primer failure caused by mutations in the P.1 variant.
They analyzed local P.1 sequences obtained by the BCCDC PHL before April 7, 2021, as well as those uploaded to the worldwide SARS-CoV-2 database GISAID (Global Initiative on Sharing All Influenza Data). They found that more than 98% of the sequences carried three variants at primer-binding sites.
Notably, these primer variants fell within the same three amplicons that had failed to amplify with the Freed scheme: amplicons 21, 24, and 25.
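The kind of check described above, asking what fraction of a lineage's genomes carry a given change inside a primer-binding site, can be sketched in Python as follows. This is not PCR_strainer's implementation; it assumes the genomes are already aligned to the reference without insertions or deletions, and the names `primer_site_variants` and `common_variants` and the toy sequences are hypothetical.

```python
from collections import Counter


def primer_site_variants(site_start: int, ref_site: str, aligned_genomes: list[str]) -> dict:
    """Fraction of genomes carrying each base change within one primer site.

    site_start: 0-based reference coordinate of the primer-binding site.
    ref_site: reference (plus-strand) sequence of that site.
    aligned_genomes: genomes in reference coordinates (assumed gap-free).
    """
    counts = Counter()
    for genome in aligned_genomes:
        observed = genome[site_start:site_start + len(ref_site)]
        for offset, (ref_base, alt_base) in enumerate(zip(ref_site, observed)):
            # Ambiguous calls such as N are skipped, so those genomes count
            # as not carrying a variant at that position.
            if alt_base != ref_base and alt_base in "ACGT":
                counts[(site_start + offset, ref_base, alt_base)] += 1
    total = len(aligned_genomes)
    return {variant: n / total for variant, n in counts.items()}


def common_variants(frequencies: dict, threshold: float = 0.9) -> list:
    """Variants present in at least `threshold` of the genomes."""
    return sorted(v for v, f in frequencies.items() if f >= threshold)


genomes = ["GGACGTACTT", "GGACTTACTT", "GGACTTACTT", "GGACNTACTT"]
freqs = primer_site_variants(2, "ACGTAC", genomes)
print(freqs)                        # {(4, 'G', 'T'): 0.5}
print(common_variants(freqs, 0.5))  # [(4, 'G', 'T')]
```

The 0.5 threshold only suits this four-genome toy example; the P.1 variants in the study were far more prevalent, at more than 98% of sequences.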
Primer variants create mismatches with template RNA
In amplicon 21, the primer had two mismatches with the P.1 sequence, one of them at the primer’s 3’ end, where the polymerase begins extending the new strand. As a result, this amplicon was the most severely affected in terms of depth of coverage, or number of reads.
The other two amplicons each had a single mismatch with a primer, but in both cases the mismatch lay closer to the primer’s 5’ end. A mismatch there would not be expected to block extension the way the 3’ mismatch in amplicon 21 does, so position alone cannot explain the loss of coverage; the effect of such mismatches has to be established empirically.
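Because the 3’ end is where extension starts, a first step in interpreting a mismatch is locating it relative to that end. The Python sketch below does this for a primer written 5’ to 3’ and the plus-strand reference sequence at its binding site; a reverse primer matches the reverse complement of the plus strand, so its site is reverse-complemented first. The function names and sequences are hypothetical, and, as the study shows, position alone does not predict how badly a mismatch will hurt amplification.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def mismatch_distances_from_3prime(primer: str, plus_strand_site: str, reverse: bool = False) -> list[int]:
    """Distances of primer/site mismatches from the primer's 3' end.

    0 means the terminal 3' base. `primer` is written 5'->3'. For a reverse
    primer, the plus-strand site is reverse-complemented so that it reads in
    the same orientation as the primer.
    """
    target = reverse_complement(plus_strand_site) if reverse else plus_strand_site
    if len(target) != len(primer):
        raise ValueError("primer and binding site must be the same length")
    last = len(primer) - 1
    return sorted(last - i for i, (p, t) in enumerate(zip(primer, target)) if p != t)


print(mismatch_distances_from_3prime("ACGTACGT", "ACGTACGA"))            # [0]
print(mismatch_distances_from_3prime("ACGTACGT", "TCGTACGT"))            # [7]
print(mismatch_distances_from_3prime("ACGTTT", "CAACGT", reverse=True))  # [0]
```

The last call shows why strand matters: for a reverse primer, a change at the first plus-strand base of its site falls on the primer's 3’ end.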
The researchers designed spike-in primers, based on the variants identified in their analysis, to supplement the Freed primers. Adding these to the Freed primer pools at the same molar concentration as the existing primers raised the average depth of coverage for amplicons 21 and 24 to the levels achieved with non-P.1 lineages.
Coverage depth for amplicon 25 also improved, though to a lesser extent, when the pool was spiked with its new primer at a fourfold higher molar concentration.
To confirm that the spike-in primers did not reduce coverage elsewhere, the researchers compared these results with the same specimens sequenced using the unmodified Freed primer pools. The other amplicons were sequenced to the same depth of coverage with and without the spike-ins, and the same was true of non-P.1 lineages.
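The spike-in ratios in these experiments come down to simple dilution arithmetic. If a primer pool of volume V holds each primer at concentration c, adding a volume v of spike-in stock at concentration S dilutes everything by the same factor, so the spike-in ends up at k times the concentration of each existing primer when S·v = k·c·V, that is, v = k·c·V/S. The Python sketch below applies this; the function names, pool volume, and concentrations are hypothetical and are not the study's protocol.

```python
import math


def spike_in_volume(pool_volume_ul: float, primer_conc_um: float, stock_conc_um: float, ratio: float) -> float:
    """Volume of spike-in stock (uL) that gives `ratio` x the concentration of each pool primer."""
    return ratio * primer_conc_um * pool_volume_ul / stock_conc_um


def final_concentrations(pool_volume_ul, primer_conc_um, stock_conc_um, added_ul):
    """(existing primer, spike-in primer) concentrations after mixing, in uM."""
    total = pool_volume_ul + added_ul
    return primer_conc_um * pool_volume_ul / total, stock_conc_um * added_ul / total


# Hypothetical pool: 100 uL with each primer at 10 uM; spike-in stock at 100 uM.
for ratio in (1, 4):  # equimolar, then fourfold
    v = spike_in_volume(100, 10, 100, ratio)
    existing, spike = final_concentrations(100, 10, 100, v)
    assert math.isclose(spike / existing, ratio)  # dilution cancels out
    print(f"{ratio}x: add {v:.1f} uL")
```

With these made-up numbers, the equimolar spike-in needs 10 µL of stock and the fourfold spike-in 40 µL.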
Widespread primer mutations
The researchers then explored the effect of mutations in all currently circulating VOCs, including B.1.1.7, B.1.351, P.1, and B.1.617, on both the Freed and the ARTIC (v3) primer schemes. They identified no fewer than 46 variants at primer sites that could impair the performance of these schemes.
Of these, 34 affected the ARTIC scheme and 12 the Freed scheme. The ARTIC scheme is probably more affected because it uses shorter amplicons, so more primers are needed to cover the whole genome, giving more sites at which a mutation can disrupt amplification.
Longer amplicons mean fewer primer sites that mutations can hit, but when one amplicon fails, a larger stretch of the genome is lost. The pros and cons of a primer scheme must therefore be weighed when choosing how to amplify viral RNA.
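The trade-off can be put in rough numbers with a simplified model in which amplicons of length L overlap their neighbours by a fixed number of bases. Covering a genome of length G then takes ceil((G - overlap)/(L - overlap)) amplicons, each needing two primers, and losing one interior amplicon leaves its unique middle of L - 2·overlap bases unsequenced. The Python sketch below compares 400-bp and 1,200-bp amplicons on the 29,903-nucleotide length of the Wuhan-Hu-1 reference genome; the 400-bp length, 75-bp overlap, 22-nt primer length, and the name `tiling_tradeoff` are illustrative assumptions, and real schemes differ in their exact layouts.

```python
import math


def tiling_tradeoff(genome_length: int, amplicon_length: int, overlap: int, primer_length: int) -> dict:
    """Rough primer-count vs. failure-size trade-off for a simple tiled scheme."""
    step = amplicon_length - overlap
    n_amplicons = math.ceil((genome_length - overlap) / step)
    n_primers = 2 * n_amplicons
    return {
        "amplicons": n_amplicons,
        "primers": n_primers,
        # Crude: ignores primer-binding sites that overlap one another.
        "fraction_of_genome_under_primers": n_primers * primer_length / genome_length,
        # Bases lost when one interior amplicon fails (its non-overlapping middle).
        "bases_lost_per_failure": amplicon_length - 2 * overlap,
    }


GENOME = 29_903  # length of the Wuhan-Hu-1 reference genome, in nucleotides
for length in (400, 1200):
    print(length, tiling_tradeoff(GENOME, length, overlap=75, primer_length=22))
```

Under these assumptions the shorter design needs 184 primers against 54, so far more of the genome sits under a primer where a mutation can cause a dropout, but each interior dropout costs 250 bases rather than 1,050.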
Five of these variants affected sequencing of the B.1.1.7 lineage and nine the B.1.351 lineage, while the B.1.617+ and P.1 lineages each had 16 primer variants. Many of these variants dominated their lineage: for 12 of them, 90% of the sequenced isolates in that lineage carried the variant.
What are the implications?
The study shows how strongly mutations at primer sites can affect amplicon-based sequencing. Newly emerging lineages must therefore be monitored for such mutations.
As part of this process, the actual effects of primer variants must be validated with laboratory data.
Using an already available bioinformatics program, PCR_strainer, the researchers found three mutations present in a substantial proportion of P.1 genomes. They were also able to partially restore the depth of sequence coverage in the affected amplicons by spiking in custom-designed primers.
Using the same program, they also found a large number of primer variants, present in many circulating VOCs, that may disrupt these commonly used sequencing protocols.
The team concludes:
“Our results suggest that extensive updates for widely used amplicon sequencing schemes are necessary immediately, and that primer schemes will have to evolve alongside SARS-CoV-2. In the long-term, our combination of PCR_strainer analysis and laboratory validation provides a useful approach for maintaining SARS-CoV-2 clinical sequencing protocols.”
medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behavior, or treated as established information.