A new study published on the preprint server bioRxiv* in May 2020 describes a system for tracking new mutations in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus behind the current COVID-19 pandemic. The system is intended to help analyze mutations and to find and follow the spread of those most likely to increase the pathogenicity of the virus, either by making it easier to transmit or by enhancing its resistance to treatment.
Novel Coronavirus SARS-CoV-2: Colorized scanning electron micrograph of a VERO E6 cell (purple) exhibiting elongated cell projections and signs of apoptosis, after infection with SARS-CoV-2 virus particles (pink), which were isolated from a patient sample. Image captured at the NIAID Integrated Research Facility (IRF) in Fort Detrick, Maryland. Credit: NIAID
Mutations in SARS-CoV-2
The current pandemic is the third major outbreak in recent decades to have been caused by zoonotic betacoronaviruses, which jump from bats or other wild animal hosts to humans and cause respiratory illness. COVID-19 was first reported in Wuhan, China, as a locally spreading epidemic disease that was rapidly carried around the world by international travelers. At the time of writing, the virus has caused 3.5 million confirmed infections and almost 247,000 deaths.
During its rapid spread across continents, the virus has acquired several mutations. This process, called antigenic drift, is the accumulation of mutations over a season of infection. It allows natural selection to favor those variants that are relatively resistant to antibodies.
The complete lack of immunity to this novel virus, and the high rate of spread, comparable to influenza, could mean the virus will be around for a year or more until the first vaccines are available. At this point, neglecting to identify and track antigenic drift over this long season of infection could mean that the first vaccines to be developed find themselves ill-fitted to induce specific immunity to the virus strain circulating at that time.
Why Is Mutation Tracking Important?
Most vaccines under development are focused on the spike protein, which is responsible for viral attachment to the host cell membrane and viral entry into the cell. They are aimed at stimulating the production of neutralizing antibodies that block infection altogether.
The spike, or S, protein is a trimer of three monomers, each comprising an S1 and an S2 domain. These domains mediate viral binding to the ACE2 receptor on the host cell membrane and membrane fusion, respectively.
Most vaccines and viral testing reagents are based on the genome sequence of the original viral isolate from Wuhan. The current study has two aims. The first is to identify variations in the viral genome in real time so as to pick up positive selection for S protein variants. The second is to detect the potential role of recombination in the way the pandemic is playing out.
How Does the Pipeline Work?
To accomplish these twin aims, the investigators set up a data pipeline with three parts. The first is an analysis of the S protein data submitted every day to the GISAID (Global Initiative on Sharing All Influenza Data) platform. The second is a structural model of viral sites to help immunologists and vaccine developers visualize the new mutational data. The third consists of experiments to validate the conclusions of the analysis and the modeling.
The real-time tracking pipeline will also help the world community to pick up any change in the number of times a particular mutation is seen in a community. Such a change might show positive selection as well as changes in the outward characteristics of the virus and its chief antigens.
At present, hundreds of genomic sequences are uploaded to GISAID each day, so the scientists automated the first few steps of the critical analyses to deliver results in real time. The Sheffield researchers are working in collaboration with scientists at Duke University, who are setting up a facility for pseudovirus-based neutralizing antibody testing. This will help to identify how specific spike mutations change the way the immune system recognizes the virus, or change other viral characteristics.
The first step is to exclude partial and other unacceptable sequences from those held on the GISAID platform, and to trim the remaining sequences so that they run from the start of the first open reading frame (ORF) to the end of the final ORF. The result is then adjusted for codon alignment, using two separate alignment sequences.
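The filter-and-trim step described above can be illustrated with a minimal sketch. It is not the study's code: the function names, the minimum length, the limit on ambiguous bases, and the use of fixed ORF coordinates are all assumptions made for illustration, and a real pipeline would locate the ORFs by alignment rather than by fixed positions.

```python
def is_acceptable(seq, min_length=29000, max_ambiguous_fraction=0.01):
    """Return True if a genome looks complete enough to keep.

    Both cutoffs are hypothetical; the article does not state the
    criteria the study uses to reject a sequence.
    """
    seq = seq.upper()
    if len(seq) < min_length:
        return False  # treated as a partial sequence
    ambiguous = sum(1 for base in seq if base not in "ACGT")
    return ambiguous / len(seq) <= max_ambiguous_fraction


def trim_to_coding_region(seq, first_orf_start, last_orf_end):
    """Keep the span from the first ORF's start to the final ORF's end.

    Coordinates are 0-based and end-exclusive, supplied by the caller.
    """
    return seq[first_orf_start:last_orf_end]


def prepare(records, first_orf_start, last_orf_end, **criteria):
    """Filter, then trim, a dict of {sequence_id: sequence}."""
    return {
        seq_id: trim_to_coding_region(seq.upper(), first_orf_start, last_orf_end)
        for seq_id, seq in records.items()
        if is_acceptable(seq, **criteria)
    }
```

With toy data, `prepare({"a": "NNATGAAATAGCC", "b": "ATG"}, 2, 11, min_length=10, max_ambiguous_fraction=0.2)` keeps only sequence `a`, trimmed to `ATGAAATAG`.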
Selecting Mutations for Tracking
The availability of thousands of sequences with geographical and timepoint data allows the early detection of positive selection by looking for mutations that recur more often over time. Early indicators that such a process is underway include: a) finding the same mutation several times in different areas or in different regions of the phylogenetic tree; b) finding that a particular mutation occurs more often than before in sequences from the same geographical region over time; c) the use of different codons to encode the same recurring mutant amino acid; and d) mutations that are tightly clustered in the linear sequence or in the structure.
The scientists began by setting a low threshold to pick up the largest possible number of mutations, since these are rare in the SARS-CoV-2 spike protein. Any mutation found in at least 0.3% of sequences was tracked, both for how it might shape the future evolution of the virus and for how it affected the virus structure in terms of binding to antibodies, the stability of the spike protein, and the attachment of sugar molecules.
Mutations that lie near sites of known significance, such as the receptor-binding domain or the antibody binding sites of SARS-CoV (which shares almost 80% of its genomic sequence with the current virus), are followed if they are found in just 0.1% of the sequences uploaded.
The pipeline is triggered when a mutation meets these criteria. It then begins evaluating the impact of the mutation on the antigenicity of the virus, its infective potential, its sensitivity to neutralizing antibodies, and its binding capacity for the ACE2 receptor.
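Indicator (b), a mutation becoming more common in the same region over time, can be sketched as follows. This is an illustration only: the function names and the input layout of (region, period, has_mutation) tuples are hypothetical, and the naive first-versus-last comparison stands in for whatever statistical test the study applies, which the article does not describe.

```python
from collections import defaultdict


def frequency_by_period(samples):
    """Return {region: {period: fraction of sequences carrying the mutation}}.

    samples is an iterable of (region, period, has_mutation) tuples.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # [carriers, total]
    for region, period, has_mutation in samples:
        cell = counts[region][period]
        cell[0] += int(has_mutation)
        cell[1] += 1
    return {
        region: {p: c / t for p, (c, t) in sorted(periods.items())}
        for region, periods in counts.items()
    }


def regions_with_rising_frequency(samples):
    """List regions where the fraction is higher in the last period than the first."""
    rising = []
    for region, series in frequency_by_period(samples).items():
        values = list(series.values())  # already ordered by period
        if len(values) >= 2 and values[-1] > values[0]:
            rising.append(region)
    return sorted(rising)
```

A real analysis would also need to account for sample size and uneven sampling between regions before reading a rise as a sign of selection.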
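The two thresholds can be expressed as a small check. This is a sketch under stated assumptions: the names are hypothetical, and treating each percentage as an "at least" cutoff is an interpretation of the figures quoted above rather than a detail confirmed by the study.

```python
GENERAL_THRESHOLD = 0.003   # 0.3% of sequences
KEY_SITE_THRESHOLD = 0.001  # 0.1% for mutations near sites of known significance


def should_track(carrier_count, total_sequences, near_key_site=False):
    """Return True if a mutation is frequent enough to be tracked.

    near_key_site marks mutations close to, for example, the
    receptor-binding domain or a known antibody binding site.
    """
    if total_sequences <= 0:
        raise ValueError("total_sequences must be positive")
    threshold = KEY_SITE_THRESHOLD if near_key_site else GENERAL_THRESHOLD
    return carrier_count / total_sequences >= threshold
```

For example, a mutation seen in 15 of 10,000 sequences (0.15%) would be tracked only if it lies near one of the significant sites.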
What Do the Early Results Indicate?
As of April 13, 2020, the researchers had identified 14 mutations and one clustered group of mutations. Some of these are now less common than when tracking began, while others are holding at the same frequency or increasing. Others show interesting characteristics, such as appearing at multiple places in the phylogenetic tree. The two most important mutations discussed in the current report are D614G and S943P.
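The names D614G and S943P use the standard shorthand for amino acid substitutions: the one-letter code of the original residue, its position in the protein, and the one-letter code of the replacement. The short sketch below unpacks that notation. The function names are hypothetical, and the lookup table covers only the four residues mentioned here.

```python
import re

# One-letter codes for the residues named in this article only.
AMINO_ACIDS = {"D": "aspartic acid", "G": "glycine", "S": "serine", "P": "proline"}


def parse_substitution(label):
    """Split a label such as 'D614G' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


def describe(label):
    """Spell out a substitution label in words."""
    original, position, replacement = parse_substitution(label)
    old = AMINO_ACIDS.get(original, original)
    new = AMINO_ACIDS.get(replacement, replacement)
    return f"{old} at position {position} replaced by {new}"
```

So D614G denotes aspartic acid at position 614 of the spike protein replaced by glycine, and S943P denotes serine at position 943 replaced by proline.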
The D614G Mutation
The D614G mutation is spreading “at an alarming pace.” Having spread from the original Wuhan strain into Europe, it becomes the leading strain within a matter of weeks in any new region into which it is introduced. This suggests that it adapts more rapidly to a local environment than the original strain.
Scientists are studying how this mutation might increase the ease of spread, whether by making it easier for the virus to bind to the host cell or to fuse with the membrane, or by inducing antibody-dependent enhancement (ADE). This is a phenomenon in which the presence of existing antibodies against a different strain of a currently circulating virus enhances its entry into and destruction of immune cells, and viral replication, rather than neutralizing it.
The researchers also speculate that if, at a later period, this mutation also affects the sensitivity of the virus to neutralizing antibodies, it could mediate escape from antibody inhibition and allow recovered patients to become infected again.
The S943P Mutation
The second mutation, S943P, is near the fusion region and is found to be spreading by recombination between different strains. This indicates that multiple strains are circulating in the same region. Recombination could enable multiple fitness-enhancing mutations to assemble within the same strain, making it more pathogenic than the distinct original strains.
The S943P mutation is also associated with a cluster of mutations at the fusion core of the HR1 region, at the point of breakage of the HR1 helix in the spike protein before the virus fuses with the host cell membrane. This cluster of Ser- and Thr-rich residues forms hydrogen bonds readily and can promote helix formation. This could promote conformational changes more readily in this region.
Importance in Determining Response
The scientists summarize the impact of their work: “Experimentalists can make use of the most current data available to best inform vaccine constructs, reagent tests, and experimental design. The tools we have developed can be extended to other proteins and mutations in subsequent versions of the pipeline. Meanwhile, understanding both how the D614G mutation is overtaking the pandemic and how recombination is impacting the evolution of the virus will be important for informing choices about how best to respond in order to control epidemic spread and resurgence.”
bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.