Soon after the novel coronavirus SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) began to spread over the world, leading to the devastating pandemic of coronavirus disease 2019 (COVID-19), the earliest sequences were published. Since then, genomic surveillance of the virus has been an essential tool to keep track of new and possibly more virulent or transmissible variants as and when they emerge and spread.
A new study from the Institute of Bioinformatics and Applied Biotechnology (IBAB), Bengaluru, India, provides an overview of how well this important monitoring mechanism is functioning around the world. The team has released their findings as a preprint on the bioRxiv* server.
Effective genomic surveillance of a virus depends on a common open-access platform that makes all the genomes sequenced so far freely available to researchers worldwide.
At the very start, COVID-19 researchers co-opted the influenza virus genome sharing platform GISAID (Global Initiative on Sharing All Influenza Data) to deposit the new SARS-CoV-2 sequences.
GISAID is now the largest open-access platform of its kind. It stores the genomic sequences, along with associated clinical and epidemiological data, of more than 1.7 million SARS-CoV-2 strains, making SARS-CoV-2 the most intensively studied organism ever.
The platform facilitated the identification of several new variants, including B.1.1.7 (Alpha), first identified in the UK; B.1.351 (Beta), first seen in South Africa; P.1 (Gamma), a descendant of B.1.1.28 first seen in Brazil; B.1.617.2 (Delta) and B.1.617.1 (Kappa), both first seen in India; P.3 (Theta), first seen in the Philippines; and B.1.427 and B.1.429 (Epsilon), first seen in the USA.
The platform has helped researchers analyze sequences, identify emerging variants in a timely manner, and give governments useful information for shaping their policies. Scientists have accordingly urged, in a concerted chorus, that sequencing be increased worldwide. However, sequences are often submitted to such portals only after a noticeable delay, which limits their usefulness.
The scientists in this study therefore devised a measure of this delay: the Collection to Submission Time Lag (CSTlag) per strain, that is, the time between the collection of a sample and the submission of its genome sequence.
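As a rough illustration, the per-strain CSTlag and its country-level median and mean could be computed as below. This is a minimal Python sketch, assuming CSTlag is simply the number of days between a sample's collection date and its submission date, as the name suggests; the study's exact procedure is not described here. The country names, dates and the helper functions `cst_lag_days` and `lag_summary` are hypothetical and do not come from the study or from GISAID's metadata format.

```python
from datetime import date
from statistics import mean, median

# Hypothetical records: (country, collection date, submission date).
# Illustrative values only, not data from the study.
records = [
    ("Country A", date(2021, 3, 1), date(2021, 3, 15)),
    ("Country A", date(2021, 3, 2), date(2021, 3, 20)),
    ("Country A", date(2021, 3, 5), date(2021, 4, 4)),
    ("Country B", date(2021, 1, 10), date(2021, 4, 10)),
    ("Country B", date(2021, 2, 1), date(2021, 3, 3)),
]


def cst_lag_days(collected: date, submitted: date) -> int:
    """Collection-to-submission lag for one strain, in whole days (assumed definition)."""
    return (submitted - collected).days


def lag_summary(rows):
    """Group per-strain lags by country and return {country: (median, mean)}."""
    by_country: dict[str, list[int]] = {}
    for country, collected, submitted in rows:
        by_country.setdefault(country, []).append(cst_lag_days(collected, submitted))
    return {c: (median(lags), mean(lags)) for c, lags in by_country.items()}


summary = lag_summary(records)
for country, (med, avg) in summary.items():
    print(f"{country}: median {med} days, mean {avg:.1f} days")
```

For these toy records, Country A has per-strain lags of 14, 18 and 30 days, so its median lag is 18 days. The median is less affected than the mean by a handful of very late submissions, which may be one reason median values are reported.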
Significant delays in uploading sequences
The median/mean CSTlag values vary between countries, from one day to one year (or even more).
Among countries that have submitted a thousand or more genomes, the UK has the shortest lag (16 days), with approximately 420,000 submitted genomes.
Other European countries have together deposited about 590,000 genomes, with a lag of 25 days. The US is close behind, having contributed almost 500,000 genomes with a lag of 26 days.
In Asia, Japan took 79 days (median) for over 37,000 genomes. India’s CSTlag was 72 days for approximately 16,000 genomes. Qatar, in the Middle East, has uploaded about 2,200 genomes with a median lag of almost 290 days. Conversely, Singapore has a median lag of 26 days for approximately 2,500 genomes.
In the southern hemisphere, Australia has a lag of 40 days for 17,000 genomes, and New Zealand a lag of 51 days for 1,000 genomes. South America has uploaded over 18,000 genomes with a lag of 61 days, and Africa about 7,000 with a median lag of 50 days.
The scientists also assessed sequencing rates in two ways: as the proportion of confirmed COVID-19 cases that were sequenced, and as the number of genomes sequenced per million population.
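Both measures are simple ratios that differ only in their denominators. The Python sketch below assumes the numbers of deposited genomes, confirmed cases and inhabitants are already known for each country; the function names and all the numbers are hypothetical, chosen only to show how the two rates are calculated, and are not the study's code or data.

```python
def percent_of_cases_sequenced(genomes: int, confirmed_cases: int) -> float:
    """Share of confirmed COVID-19 cases with a deposited genome, in percent."""
    return 100.0 * genomes / confirmed_cases


def genomes_per_million(genomes: int, population: int) -> float:
    """Genomes deposited per million inhabitants."""
    return genomes * 1_000_000 / population


# Hypothetical countries: (name, genomes deposited, confirmed cases, population).
countries = [
    ("Small country", 10_000, 200_000, 5_000_000),
    ("Large country", 10_000, 20_000_000, 1_000_000_000),
    ("Island country", 1_000, 2_500, 5_000_000),
]

for name, genomes, cases, population in countries:
    print(
        f"{name}: {percent_of_cases_sequenced(genomes, cases):.2f}% of cases, "
        f"{genomes_per_million(genomes, population):.0f} genomes per million"
    )
```

Because the denominators differ, a country can rank very differently on the two measures. The hypothetical island country, with few cases, sequences 40% of them yet deposits fewer genomes per million than the small country that sequences only 5%, much as Oceania (37% of cases, about 600 per million) compares with Europe (about 2%, 1,000 per million) in the figures reported here.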
In proportion to the number of cases reported, Iceland has sequenced an impressive 77% of all positive cases, vs. ~60% in Australia. New Zealand and Denmark have sequenced about 40% and 35%, respectively.
The largest numbers of genomes have come from the USA and the UK, as noted above. Although India has a population of over one billion and was hit hard by the second wave, it has sequenced a mere 0.05% of its cases.
This fits the pattern seen across Asia, Africa and South America, where sequencing covers between less than 0.1% and 0.4% of cases. By contrast, Europe has sequenced about 2% of its cases and North America 1.4%, while Oceania has sequenced 37%.
Population-based sequencing rates
Looking at sequencing per million population, wealthy Western countries in Europe and the USA lead the pack, along with Israel and Réunion, at over 1,000 genomes per million. The average is 600 per million for both North America and Oceania, vs. 1,000 for Europe.
In fact, the USA and Japan are the only countries with over 100 million people to have sequenced more than 100 genomes per million population. Brazil comes next in this group, at 50, comparable to South America as a whole. By contrast, India shows a meager 11, about half the Asian average of 21 and closer to the African value of 14.
What are the implications?
Firstly, the CSTlag reflects the strength of local public health infrastructure and, more broadly, how well the public health system is run. Efficient sample collection and metadata recording, together with smooth delivery of samples to RNA isolation and genome sequencing centers, are therefore essential to increasing genomic sequencing capacity.
Secondly, in low-resource or poorly run settings, the absence or breakdown of such systems is compounded by a shortage of biosafety facilities capable of handling highly infectious pathogens such as SARS-CoV-2. Some settings have no such facilities and others only a few, which again contributes to delays.
Thirdly, funding often suffers during a crisis such as a pandemic, as resources are diverted to urgent and essential care. Fourthly, import restrictions on the reagents and equipment required for RNA sequencing may further hamper this work.
Finally, reliance on possibly outdated and more expensive processes may lengthen the delay further. Many of these factors are known to operate in India, for instance, and will need to be addressed.
One alternative may be institution-level partnerships that break new ground, rather than reliance on local and national governments for infrastructure. Either way, there will inevitably be a lag before these systems are up and running.
Even after sequencing, uploads are often delayed. As the researchers note, “It is likely that far more samples have been sequenced than are represented in GISAID.”
This may stem from a wish to keep research secret until papers or patents are ready for publication, from an initial lack of understanding of the importance of sequencing, or even from the pervasive stigma attached to variants named after the countries that first reported them.
Political interference may also have contributed significantly, though these are, of course, murky waters.
Whatever the cause, a lag in reporting gives a variant time to spread across national borders and even to accumulate further mutations and emerge as another strain altogether. To limit this, it is crucial to identify and remove these obstacles, sequence a higher proportion of positive cases, and upload the sequences rapidly to open-access platforms.
The researchers write:
“This will enable researchers across the globe to track the evolved variants, their mutations, epidemiology, and biological consequences, which will provide crucial inputs for appropriate and effective public health policies.”
*bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.