Renowned scientists recover deleted SARS-CoV-2 data from Wuhan

By Sally Robertson, B.Sc.; reviewed by Dan Hutchins, M.Phil; Jun 24 2021 (revised)

Renowned evolutionary researcher Jesse Bloom, of the Fred Hutchinson Cancer Research Center, has conducted a phylogenetic analysis suggesting that the early severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences obtained from the Huanan Seafood Market in Wuhan, China, are not fully representative of the viruses circulating in the city at the time of the coronavirus disease 2019 (COVID-19) outbreak.

Bloom’s findings are based on the identification and recovery of a dataset of SARS-CoV-2 sequences from early in the Wuhan epidemic that had been deleted from the Sequence Read Archive of the National Institutes of Health (NIH).

Bloom says the analysis suggests that the progenitor of known SARS-CoV-2 sequences differs from the Huanan Seafood Market sequences and is at least three mutations closer to SARS-CoV-2’s bat coronavirus relatives.

“The current study suggests that at least in one case, the trusting structures of science have been abused to obscure sequences relevant to the early spread of SARS-CoV-2 in Wuhan,” writes Bloom. “A careful re-evaluation of other archived forms of scientific communication, reporting, and data could shed additional light on the early emergence of the virus.”

A preprint version of the research paper is available on the bioRxiv* server while the article undergoes peer review.

Study: Recovery of deleted deep sequencing data sheds more light on the early Wuhan SARS-CoV-2 epidemic. Image Credit: NIAID / Bloom

This news article was a review of a preliminary scientific report that had not undergone peer review at the time of publication. Since its initial publication, the scientific report has been peer reviewed and accepted for publication in a scientific journal. Links to the preliminary and peer-reviewed reports are available in the Sources section at the bottom of this article.

The origin of SARS-CoV-2 remains a mystery

Understanding the spread of SARS-CoV-2 in Wuhan is essential to trace the origin of the virus.

Chinese CDC banned the sharing of information without approval

At around the same time, the Chinese Centers for Disease Control and Prevention (CDC) issued an order forbidding the sharing of information about the COVID-19 epidemic without approval. China’s State Council then issued a much broader order requiring central approval of any publication related to COVID-19.

In 2021, the joint World Health Organization (WHO)–China report dismissed all reported cases prior to December 8, 2019, as not being COVID-19, and the theory that the virus may have originated at the Huanan Seafood Market was revived.

Although there is much debate surrounding how exactly SARS-CoV-2 infected the human population, it is universally accepted that the virus’s deep ancestors are bat coronaviruses.

The reported collection dates of SARS-CoV-2 sequences in GISAID versus their relative mutational distances from the RaTG13 bat coronavirus outgroup. Mutational distances are relative to the putative progenitor proCoV2 inferred by Kumar et al. (2021). The plot shows sequences in GISAID collected no later than February 28, 2020. Sequences that the joint WHO-China report (WHO 2021) describes as being associated with the Wuhan Seafood Market are plotted with squares. Points are slightly jittered on the y-axis. Go to https://jbloom.github.io/SARS-CoV-2_PRJNA612766/deltadist.html for an interactive version of this plot that enables toggling of the outgroup to RpYN06 and RmYN02, mouseovers to see details for each point including strain name and mutations relative to proCoV2, and adjustment of the y-axis jittering.

However, the earliest known SARS-CoV-2 sequences, which are mostly derived from the Huanan Seafood Market, are more distant from these bat coronaviruses than other sequences collected at later dates outside of Wuhan.

“As a result, there is a direct conflict between the two major principles used to infer an outbreak’s progenitor: namely that it should be among the earliest sequences, and that it should be most closely related to deeper ancestors,” writes Bloom.

What did the current study involve?

Bloom identified a dataset of SARS-CoV-2 sequences, isolated from outpatient samples collected early in the Wuhan epidemic, that had been deleted from the NIH’s Sequence Read Archive. He recovered the files from Google Cloud and reconstructed partial sequences of 13 early epidemic viruses.

Phylogenetic analysis of these sequences, in conjunction with careful annotation of existing ones, suggested that the early Wuhan sequences from the Huanan Seafood Market that have been the focus of the joint WHO–China report are not fully representative of the viruses that were actually present in Wuhan at the time.

The RaTG13 coronavirus that infects the horseshoe bat (Rhinolophus affinis) has been identified as sharing the greatest genome sequence identity with SARS-CoV-2 to date.

However, the early Huanan Seafood Market sequences are more distant from RaTG13 than sequences collected in January from other locations in China and even other countries.

“All sequences associated with this market differ from RaTG13 by at least three more mutations than sequences subsequently collected at various other locations – a fact that is difficult to reconcile with the idea that the market was the original location of the spread of a bat coronavirus to humans,” writes Bloom.
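The comparison Bloom describes amounts to counting how many positions differ between each sequence and an outgroup, then comparing those counts. The Python sketch below illustrates that idea only; it is not Bloom's pipeline. The sequences are invented ten-base fragments, the names `market_like` and `other_site` are hypothetical labels, and the sketch assumes the sequences are already aligned to the same length, skipping ambiguous bases (`N`) and gaps (`-`).

```python
# Characters treated as missing data rather than as mutations (an assumption
# of this sketch; real pipelines handle ambiguity in more detail).
IGNORED = {"N", "-"}


def mutational_distance(seq: str, ref: str) -> int:
    """Count positions where two aligned sequences carry different bases."""
    if len(seq) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for s, r in zip(seq, ref)
        if s != r and s not in IGNORED and r not in IGNORED
    )


# Invented toy fragments, NOT real viral sequences.
outgroup = "ACGTACGTAC"  # stands in for a bat coronavirus outgroup
sequences = {
    "market_like": "ATGTACTTGA",  # hypothetical early market-associated sample
    "other_site": "ACGTACGTAA",   # hypothetical sample from another location
}

# Distance of each sample from the outgroup.
dists = {name: mutational_distance(seq, outgroup) for name, seq in sequences.items()}

# How many more mutations separate the market-like sample from the outgroup.
delta = dists["market_like"] - dists["other_site"]

print(dists)  # {'market_like': 4, 'other_site': 1}
print(delta)  # 3
```

In this toy case the market-like fragment sits three mutations further from the outgroup than the other fragment, which mirrors the kind of gap the quote refers to.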

More about the deleted sequences

Phylogenetic analysis of the deleted sequences revealed that four GISAID (Global Initiative on Sharing Avian Influenza Data) sequences collected in Guangdong that fall within a putative progenitor node were isolated from two different clusters of people who traveled to Wuhan in late December of 2019. These individuals then developed symptoms before or on the day that they returned to Guangdong, where their viruses were ultimately sequenced.

“All sequences from patients infected in Wuhan but sequenced in Guangdong are more similar to the bat coronavirus outgroup than sequences from the Huanan Seafood Market,” writes Bloom.

These deleted data, as well as existing sequences from Wuhan-infected patients hospitalized in Guangdong, show that early Wuhan sequences frequently contained the T29095C mutation and were less likely to carry the mutations T8782C and C28144T than sequences in the joint WHO–China report.
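Mutation labels such as T29095C are conventionally read as: reference base, 1-based genome position, new base. As a minimal sketch of how such labels can be parsed and checked against a sequence, the Python below uses a made-up five-base reference and sample. The `Mutation` type and the `parse_mutation` and `carries` helpers are hypothetical names introduced for illustration, and the sketch assumes the sample uses the same coordinates as the reference.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    ref: str  # base in the reference sequence
    pos: int  # 1-based position in the reference
    alt: str  # base observed in the mutated sequence


_LABEL = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_mutation(label: str) -> Mutation:
    """Split a label like 'T29095C' into reference base, position, new base."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    return Mutation(match.group(1), int(match.group(2)), match.group(3))


def carries(seq: str, mut: Mutation) -> bool:
    """True if seq has the new base at the mutation's position."""
    return len(seq) >= mut.pos and seq[mut.pos - 1] == mut.alt


# The three labels mentioned in the article parse as expected.
for label in ["T29095C", "T8782C", "C28144T"]:
    print(parse_mutation(label))

# Invented toy sequences, NOT real viral data.
toy_reference = "ACGTT"
toy_sample = "ACGTC"
toy_mut = parse_mutation("T5C")

print(carries(toy_sample, toy_mut))     # True
print(carries(toy_reference, toy_mut))  # False
```

Tallying how often `carries` returns true across a set of samples would give the kind of frequency comparison described above, though the study's own analysis worked from full alignments and annotations.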

Deletion of the data has important implications for future studies

Bloom says the deletion of such an informative data set has implications beyond those gleaned directly from the recovered sequences.

Firstly, samples from early outpatients in Wuhan represent a gold mine for anyone seeking to understand the spread of SARS-CoV-2.

Secondly, genomic epidemiology studies of early SARS-CoV-2 must focus on the provenance and annotation of the underlying sequences as much as on technical considerations.

In addition, future studies should devote equal effort to going beyond the annotations in GISAID to carefully trace the location of patient infection and sample sequencing, says Bloom.

“In addition, I suggest it could be worthwhile to review e-mail records to identify other SRA [Sequence Read Archive] deletions.”

Journal references:

Preliminary scientific report. Bloom J. Recovery of deleted deep sequencing data sheds more light on the early Wuhan SARS-CoV-2 epidemic. bioRxiv, 2021. doi: https://doi.org/10.1101/2021.06.18.449051, https://www.biorxiv.org/content/10.1101/2021.06.18.449051v1
Peer reviewed and published scientific report. Bloom, Jesse D. 2021. “Recovery of Deleted Deep Sequencing Data Sheds More Light on the Early Wuhan SARS-CoV-2 Epidemic.” Edited by Rasmus Nielsen. Molecular Biology and Evolution, August. https://doi.org/10.1093/molbev/msab246. https://academic.oup.com/mbe/article/38/12/5211/6353034.

Article Revisions

Apr 10 2023 - The preprint that this article was based upon was accepted for publication in a peer-reviewed scientific journal. This article was edited accordingly to include a link to the final peer-reviewed paper, now shown in the sources section.

Written by

Sally Robertson

Sally first developed an interest in medical communications when she took on the role of Journal Development Editor for BioMed Central (BMC), after having graduated with a degree in biomedical science from Greenwich University.
