Comprehensive analysis of more than 2 million SARS-CoV-2 samples detects co-infections and intra-host recombination


In a recent study published in Nature Communications, a group of researchers evaluated the prevalence of co-infection and intra-host recombination in over 2 million global Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) cases and developed strategies for accurate identification and analysis of recombinant strains.

Study: Systematic detection of co-infection and intra-host recombination in more than 2 million global SARS-CoV-2 samples. Image Credit: Ninc Vienna/


The SARS-CoV-2 virus, known for its high mutation rate, has led to the emergence of numerous variants since the coronavirus disease 2019 (COVID-19) pandemic began. Co-infection, where an individual is infected with multiple variants simultaneously, has been increasingly observed, with an estimated occurrence rate of approximately 0.2-0.6%. These co-infections, varying in severity, have been identified globally, including in mild cases.

Recent trends show a predominance of recombinant lineages, such as the Omicron XBB variants, which suggests that co-infections are frequent. Further research is essential to better understand the dynamics of SARS-CoV-2 co-infections and recombinations, which could have significant implications for public health strategies and vaccine development.

About the study

In the study, the CoVEO database was utilized for prefiltering and initial selection of co-infection samples. This PostgreSQL database stores mutational data from SARS-CoV-2 sequencing samples uploaded to the European COVID-19 Data Portal.

For quality filtering, the 3,093,454 human host samples in the CoVEO database were screened, and those with a total base count below 100,000 or with insufficient sequencing depth were excluded. This left 2,172,927 samples, which were then scrutinized for unique defining mutations of SARS-CoV-2 viral variants. Rather than relying on precompiled lists, this study employed a more refined approach, using a marker table to identify mutations highly indicative of specific strains.

The identification of candidate co-infection samples was based on the detection of at least 50% of the unique variant-defining mutations from at least two different strains. No allele-frequency filtering was applied at this stage, allowing for the recognition of even minor variant strains. This process yielded 29,666 potential co-infection samples.
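The candidate rule described above can be illustrated with a minimal sketch. It assumes a marker table that maps each strain to the mutations unique to it; the table, the strain names, the mutation IDs, and the function names are all hypothetical placeholders, not the study's actual data or code.

```python
from typing import Dict, List, Set

# Hypothetical marker table: strain name -> mutations unique to that strain.
# The mutation IDs are placeholders, not real SARS-CoV-2 mutations.
MARKERS: Dict[str, Set[str]] = {
    "strain_A": {"a1", "a2", "a3", "a4"},
    "strain_B": {"b1", "b2", "b3", "b4"},
    "strain_C": {"c1", "c2", "c3", "c4"},
}


def strain_support(sample: Set[str], markers: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of each strain's unique defining mutations found in the sample."""
    return {strain: len(muts & sample) / len(muts) for strain, muts in markers.items()}


def supported_strains(sample: Set[str], markers: Dict[str, Set[str]],
                      threshold: float = 0.5) -> List[str]:
    """Strains whose support reaches the threshold, in alphabetical order."""
    support = strain_support(sample, markers)
    return sorted(s for s, frac in support.items() if frac >= threshold)


def is_coinfection_candidate(sample: Set[str], markers: Dict[str, Set[str]],
                             threshold: float = 0.5) -> bool:
    """A sample is a candidate if at least two strains reach the threshold."""
    return len(supported_strains(sample, markers, threshold)) >= 2


mixed = {"a1", "a2", "b1", "b2", "b3"}          # 2/4 of A, 3/4 of B
single = {"a1", "a2", "a3", "a4", "b1"}         # 4/4 of A, 1/4 of B
print(is_coinfection_candidate(mixed, MARKERS))                 # True
print(is_coinfection_candidate(single, MARKERS))                # False
print(is_coinfection_candidate(mixed, MARKERS, threshold=0.8))  # False
```

No allele-frequency cutoff appears in the sketch, in line with the description above. Raising `threshold` to 0.8 corresponds to the stringent 80% threshold reported in the results.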

The final selection of co-infection samples involved retaining only those candidates that carried a significant proportion of these mutually exclusive mutations. A stringent threshold ensured that only samples with comprehensive genomic representation of the variants were included, resulting in 7,700 co-infection samples.

The study also compared the collection dates of these samples with the prevalence of different variants to validate their natural occurrence. Additionally, the distribution of co-infection samples was examined across various studies and locations to identify potential lab contamination.

In exploring genetic diversity, metrics like the number of concurrent variants, information entropy, and cumulative genetic diversity were utilized. These measures were correlated with co-infection rates, providing insights into the association between viral diversity and co-infections.
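As an illustration of two of these diversity measures, the sketch below counts concurrent variants and computes an information entropy for one time window. It assumes the entropy is Shannon entropy (in bits) over the proportions of variant labels; the article does not give the study's exact formula, and the function names and sample labels are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def concurrent_variants(variant_labels: Iterable[str]) -> int:
    """Number of distinct variants observed in the window."""
    return len(set(variant_labels))


def variant_entropy(variant_labels: Iterable[str]) -> float:
    """Shannon entropy (bits) of the variant proportions in the window."""
    counts = Counter(variant_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


# Hypothetical variant assignments for samples collected in one week.
one_variant = ["Delta"] * 10
two_variants = ["Delta"] * 5 + ["Omicron BA.1"] * 5

print(concurrent_variants(two_variants))  # 2
print(variant_entropy(one_variant))       # 0.0
print(variant_entropy(two_variants))      # 1.0
```

Entropy is zero when a single variant accounts for every sample and rises as several variants circulate in comparable proportions, which is the situation in which co-infections would be expected to be more likely.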

For detecting intra-host recombinants, the study employed two distinct methodologies. First, a pipeline was developed to identify potential recombinant samples based on shifts in allele frequencies (AFs) at defining mutations. This involved correcting AFs for systematic biases and calculating odds-ratios to distinguish between samples with and without potential breakpoints. This analysis, focusing on co-infection samples with exactly two variant strains, aimed to detect shifts in AFs that could indicate recombination events.
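To give a feel for what a shift in allele frequencies along the genome looks like, here is a deliberately naive single-breakpoint scan. It is not the study's pipeline: it omits the bias correction and the odds-ratio calculation described above, and the function name, the `min_segment` parameter, and the example data are assumptions made for illustration only.

```python
from statistics import mean
from typing import List, Optional, Tuple


def best_breakpoint(afs: List[Tuple[int, float]],
                    min_segment: int = 2) -> Optional[Tuple[int, float]]:
    """Find the split that maximizes the difference in mean AF.

    afs: (genome position, allele frequency) pairs for one strain's
    defining mutations, sorted by position.
    Returns (last position of the left segment, absolute AF shift),
    or None if there are too few mutations to form two segments.
    """
    best: Optional[Tuple[int, float]] = None
    for i in range(min_segment, len(afs) - min_segment + 1):
        left = mean(af for _, af in afs[:i])
        right = mean(af for _, af in afs[i:])
        shift = abs(left - right)
        if best is None or shift > best[1]:
            best = (afs[i - 1][0], shift)
    return best


# Hypothetical AFs: high in the first half of the genome, low in the second.
example = [(100, 0.8), (200, 0.8), (300, 0.8),
           (400, 0.2), (500, 0.2), (600, 0.2)]
print(best_breakpoint(example))
```

For the example data the scan places the breakpoint after position 300, with a shift of about 0.6. A real analysis would also need to show that such a shift is not explained by primer bias or other systematic effects.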

Second, the study utilized an independent approach by examining short reads from sequencing data. This involved querying for reads that overlapped defining mutations of both parental strains in a sample. Reads were rigorously filtered for quality, and those carrying both sets of mutations were considered indicative of recombination.
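The read-level check can be sketched as follows, assuming each read has already been reduced to the set of defining mutations it carries. The `Read` class, the mapping-quality cutoff of 30, and the mutation IDs are hypothetical; the article does not state the study's actual quality filters.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Read:
    name: str
    mapq: int                  # mapping quality of the alignment
    mutations: FrozenSet[str]  # defining mutations observed on this read


def recombinant_reads(reads: List[Read], parent_a: Set[str], parent_b: Set[str],
                      min_mapq: int = 30) -> List[Read]:
    """Reads that pass the quality cutoff and carry markers of both parents."""
    return [
        r for r in reads
        if r.mapq >= min_mapq
        and r.mutations & parent_a
        and r.mutations & parent_b
    ]


# Placeholder marker sets for the two parental strains in one sample.
parent_a = {"a1", "a2"}
parent_b = {"b1", "b2"}

reads = [
    Read("r1", 60, frozenset({"a1"})),        # parent A only
    Read("r2", 60, frozenset({"a2", "b1"})),  # both parents, good quality
    Read("r3", 10, frozenset({"a1", "b2"})),  # both parents, low quality
    Read("r4", 60, frozenset({"b1", "b2"})),  # parent B only
]

print([r.name for r in recombinant_reads(reads, parent_a, parent_b)])  # ['r2']
```

Only reads long enough to span defining mutations of both strains can be informative, so regions where such mutations lie close together yield more candidate reads.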

The analysis identified genomic positions with sufficient evidence of recombination in multiple samples as potential recombination hotspots. These regions were then compared with known recombination breakpoints in other databases to corroborate findings. Additionally, recombinant reads were examined for traces of subgenomic ribonucleic acid (RNA), further validating their origin.

Study results 

In the CoVEO database, co-infection samples were determined based on the presence of a significant proportion of mutually exclusive variant-defining mutations from at least two different viral strains. The number of identified co-infection samples varied with the threshold set for these mutations. A stringent threshold of 80% led to the identification of 7,700 co-infection samples from over 2.1 million good-quality human host samples, reflecting a co-infection rate of 0.35%. This rate aligns with previous studies.
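The reported rate follows directly from the sample counts given in the article, as this short calculation shows. The variable names are illustrative; the numbers are the ones quoted in the text.

```python
# Sample counts quoted in the article.
human_host_samples = 3_093_454    # all human host samples in CoVEO
good_quality_samples = 2_172_927  # after quality filtering
candidate_samples = 29_666        # at least 50% of markers from two or more strains
coinfection_samples = 7_700       # after the stringent 80% threshold

# Percentages at each stage of the filtering funnel.
retained_pct = good_quality_samples / human_host_samples * 100
confirmed_pct = coinfection_samples / candidate_samples * 100
coinfection_rate_pct = coinfection_samples / good_quality_samples * 100

print(f"Samples passing quality filters: {retained_pct:.1f}%")
print(f"Candidates passing the 80% threshold: {confirmed_pct:.1f}%")
print(f"Co-infection rate: {coinfection_rate_pct:.2f}%")  # 0.35%
```

The denominator is the set of good-quality samples rather than all human host samples, which is why the rate is 0.35% and not lower.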

The most common co-infections involved combinations of Delta and Omicron (BA.1), Alpha and Iota, and other variants. The prevalence of co-infection corresponded with the number of samples assigned to each variant in the database, suggesting a correlation with the geographical and temporal distribution of variants.

To ensure the reliability of these findings, the study also examined the possibility of lab contamination. While some studies displayed higher than average co-infection rates, the rigorous detection method used in this study suggests that the 0.35% co-infection rate is a conservative estimate. The distribution of co-infection samples across various studies reduced the likelihood of contamination influencing the overall results.

Geographically, co-infection samples were distributed fairly evenly, although variations due to different sequencing capacities and strategies were noted. For instance, France showed a higher prevalence, but this was influenced by a lower total number of samples and specific studies focusing on co-infections. In countries with a larger number of samples, co-infection rates varied, highlighting the need for systematic global surveillance. The timeline of co-infection cases in countries with over 1000 good-quality samples indicated that co-infections were more likely when multiple variants were circulating concurrently.

The study also investigated the presence of intra-host recombination in co-infection samples by examining shifts in the AFs of defining mutations. However, polymerase chain reaction (PCR) primer bias often affected the accuracy of measured AFs. Despite this, the analysis identified 13 potential recombinant samples, predominantly from Delta – Omicron (BA.1) co-infections. Some samples from studies with artificial mixtures of variants were also identified as recombinants, which raised questions about the detection methodology.

Another method involved examining raw sequencing data for reads carrying mutations from multiple strains. A significant correlation was found between the density of these mutations and the presence of overlapping reads. Regions of the genome that showed signs of recombination did not always match those identified from other databases.

In investigating non-artificial samples, certain genomic regions were identified as potential recombination hotspots, particularly within the S and M genes of the Delta – Omicron (BA.1) co-infections. However, in the 13 samples initially flagged for potential recombination, no conclusive evidence of recombinant reads at the presumed breakpoints was found, highlighting the complexity and challenges in detecting intra-host recombination in SARS-CoV-2. 

The geographical distribution of co-infection cases, though broadly consistent, showed variations influenced by local sampling strategies and sequencing capabilities. This unevenness underlines the importance of continuous and systematic worldwide surveillance to capture the dynamics of viral co-infections accurately.

In investigating intra-host recombinations, the study navigated through the complexities of AF distributions and raw sequencing data analysis. The identification of potential recombinants, especially in samples involving common variant combinations like Delta and Omicron, provides intriguing insights.

Ultimately, the study's approach to analyzing over 2 million samples for co-infections and recombinants in the SARS-CoV-2 virus offers a robust framework that balances stringent detection criteria with the practical challenges of large-scale genomic data analysis. 

Journal reference:
  • Pipek, O.A., Medgyes-Horváth, A., Stéger, J. et al. Systematic detection of co-infection and intra-host recombination in more than 2 million global SARS-CoV-2 samples. Nat Commun 15, 517 (2024). doi:

Written by

Vijay Kumar Malesu

Vijay holds a Ph.D. in Biotechnology and possesses a deep passion for microbiology. His academic journey has allowed him to delve deeper into understanding the intricate world of microorganisms. Through his research and studies, he has gained expertise in various aspects of microbiology, which includes microbial genetics, microbial physiology, and microbial ecology. Vijay has six years of scientific research experience at renowned research institutes such as the Indian Council for Agricultural Research and KIIT University. He has worked on diverse projects in microbiology, biopolymers, and drug delivery. His contributions to these areas have provided him with a comprehensive understanding of the subject matter and the ability to tackle complex research challenges.    


Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Kumar Malesu, Vijay. (2024, January 17). Comprehensive analysis of more than 2 million SARS-CoV-2 samples detects co-infections and intra-host recombination. News-Medical. Retrieved on April 20, 2024 from

  • MLA

    Kumar Malesu, Vijay. "Comprehensive analysis of more than 2 million SARS-CoV-2 samples detects co-infections and intra-host recombination". News-Medical. 20 April 2024. <>.

  • Chicago

    Kumar Malesu, Vijay. "Comprehensive analysis of more than 2 million SARS-CoV-2 samples detects co-infections and intra-host recombination". News-Medical. (accessed April 20, 2024).

  • Harvard

    Kumar Malesu, Vijay. 2024. Comprehensive analysis of more than 2 million SARS-CoV-2 samples detects co-infections and intra-host recombination. News-Medical, viewed 20 April 2024,


The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of News Medical.