By merging causal genetics with network control theory, this study reveals hidden drivers of long COVID, offering new insight into why the condition affects patients so differently.
Study: Integrative multi-omics framework for causal gene discovery in Long COVID.
The coronavirus disease 2019 (COVID-19) pandemic took a heavy toll on human life and health from 2020 onward. Although the pandemic's severity has faded, its long-term sequelae continue to affect millions of survivors.
A recent study published in the journal PLoS Computational Biology uses multi-omics tools to examine the genes underlying the risk of long COVID.
Long COVID affects millions with diverse lingering symptoms
Post-Acute Sequelae of SARS-CoV-2 Infection (PASC), also known as long COVID, refers to persistent or new symptoms that occur after infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It affects up to 20% of infected people, including some whose initial infection was subclinical.
However, reported prevalence varies because organizations such as the World Health Organization (WHO) and the National Institute for Health and Care Excellence (NICE) define the condition differently.
Long COVID symptoms include neurological (brain fog, headache, memory issues), respiratory (breathing difficulty, chest tightness, reduced exercise capacity), musculoskeletal (persistent severe fatigue, myalgia, joint pain), cardiovascular (chest pain, fast heartbeat, fluctuating blood pressure), and inflammatory symptoms (swollen lymph nodes, low-grade fever).
Known risk factors for long COVID include sex, age, and the presence of pre-existing disease. However, the genetic underpinnings are unclear, motivating the current study. Such knowledge would help develop more accurate diagnostics and inform future personalized therapies for this widespread condition.
Multi-omics data power a new causal gene framework
The current study used a customized multi-omics platform that combines two analytical approaches: Mendelian randomization, to identify genes with evidence of a causal effect on long COVID, and network control theory, to identify network “driver” genes that exert control over disease-related biological pathways.
The platform integrates several types of biological data with statistical and mathematical methods into a single framework for analyzing the genetic causes of long COVID.
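The article does not describe how driver genes are computed. One widely used formalization of network control theory is structural controllability, in which the minimum set of driver nodes of a directed network is found through a maximum matching: each node can be "controlled" by at most one upstream neighbor, and every node left without a controller must receive an external input. The Python sketch below assumes that formalization; the function name `driver_nodes` and the toy networks are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Set, Tuple


def driver_nodes(nodes: List[str], edges: List[Tuple[str, str]]) -> Set[str]:
    """Minimum driver-node set under structural controllability (a sketch).

    A directed edge u -> v lets u control v. A maximum matching pairs each
    node with at most one upstream controller; unmatched nodes are drivers.
    """
    out_edges: Dict[str, List[str]] = {n: [] for n in nodes}
    for u, v in edges:
        out_edges[u].append(v)

    controller_of: Dict[str, str] = {}  # controlled node v -> its controller u

    def try_match(u: str, seen: Set[str]) -> bool:
        # Augmenting-path search from u (Kuhn's bipartite matching algorithm)
        for v in out_edges[u]:
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its current controller can move elsewhere
            if v not in controller_of or try_match(controller_of[v], seen):
                controller_of[v] = u
                return True
        return False

    for u in nodes:
        try_match(u, set())

    drivers = {n for n in nodes if n not in controller_of}
    # Even a perfectly matched network (e.g. a cycle) needs one external input
    if not drivers and nodes:
        drivers = {nodes[0]}
    return drivers


if __name__ == "__main__":
    # Hypothetical star network: hub A points to B, C and D
    star = [("A", "B"), ("A", "C"), ("A", "D")]
    print(sorted(driver_nodes(["A", "B", "C", "D"], star)))  # ['A', 'C', 'D']
```

Under this model a hub can steer only one of its targets along a matching, so the star needs three inputs while a simple chain A → B → C needs only one. Applying the idea to a real protein interaction network involves further modeling choices that the article does not detail.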
The methods used in this integrated approach included:
- Transcriptome-Wide Mendelian Randomization (TWMR) to find genes whose expression has evidence of a causal effect on long COVID risk or protection
- Expression Quantitative Trait Loci (eQTLs) to examine how genetic variants influence gene expression
- Genome-Wide Association Studies (GWAS) to identify associations between genetic variants and long COVID risk
- RNA sequencing (RNA-seq) to measure actual changes in gene expression in long COVID
- The human Protein-Protein Interaction (PPI) network, which maps how proteins interact and, combined with network control theory, reveals key regulatory control points
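TWMR ties two of these data types together: eQTL effects (variant to gene expression) act as instruments, and GWAS effects (variant to long COVID) act as outcomes. TWMR as usually formulated is multivariable and accounts for linkage disequilibrium between variants; the Python sketch below is a deliberate simplification for a single gene with independent variants, using the standard inverse-variance-weighted (IVW) estimator. The function name `ivw_estimate` and the summary statistics in the example are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(beta_eqtl: Sequence[float], beta_gwas: Sequence[float],
                 se_gwas: Sequence[float]) -> Tuple[float, float, float]:
    """IVW causal effect of one gene's expression on disease risk (a sketch).

    Assumes independent SNP instruments (no linkage disequilibrium).
    Returns (beta, standard error, Z score).
    """
    if not beta_eqtl or not (len(beta_eqtl) == len(beta_gwas) == len(se_gwas)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Each SNP's Wald ratio (beta_gwas / beta_eqtl), weighted by its precision
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_eqtl, beta_gwas, se_gwas))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_eqtl, se_gwas))
    if den == 0:
        raise ValueError("instruments have no effect on expression")
    beta = num / den
    se_beta = math.sqrt(1.0 / den)
    return beta, se_beta, beta / se_beta


if __name__ == "__main__":
    # Hypothetical eQTL and GWAS summary statistics for two independent SNPs
    beta, se, z = ivw_estimate([0.5, 0.4], [0.1, 0.08], [0.02, 0.02])
    print(round(beta, 3), round(z, 2))  # 0.2 6.4
```

A positive estimate would suggest that higher expression of the gene raises long COVID risk; a negative one would suggest a protective effect.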
The authors integrated these to form a combined score for each gene:
$$\text{Final Score} = \alpha \cdot (\text{TWMR score}) + (1 - \alpha) \cdot (\text{CT score})$$
Here, the weight α lets users balance the contribution of direct causal inference (the TWMR score) against network controllability (the control theory, or CT, score).
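The weighted combination can be sketched in a few lines of Python. The sketch assumes, which the article does not state, that both scores are first rescaled to the range 0 to 1 by min-max normalization so that neither dominates because of its units, and that the TWMR score is a magnitude of causal evidence such as an absolute Z score. The gene names and values are hypothetical.

```python
from typing import Dict


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score vector maps to all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {g: (s - lo) / span if span else 0.0 for g, s in scores.items()}


def final_scores(twmr: Dict[str, float], ct: Dict[str, float],
                 alpha: float) -> Dict[str, float]:
    """alpha * TWMR + (1 - alpha) * CT for genes present in both score sets."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    t, c = min_max(twmr), min_max(ct)
    return {g: alpha * t[g] + (1 - alpha) * c[g] for g in sorted(t.keys() & c.keys())}


if __name__ == "__main__":
    # Hypothetical scores: GENE_A has strong causal evidence, GENE_B high controllability
    twmr = {"GENE_A": 6.0, "GENE_B": 2.0, "GENE_C": 4.0}
    ct = {"GENE_A": 0.1, "GENE_B": 0.9, "GENE_C": 0.5}
    scores = final_scores(twmr, ct, alpha=0.7)
    print(sorted(scores, key=scores.get, reverse=True))  # ['GENE_A', 'GENE_C', 'GENE_B']
```

Setting α = 1 ranks genes purely by causal evidence and α = 0 purely by network control, so in this toy example the top gene flips from GENE_A to GENE_B as α moves between the two extremes.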
Study prioritizes 32 genes linked to long COVID
The study prioritized 32 candidate causal genes for long COVID. Of these, 19 had been reported in earlier research, lending support to the current findings, while 13 are newly identified and require further study. These genes are involved in the host response to the virus, pathways linked to virus-driven cancerous changes in cells, and the regulation of the host immune response and the cell cycle.
Enrichment analyses showed that these genes are also implicated in autoimmune and connective tissue disorders, as well as in certain syndromes and metabolic conditions. This overlap may help explain why long COVID presents with such diverse symptoms.
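The article does not say which enrichment test was used. A common choice is over-representation analysis, which asks whether a gene list overlaps a disease or pathway annotation more often than chance would predict, using a one-sided hypergeometric test. The Python sketch below implements that test with exact integer arithmetic; the background size, term size, and overlap in the example are hypothetical, not values from the study.

```python
from math import comb


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation p-value, P(X >= k), X ~ Hypergeometric.

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the candidate list, k: candidate genes annotated to the term.
    """
    total = comb(N, n)
    # Sum the probabilities of observing k or more annotated genes by chance
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total


if __name__ == "__main__":
    # Hypothetical: 32 candidate genes, 5 of them in a 200-gene term, 20,000-gene background
    p = hypergeom_pvalue(k=5, n=32, K=200, N=20_000)
    print(f"p = {p:.2e}")
```

With only about 0.3 overlapping genes expected by chance in this example, an overlap of 5 yields a very small p-value. In practice, p-values across many annotation terms are also corrected for multiple testing.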
Using the expression profiles of the causal genes, the scientists grouped long COVID samples into three subtypes. These subtypes differed in their symptoms, underlying disease pathways, and clinical features.
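The article does not name the clustering method used to define the subtypes. As a stand-in, the Python sketch below applies plain k-means with k = 3 to toy expression profiles, where each row is a hypothetical sample and each column the expression of a hypothetical causal gene. It illustrates the general idea of grouping samples by shared expression patterns, not the authors' procedure.

```python
import math
from typing import List, Sequence


def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 100) -> List[int]:
    """Assign each sample to one of k clusters by its expression profile."""
    # Farthest-first initialization keeps this sketch deterministic
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))

    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical expression of two causal genes in nine samples
    samples = [(1.0, 1.2), (0.9, 0.8), (1.1, 1.0),
               (5.0, 5.1), (4.8, 5.2), (5.2, 4.9),
               (1.0, 5.0), (1.2, 4.8), (0.8, 5.1)]
    print(kmeans(samples, 3))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Real subtype analyses typically use many more genes and samples, normalize expression first, and check that the chosen number of clusters is supported by the data.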
The researchers also built a free, open-source app on the Shiny framework that lets others search and analyze the data using their own filters and parameters. The app can generate lists of putative causal genes based on Mendelian randomization, control theory, or a weighted combination of the two, and it helps users reproduce the findings of the current study.
Combining causality and network biology strengthens discovery
The study's strengths include combining causal inference via MR with network control theory, which captures both the direct effects of causal gene expression and the system-wide effects of perturbing control points. It also draws on several types of omics data, giving a more complete picture than a study based on a single data type.
Moreover, the gene discovery was paired with the identification of clinically relevant disease subtypes and the development of an interactive tool. The Shiny app lets users decide how much weight to give direct causal evidence versus regulatory control within the network when prioritizing genes.
Targets for future diagnostics and therapies
“This integrative framework highlights novel causal mechanisms and therapeutic targets, advancing precision medicine strategies for Long COVID,” the authors conclude, while emphasizing that these findings provide a foundation for future research.