Research suggests two protein coding genes can predict severe COVID-19

In a recent study posted to the medRxiv* pre-print server, researchers demonstrated that two genes, GTPase, IMAP Family Member 7 (GIMAP7), and sphingosine-1-phosphate receptor 2 (S1PR2), have the potential to predict severe coronavirus disease 2019 (COVID-19) with ~90% accuracy.

Study: Transcriptomics Meta-Analysis Predicts Two Robust Human Biomarkers for Severe Infection with SARS-CoV-2. Image Credit: Marcin Janiec / Shutterstock


The host response to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection has been diverse; accordingly, the demand for biomarkers associated with COVID-19 disease severity has persistently grown. Multiple studies have shown that various factors contribute to the observed differences in COVID-19 severity by evaluating the associations between disease severity and different aspects of the adaptive immune system. However, there is a lack of studies exploring transcriptional biomarkers associated with mild versus severe COVID-19.

About the study

In the present study, researchers performed a meta-analysis of the available human transcriptomics data to identify transcriptional prognostic markers to inform decisions regarding the care of SARS-CoV-2-infected patients under treatment at hospitals. The team searched relevant datasets based on three pre-defined criteria, as follows:

i) the host was a human;

ii) the data were generated by ribonucleic acid sequencing (RNA-seq) experiments;

iii) the blood samples or human peripheral blood mononuclear cells (PBMCs) were collected from patients during the acute phase of SARS-CoV-2 infection and had associated COVID-19 severity metadata.
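The three inclusion criteria above can be read as a simple filter over sample metadata. The following is a minimal sketch, not the authors' search procedure: the `SampleRecord` fields, the label strings, and the example records are all hypothetical and were invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleRecord:
    """Hypothetical metadata for one sequencing sample (field names are assumptions)."""
    host: str                # e.g. "Homo sapiens"
    assay: str               # e.g. "RNA-seq"
    tissue: str              # e.g. "whole blood" or "PBMC"
    phase: str               # e.g. "acute" or "convalescent"
    severity: Optional[str]  # severity label, or None when not recorded


def meets_criteria(rec: SampleRecord) -> bool:
    """Return True only if the record satisfies all three inclusion criteria."""
    is_human = rec.host.lower() == "homo sapiens"                 # criterion i
    is_rnaseq = rec.assay.lower() == "rna-seq"                    # criterion ii
    is_acute_blood_with_severity = (                              # criterion iii
        rec.tissue.lower() in {"whole blood", "pbmc"}
        and rec.phase.lower() == "acute"
        and rec.severity is not None
    )
    return is_human and is_rnaseq and is_acute_blood_with_severity


# Made-up records, not data from the study.
records = [
    SampleRecord("Homo sapiens", "RNA-seq", "PBMC", "acute", "severe"),
    SampleRecord("Homo sapiens", "microarray", "PBMC", "acute", "mild"),
    SampleRecord("Mus musculus", "RNA-seq", "whole blood", "acute", "severe"),
    SampleRecord("Homo sapiens", "RNA-seq", "whole blood", "acute", None),
]
selected = [r for r in records if meets_criteria(r)]
```

Only the first made-up record passes, because each of the others fails exactly one criterion.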

They obtained 358 public human transcriptome samples from three independent RNA-seq studies in the Gene Expression Omnibus (GEO) database. Further, the researchers processed these samples with a specialized workflow termed Automated Reproducible MOdular Workflow for preprocessing and differential analysis of RNA-seq data (ARMOR) to quantify gene expression in each patient. This process used Salmon to map the reads to the Genome Reference Consortium Human Build 38 (GRCh38) transcriptome.

Next, they used edgeR to calculate differential gene expression (DGE) from the read counts; finally, they used Camera to identify enriched gene ontology (GO) terms from the list of gene identifiers produced by edgeR. A z-score transformation normalized the Salmon counts for each gene in each GEO sample.
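A z-score transformation rescales each gene's counts so that they have a mean of zero and a standard deviation of one across samples. The sketch below illustrates the idea on made-up numbers. It assumes the population standard deviation and a per-gene grouping; the article does not say which variant the authors used, and the function name `zscore_per_gene` is hypothetical.

```python
from statistics import mean, pstdev


def zscore_per_gene(counts):
    """Z-score each gene's counts across samples.

    counts maps a gene name to a list of counts, one per sample.
    Assumption: population standard deviation (pstdev); a gene with
    no variation is mapped to all zeros to avoid dividing by zero.
    """
    normalized = {}
    for gene, values in counts.items():
        mu = mean(values)
        sigma = pstdev(values)
        if sigma == 0:
            normalized[gene] = [0.0 for _ in values]
        else:
            normalized[gene] = [(v - mu) / sigma for v in values]
    return normalized


# Toy counts invented for illustration, not values from the study.
toy_counts = {
    "GIMAP7": [30.0, 28.0, 10.0, 12.0],
    "S1PR2": [9.0, 9.0, 3.0, 3.0],
}
z = zscore_per_gene(toy_counts)
```

After the transformation, genes measured on very different count scales become directly comparable, which matters when a classifier sees them side by side.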

Lastly, the team trained a machine-learning algorithm on the read count data to pinpoint the genes that could best separate the patient samples by COVID-19 severity, and produced a list of genes ranked by their Gini impurity values. Gini impurity measures how mixed the severity classes are within a group of samples and serves a purpose similar to entropy. The transcripts from genes with larger Gini impurity values represented genes that could most accurately predict the COVID-19 phenotype.
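To make the Gini-based ranking concrete, the sketch below computes Gini impurity for a set of class labels and the drop in impurity obtained by splitting samples on one gene's expression at a threshold. It assumes a tree-style split, which is how Gini impurity is commonly used for feature ranking; the article does not name the algorithm, and the labels, values, and thresholds are invented.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(values, labels, threshold):
    """Reduction in Gini impurity from splitting samples at `threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted_child = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(labels) - weighted_child


# Made-up example: six samples, three per severity class.
labels = ["low", "low", "low", "severe", "severe", "severe"]
informative_gene = [9.0, 8.0, 7.0, 3.0, 2.0, 1.0]    # separates the classes cleanly
uninformative_gene = [5.0, 1.0, 5.0, 1.0, 5.0, 1.0]  # barely related to severity

decrease_informative = impurity_decrease(informative_gene, labels, threshold=5.0)
decrease_uninformative = impurity_decrease(uninformative_gene, labels, threshold=3.0)
```

A gene whose expression splits the samples into purer severity groups yields a larger impurity decrease, and so would rank higher in a list of this kind.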

Study findings

The study samples were assigned high or low severity and processed to obtain quality-trimmed reads, which were mapped to the human transcriptome to identify differentially expressed genes (DEGs). Overall, the authors identified 8,176 significant DEGs, of which the most significant ones were aspartate beta-hydroxylase (ASPH), chromosome 5 open reading frame 30 (C5orf30), diacylglycerol kinase eta (DGKH), and solute carrier family 26 member 6 (SLC26A6).

The GO enrichment yielded 90 significant GO terms, including apoptosis, immune response, and I-kappaB kinase/NF-kappaB signaling. Further, the authors evaluated intracellular signaling pathways best represented by these DEGs using the signaling pathway impact analysis algorithm. The analysis showed nine signaling pathways significantly affected by severe COVID-19. Of these nine pathways, five were directly associated with T-cell receptor (TCR) signaling, while a sixth described a zeta-chain-associated protein kinase 70 (Zap70) immunological synapse; notably, all six pathways remained inhibited during severe COVID-19.

The team constructed a table in which the transcripts from each gene and the read mapping data were represented as columns and rows, respectively, to generate a receiver operating characteristic (ROC) curve. The authors noted an area under the curve (AUC) of 96.6% across all the transcripts, indicating that the host transcriptional response contributed to COVID-19 severity.

The AUC for the six genes with the highest Gini impurity values was 94.3%. The combined AUC for the top two DEGs, GIMAP7 and S1PR2, was 89.8%. Moreover, the mean and median read counts for both these genes were around three times higher in the samples with low COVID-19 severity.
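The AUC values reported above can be read as the probability that a randomly chosen severe sample receives a higher predicted score than a randomly chosen non-severe one. The sketch below computes AUC directly from that pairwise definition. The scores and labels are invented, and the function `roc_auc` is a hypothetical helper rather than the authors' code.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels use 1 for the positive class (here, severe) and 0 otherwise;
    tied scores count as half a correct ranking.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up classifier scores for eight samples, not study data.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(scores, labels)  # 15 of 16 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, which is why values near 90% from only two genes are notable.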


An earlier study on transcriptional biomarkers for SARS-CoV-2 identified the GIMAP7 gene; however, it did not rank it as a top biomarker. The current study approach allowed the researchers to determine the direction in which GIMAP7 and S1PR2 expression changed with severity. Moreover, the study results illustrated the up- and downregulation of genes that best differentiated COVID-19 patients in a more diverse population.

Future studies should investigate whether these biomarkers remain consistent predictors of infection severity in patients infected with more recent SARS-CoV-2 variants, such as Omicron. Also, additional experiments are required to confirm whether the study findings can be replicated in samples from patients of different ages and risk groups. Nevertheless, the study devised a prognostic assay that could contribute to efforts to triage patients at higher risk of developing severe COVID-19 and help decrease the burden on hospital resources globally.

*Important notice

medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

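Journal reference:
Transcriptomics Meta-Analysis Predicts Two Robust Human Biomarkers for Severe Infection with SARS-CoV-2. medRxiv pre-print.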

Written by

Neha Mathur

Neha is a digital marketing professional based in Gurugram, India. She earned a Master’s degree with a specialization in Biotechnology from the University of Rajasthan in 2008. She has experience in pre-clinical research from her research project in the Department of Toxicology at the prestigious Central Drug Research Institute (CDRI), Lucknow, India. She also holds a certification in C++ programming.


Please use the following format to cite this article in your essay, paper or report:

    Mathur, Neha. (2022, June 08). Research suggests two protein coding genes can predict severe COVID-19. News-Medical.


The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of News Medical.