In a recent study published in Nature Medicine, researchers identified sub-phenotypes of post-acute sequelae of coronavirus disease 2019 (COVID-19), or PASC, based on conditions diagnosed 30 to 180 days after acute infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
Previous studies have examined PASC conditions separately, without providing evidence on co-occurring conditions. Sub-phenotypes, or co-incidence patterns, describe the degree to which PASC conditions and symptoms co-occur or develop disproportionately among particular patients, and could help reveal PASC pathophysiology.
About the study
In the present study, researchers identified PASC sub-phenotypes by a data-driven approach based on machine learning.
The team used electronic health record (EHR) data from two large clinical research networks (CRNs) of the nationwide PCORnet (patient-centred clinical research network): the INSIGHT CRN and the OneFlorida+ CRN. The INSIGHT CRN comprises 12 million New York City (NYC) residents, whereas the OneFlorida+ CRN comprises 19 million individuals residing in Florida, Georgia, and Alabama.
Individuals from the INSIGHT and OneFlorida+ CRNs formed the development cohort (n=20,881) and the validation cohort (n=13,724), respectively. The study included SARS-CoV-2-positive individuals and assessed the conditions they developed 30 to 180 days after the documented COVID-19 diagnosis.
COVID-19 diagnosis was based on a positive SARS-CoV-2 antigen test or nucleic acid amplification test reported between March 2020 and November 2021. The team assessed the incidence of 137 potential PASC conditions, grouped into clinical classifications software refined (CCSR) categories and defined by International Classification of Diseases, 10th revision (ICD-10) codes.
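The 30-to-180-day assessment window described above can be illustrated with a minimal sketch. The function names and the input layout are hypothetical, and treating both window ends as inclusive is an assumption, since the article does not say how boundary days were handled.

```python
from datetime import date

# Window bounds taken from the article: 30 to 180 days after the index date.
WINDOW_START_DAYS = 30
WINDOW_END_DAYS = 180


def in_post_acute_window(index_date: date, diagnosis_date: date) -> bool:
    """Return True if the diagnosis falls 30 to 180 days after the index date.

    Both ends are treated as inclusive, which is an assumption.
    """
    days_after = (diagnosis_date - index_date).days
    return WINDOW_START_DAYS <= days_after <= WINDOW_END_DAYS


def post_acute_conditions(index_date, diagnoses):
    """Return the sorted, de-duplicated condition labels recorded inside the window.

    `diagnoses` is an iterable of (condition_label, diagnosis_date) pairs.
    """
    return sorted(
        {label for label, when in diagnoses if in_post_acute_window(index_date, when)}
    )


if __name__ == "__main__":
    index_date = date(2020, 4, 1)  # hypothetical date of the positive test
    records = [
        ("sleep disorders", date(2020, 4, 10)),  # too early: acute phase
        ("headache", date(2020, 5, 15)),         # inside the window
        ("kidney failure", date(2020, 12, 1)),   # too late
    ]
    print(post_acute_conditions(index_date, records))
```

Only the diagnosis dated inside the window is kept; the early and late records are dropped.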
A topic modeling (TM) approach was used to identify co-incidence patterns of PASC conditions, from which the PASC sub-phenotypes were derived. After obtaining high-dimensional binary representations of each patient's PASC conditions (step 1), the algorithm learned PASC topics (T) (step 2) and inferred patient representations in the low-dimensional PASC topic space (step 3). PASC sub-phenotypes were then determined by clustering patients according to their topic representations (step 4).
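The data flow through the four steps above can be sketched in a few lines. This is a simplified illustration, not the study's method: step 2 (learning the topics) is not shown, and the topic weights are hard-coded hypothetical values. Step 3 is approximated by a normalized dot product rather than inference under a topic model, and step 4 is approximated by assigning each patient to the dominant topic rather than by clustering. All names and numbers are assumptions.

```python
def binary_vector(patient_conditions, vocabulary):
    """Step 1: encode a patient's conditions as a 0/1 vector over the vocabulary."""
    return [1 if condition in patient_conditions else 0 for condition in vocabulary]


def topic_loadings(vector, topics):
    """Step 3 (simplified): score each topic by a dot product, then normalize.

    `topics` holds one weight vector per topic, aligned with the vocabulary.
    A patient with no matching conditions gets a uniform loading.
    """
    scores = [sum(v * w for v, w in zip(vector, topic)) for topic in topics]
    total = sum(scores)
    if total == 0:
        return [1.0 / len(topics)] * len(topics)
    return [score / total for score in scores]


def dominant_topic(loadings):
    """Step 4 (simplified): index of the topic with the largest loading."""
    return max(range(len(loadings)), key=loadings.__getitem__)


if __name__ == "__main__":
    # Hypothetical four-condition vocabulary; the study used 137 categories.
    vocabulary = ["kidney failure", "anxiety", "sleep disorders", "headache"]
    # Hypothetical stand-ins for the learned topics of step 2.
    topics = [
        [0.70, 0.10, 0.10, 0.10],  # renal-leaning topic
        [0.05, 0.45, 0.40, 0.10],  # anxiety and sleep-leaning topic
    ]
    patient = {"anxiety", "sleep disorders"}
    loadings = topic_loadings(binary_vector(patient, vocabulary), topics)
    print(loadings, dominant_topic(loadings))
```

The low-dimensional loading vector, rather than the raw list of conditions, is what a grouping step of this kind operates on.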
PASC co-incidence patterns of SARS-CoV-2-positive and SARS-CoV-2-negative individuals were compared using heat maps, and the entropy of every topic vector was calculated. The robustness of the identified PASC sub-phenotypes was evaluated using propensity score (PS) adjustments. Further, the team used cosine similarity to quantitatively compare the original set of topics learned from the 137 PASC conditions with the corresponding topics learned from the two CRN cohorts.
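The two quantities mentioned above, topic entropy and cosine similarity, can be computed as in the following minimal sketch. The function names are hypothetical, and the use of the natural logarithm is an assumption, since the article does not state the base. A topic concentrated on a few conditions has low entropy, while a diffuse topic has high entropy. Two topics with proportional weights have a cosine similarity of 1.

```python
import math


def entropy(weights):
    """Shannon entropy (natural log) of a non-negative weight vector.

    The weights are normalized to sum to 1 first; zero weights contribute nothing.
    """
    total = sum(weights)
    probabilities = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probabilities)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    concentrated = [0.85, 0.05, 0.05, 0.05]  # hypothetical focused topic
    diffuse = [0.25, 0.25, 0.25, 0.25]       # hypothetical spread-out topic
    print(entropy(concentrated) < entropy(diffuse))
    print(cosine_similarity(concentrated, diffuse))
```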
Four PASC sub-phenotypes were identified. Sub-phenotype 1 comprised 7,047 (34%) patients and was dominated by renal-associated, circulation-associated, and cardiac-associated illnesses (T-3, 8, 10), such as kidney failure, circulatory and cardiac disorders, and fluid and electrolyte imbalance. The median patient age was 65 years, and 49% of the patients were men. The patients had high acute COVID-19 severity [hospitalization (61%), mechanical ventilator needs (5.0%), and critical care admissions (10%)].
This sub-phenotype had the greatest percentage of patients (37%) infected during the initial COVID-19 wave (between March and June 2020). Individuals in this sub-phenotype had an elevated burden of comorbidities and were largely prescribed medications for anemia, circulatory disorders, and endocrine disorders.
Sub-phenotype 2 comprised 6,838 (33%) patients and was dominated by pulmonary disorders (T-4, 7, 9), anxiety, sleep disorders, chest pain, and headaches. The median age of the patients was 51 years, 63% of them were female, and 31% were hospitalized during acute COVID-19.
The sub-phenotype had the greatest fraction (65%) of patients diagnosed with COVID-19 between November 2020 and November 2021. Sub-phenotype 2 individuals were largely prescribed anti-allergy, anti-inflammatory, and anti-asthma medications, such as inhaled steroids, montelukast, and levalbuterol.
Sub-phenotype 3 comprised 23% (n=4,879) of individuals and was dominated by disorders of the nervous and musculoskeletal systems (T-1, 5, 6), including pain of musculoskeletal origin, sleep disorders, and headaches. The median patient age was 57 years, and 61% of the patients were female. This sub-phenotype had the greatest percentage of individuals with more than five outpatient visits before COVID-19 (78%). Individuals in this sub-phenotype were mostly prescribed analgesic medications (such as ketorolac and ibuprofen).
Sub-phenotype 4 comprised 10% (n=2,117) of individuals, who had mainly respiratory and digestive disorders (T-2, 4, 8). The median patient age was 54 years, and 62% of the patients were female. This sub-phenotype had the highest rate of zero emergency department visits (57%) and the lowest rates of mechanical ventilator use (1%) and critical care admission (3%) during acute COVID-19. Individuals in this sub-phenotype were largely prescribed medications for digestive system disorders.
The topics learned from SARS-CoV-2-negative individuals showed greater entropy values than those learned from SARS-CoV-2-positive patients. Cosine similarity findings confirmed the robustness of the PASC sub-phenotype classification, and the co-incidence patterns observed in the two CRN cohorts were similar for SARS-CoV-2-positive individuals. In contrast, the topics learned from uninfected individuals were dissimilar to those learned from SARS-CoV-2-positive individuals and showed less concentrated patterns.
Overall, the study findings highlighted four reproducible data-driven PASC sub-phenotypes identified by machine learning. The findings could aid health authorities in improving PASC management.