The outbreak of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which is the virus responsible for the coronavirus disease 2019 (COVID-19) pandemic, has cost the lives of millions of people throughout the world and continues to be a major global threat. The pandemic put a considerable amount of pressure on both hospitals and health facilities, as limited resources were available to meet the growing demand of patients infected with this novel virus.
Under these circumstances, clinical decision support systems based on predictive analysis can be useful in managing the emergency. For instance, early identification of COVID-19 patients who are more likely to experience critical illness and death can help in providing suitable care and in optimizing the use of limited resources.
Study: Early outcome detection for COVID‑19 patients. Image Credit: sasirin pamai / Shutterstock.com
Clinical features of COVID-19
COVID-19 is a respiratory disease that has a wide range of clinical presentations and severity. Whereas some infected patients may be asymptomatic or show mild symptoms, others may develop acute respiratory distress syndrome (ARDS) that is sometimes followed by several complications including kidney, cardiac, gastrointestinal, thrombotic, and neurological effects.
Several clinical studies have assisted in the characterization of COVID-19 in different patient cohorts by identifying risk factors and comorbidities for the disease, as well as assessing the efficacy of different therapeutic approaches that are being implemented throughout the world. Although several studies are being carried out, the detailed mechanism of the disease is still not fully understood. Therefore, the evaluation of risk at the time of hospitalization is difficult to perform, especially in patients with several risk factors.
Several vaccines are available against COVID-19; however, many researchers believe that most people in low-income countries are unlikely to be vaccinated against COVID-19 until at least the end of 2022. Additionally, the emergence of SARS-CoV-2 variants has threatened the efficacy of these vaccines. Thus, it can be concluded that the best treatment against COVID-19 has yet to be established.
Artificial-intelligence-based technologies and predictive models can be employed to assist with risk evaluation. These include the automatic analysis of lung X-ray and computed tomography (CT) scans to assist in the diagnostic and prognostic processes of COVID-19. Machine learning has also been employed to distinguish between COVID-19 and other forms of pneumonia. Prognostic models based on clinical data are also being used; however, they require further study, since most of them are not yet mature.
A new Nature study aimed to predict clinical outcomes based on clinical data. The current study had two main objectives. The first was to determine clinical variables that could predict final clinical outcomes and thus be useful in clinical decision-making. The second was to build predictive models that could identify critical patients during hospitalization.
About the study
The current study involved two datasets of patients who were admitted to three different units of the Pisa University Hospital in Italy during the first and second waves of the COVID-19 pandemic. The data were collected both manually and from electronic records obtained from the three units.
The dataset contained over 125 variables, out of which six predictive clinical variables were selected through a hybrid filter/wrapper feature selection method based on a genetic algorithm. The six selected variables were troponin levels, age, blood urea nitrogen (BUN) levels, the P/F ratio (the ratio of arterial oxygen partial pressure to the fraction of inspired oxygen), the presence of myalgia, and the presence of chronic obstructive pulmonary disease (COPD).
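To illustrate the general idea of selecting a variable subset with a genetic algorithm, the following is a minimal sketch and not the authors' implementation. Each candidate subset is a list of booleans, one per variable. The function name genetic_feature_selection, its parameters, the toy fitness function, and the set INFORMATIVE are all assumptions made for illustration; in a wrapper approach the fitness would typically be the cross-validated performance of a classifier trained on the selected variables.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=30,
                              generations=40, mutation_rate=0.05, seed=0):
    """Search for a high-fitness feature mask with a simple genetic algorithm.

    Assumes n_features >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    # Start from random masks: True means the variable is selected.
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        # Elitism: carry the two best masks over unchanged.
        next_gen = [ranked[0][:], ranked[1][:]]
        # Only the better half of the population is allowed to reproduce.
        parents = ranked[: pop_size // 2]
        while len(next_gen) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Single-point crossover.
            cut = rng.randrange(1, n_features)
            child = mother[:cut] + father[cut:]
            # Flip each bit with a small probability.
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


# Hypothetical stand-in for a classifier-based fitness: variables 0, 3 and 7
# are treated as informative, and every other selected variable is penalized.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    selected = {i for i, keep in enumerate(mask) if keep}
    return len(selected & INFORMATIVE) - 0.1 * len(selected - INFORMATIVE)


best = genetic_feature_selection(n_features=10, fitness=toy_fitness)
print([i for i, keep in enumerate(best) if keep])
```

The penalty term in the toy fitness plays the role of a preference for small subsets, which is one plausible way a search over 125 variables could end up with only a handful.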
These variables were used to build predictive models based on logistic regression (LR), decision trees (DT), random forests (RF), naive Bayes (NB), and support vector machines (SVM). The performance of the predictive models was evaluated based on their accuracy and F1 score. Cross-validation was performed on all patients who had those variables measured.
The study's feature selection method was also compared with standard filter feature selection using recursive feature elimination (RFE). Finally, all of the clinical variables were clustered to provide a better view of the selected biomarkers.
The results of the study indicated that the six variables selected by the genetic algorithm are strongly supported by the medical literature on COVID-19. The P/F ratio, for example, is an important clinical variable that informs clinical decisions on whether patients require external ventilation and oxygenation.
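As a reminder of what the two evaluation metrics measure, here is a minimal sketch of accuracy and F1 score for a binary outcome. The labels are made up for illustration (1 = critical outcome, 0 = non-critical) and are not study data.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up labels: 1 = critical outcome, 0 = non-critical outcome.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(accuracy(y_true, y_pred))  # 0.75
print(f1_score(y_true, y_pred))  # 0.75
```

Accuracy alone can look good when one outcome is much more common than the other, which is why it is often reported together with the F1 score.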
Higher age is also an important risk factor that has been correctly selected by the algorithm. COPD is also an important variable that helps to identify patients with chronic lung disease.
The troponin variable helps to recognize the link between cardiovascular disease and COVID-19, whereas the BUN variable helps to identify any pre-existing chronic kidney disease. The last variable, myalgia, suggests that, depending upon the case, the disease can affect different target organs.
The standard feature selection procedure was run with two different patient coverage thresholds, 90% and 75%. For the 90% threshold, the top variables, P/F ratio and age, were the same as those selected by the genetic algorithm. Troponin and BUN levels were not selected, as they did not meet the 90% patient coverage threshold.
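The coverage threshold can be read as the share of patients for whom a variable was actually measured. The sketch below shows that filtering step under this interpretation; the function names and all values are invented for illustration and do not reflect the coverage figures of the study.

```python
def coverage(values):
    """Fraction of patients with a recorded (non-None) value."""
    return sum(v is not None for v in values) / len(values)


def variables_above_threshold(columns, threshold):
    """Keep the variables measured in at least `threshold` of the patients."""
    return [name for name, values in columns.items()
            if coverage(values) >= threshold]


# Invented data for 10 patients; None marks a missing measurement.
columns = {
    "age": [71, 64, 58, 80, 45, 67, 73, 59, 62, 77],
    "pf_ratio": [210, 305, 280, None, 330, 190, 250, 300, 270, 240],
    "bun": [18.0, None, 25.0, 40.0, None, 22.0, 30.0, 19.0, 27.0, 35.0],
    "troponin": [14.0, None, None, 32.5, None, 9.0, None, 11.0, 20.0, 16.0],
}

print(variables_above_threshold(columns, 0.90))  # ['age', 'pf_ratio']
print(variables_above_threshold(columns, 0.75))  # ['age', 'pf_ratio', 'bun']
```

A stricter threshold leaves fewer candidate variables but more patients with complete data, which is the trade-off behind comparing two thresholds.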
The results also indicated that four out of the five predictive models included in the study provided a classification accuracy of over 85% during the first wave of the pandemic. Moreover, the best result was shown by LR followed by RF, DT, and SVM. During the second wave, only DT showed an accuracy level of 85%, while the others showed much lower accuracy levels.
Many clinical variables were found to be missing in the first-wave dataset. Imputing the missing values allowed the authors to test the performance of the predictive models, as well as the validity of the selected variables.
For selected variables, no difference was observed between the real and imputed values. For the predictive models, it was observed that LR and RF showed a slight reduction in accuracy, while SVM and NB experienced a slight increase.
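The article does not state which imputation technique was used. As one common possibility, the sketch below fills missing values with the median of the observed values; the function name impute_median and the BUN values are assumptions made for illustration.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a variable with no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Invented BUN measurements; None marks a patient without a recorded value.
bun = [18.0, None, 25.0, 40.0, None, 22.0]
imputed = impute_median(bun)
print(imputed)  # [18.0, 23.5, 25.0, 40.0, 23.5, 22.0]
```

Comparing model performance with and without the imputed patients, as the authors did, is a way to check that the filled-in values do not distort the results.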
Among the five predictive models used in the study, only the DT and LR models were considered interpretable. The LR model could be interpreted by studying the regression coefficients, while the DT model provides a visual description of the dataset that is based on the clinical variables. The DT model also helps to classify a new patient easily.
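To show why regression coefficients make an LR model interpretable, here is a minimal sketch. Each coefficient, once exponentiated, gives the factor by which the odds of the outcome change for a one-unit increase in that variable. The coefficient values below are placeholders chosen for illustration and are not taken from the study.

```python
import math


def predicted_probability(intercept, coefficients, patient):
    """Logistic regression: probability of the positive outcome."""
    z = intercept + sum(beta * patient[name]
                        for name, beta in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratios(coefficients):
    """Exponentiated coefficients: odds multiplier per one-unit increase."""
    return {name: math.exp(beta) for name, beta in coefficients.items()}


# Placeholder coefficients, NOT estimates from the study.
intercept = -4.0
coefficients = {"age": 0.05, "copd": 0.7}

patient = {"age": 70, "copd": 1}
print(round(predicted_probability(intercept, coefficients, patient), 3))
print({name: round(ratio, 2)
       for name, ratio in odds_ratios(coefficients).items()})
```

With these placeholder values, the presence of COPD would roughly double the odds of the outcome, which is the kind of statement a clinician can read directly from an LR model.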
Finally, clustering of the variables determined that the six selected variables belonged to five different clusters, which also helped to determine the other diseases that had common clinical symptoms. This could further help determine the link between those diseases and COVID-19.
Although the researchers made considerable efforts to follow suitable external validation practices, one major limitation is that the study included data from only one hospital. Thus, the current models need to be applied to larger datasets and at different locations before they can support clinical decision-making in such emergencies.