AI's ability to detect COVID-19 from coughs faces real-world challenges

A recent Nature Machine Intelligence study investigated the efficacy of audio-based artificial intelligence (AI) classifiers in predicting severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) infection status. SARS-CoV-2 is the causal organism of the coronavirus disease 2019 (COVID-19) pandemic.

Study: Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers. Image Credit: Aliaksandra Post / Shutterstock

Background

Since SARS-CoV-2 infection can be either symptomatic or asymptomatic, accurate tests are needed to avoid quarantining the general population. Previous studies have suggested that AI-based classifiers trained on respiratory audio data could identify SARS-CoV-2 infection status.

Although these studies indicated that AI-based classifiers could be effective, many challenges surfaced when applying them in real-world settings. Factors that hindered their application included sampling biases, unvalidated data on participants' COVID-19 status, and delays between infection and audio recording. It is therefore imperative to determine whether the audio biomarkers of COVID-19 are unique to SARS-CoV-2 infection or are merely confounding signals.

About the Study

The current study focused on determining whether audio-based classifiers can be used accurately for COVID-19 screening. A large-scale polymerase chain reaction (PCR) dataset linked to audio-based COVID-19 screening (ABCS) was used. Participants of the Real-time Assessment of Community Transmission (REACT) programme and the National Health Service (NHS) Test-and-Trace (T+T) service were invited, and all relevant demographic data were extracted from T+T/REACT records.

Participants were asked to complete survey questions and record four audio clips: reading a specific sentence, three successive exhalations making a "ha" sound, a single forced cough, and three forced coughs in succession. All recordings were saved in .wav format. The quality of the recordings was assessed, and 5,157 records were removed for quality-related issues.

Human figures represent study participants and their corresponding COVID-19 infection status, with the different colours portraying different demographic or symptomatic features. When participants are randomly split into training and test sets, the randomized split models perform well at COVID-19 detection, achieving AUCs in excess of 0.8; however, matched test set performance is seen to drop to estimated AUC between 0.60 and 0.65, with an AUC of 0.5 representing random classification. Inflated classification performance is also seen in engineered out-of-distribution test sets such as the designed test set, in which a select set of demographic groups appear solely in the testing set, and the longitudinal test set, in which there is no overlap in the time of submission between train and test instances. The 95% confidence intervals calculated via the normal approximation method are shown, along with the corresponding n numbers of the train and test sets.
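The evaluation described in the caption above can be illustrated in miniature. Below is a minimal sketch, in plain Python, of a rank-based (Mann-Whitney) AUC estimate together with a normal-approximation 95% confidence interval. The function names are illustrative, and treating the AUC like a simple proportion over n test samples is a deliberate simplification of the study's interval calculation, not the authors' actual code.

```python
import math

def auc_mann_whitney(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    case scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def auc_normal_ci(auc, n, z=1.96):
    """Crude normal-approximation 95% CI, treating the AUC like a
    proportion estimated from n test samples."""
    se = math.sqrt(auc * (1.0 - auc) / n)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)
```

An AUC of 0.5 (random classification, as in the caption) on a test set of 100 samples would carry an interval of roughly 0.40 to 0.60, which is why the matched-set estimates of 0.60 to 0.65 sit only modestly above chance.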

Study Findings

In this study, a respiratory acoustic dataset covering 67,842 individuals was collected, of whom 23,514 tested positive for COVID-19. All data were linked to PCR test results. Notably, the largest number of COVID-19-negative participants was recruited from the six REACT rounds rather than through the T+T channel.

The dataset showed broad geographical coverage across England, and no significant association between geographical location and COVID-19 status was noted; the highest level of COVID-19 imbalance was found in Cornwall. A previous study indicated recruitment bias in ABCS, particularly linked to age, language, and gender, in both the training and test sets. Despite this bias, the training dataset was balanced by age and gender across the COVID-positive and COVID-negative subgroups.

Consistent with previous studies, the unadjusted analysis conducted in this study showed that AI classifiers could predict COVID-19 status with high accuracy. However, when measured confounders were matched between groups, the classifiers' performance in detecting SARS-CoV-2 status dropped substantially.
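The matching idea behind this comparison can be sketched simply: pair each COVID-positive participant with a still-unused negative participant who shares the same confounder values, and evaluate only on the matched pairs. The sketch below, in plain Python, uses exact matching on hypothetical `age_band` and `gender` fields; it is an illustrative simplification of confounder matching generally, not the study's specific matching procedure.

```python
from collections import defaultdict

def exact_match(positives, negatives, keys=("age_band", "gender")):
    """Pair each positive case with an unused negative case sharing the same
    confounder values. Unmatched cases are dropped, which is why matching
    typically shrinks the usable sample size."""
    pool = defaultdict(list)
    for rec in negatives:
        pool[tuple(rec[k] for k in keys)].append(rec)
    matched = []
    for rec in positives:
        bucket = pool[tuple(rec[k] for k in keys)]
        if bucket:
            matched.append((rec, bucket.pop()))
    return matched
```

In a matched set built this way, the confounders can no longer separate the classes, so any remaining classifier performance must come from genuine acoustic signal.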

Based on these findings, the current study proposed guidelines to mitigate the effect of recruitment bias in future studies. Some of the recommendations are listed below:

  1. Audio samples stored in repositories must include details of the study recruitment criteria. In addition, relevant information about the individuals, including their gender, age, time of COVID-19 test, SARS-CoV-2 symptoms, and locations, must be documented along with the audio recording.
  2. All confounding factors must be identified and matched to help control recruitment bias.
  3. Experimental designs must be developed with possible biases in mind. In most cases, data matching reduces sample size, so observational studies should recruit participants in a way that maximizes the possibility of matching measured confounders.
  4. The predictive values of the classifiers must be compared with standard protocol findings.
  5. AI classifiers' predictive accuracy must be assessed, bearing in mind that predictive accuracy, sensitivity, and specificity vary depending on the target population.
  6. The classifiers' utility must be assessed for each testing outcome.
  7. The replication study must be conducted in randomized cohorts. Furthermore, pilot studies must be conducted in real-world settings based on domain-specific utility.
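Recommendation 5's point about target populations follows directly from Bayes' rule: for a fixed sensitivity and specificity, the positive predictive value falls sharply as disease prevalence falls. A minimal sketch, with illustrative numbers rather than figures from the study:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Bayes' rule for a binary test: derive PPV and NPV from sensitivity,
    specificity, and the prevalence in the target population."""
    tp = sensitivity * prevalence              # true positives per unit population
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false positives
    fn = (1.0 - sensitivity) * prevalence          # false negatives
    tn = specificity * (1.0 - prevalence)          # true negatives
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv
```

For example, a test with 90% sensitivity and 90% specificity has a PPV of 90% when half the screened population is infected, but under 10% when prevalence is 1%, which is why the same classifier can look useful in one setting and misleading in another.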

Conclusions

The current study has limitations, including the possibility of unmeasured confounders across the REACT and T+T recruitment channels. For instance, in T+T, PCR testing for COVID-19 was performed several days after self-screening of symptoms, whereas PCR tests in REACT were conducted on a pre-determined date, irrespective of symptom onset. Although most confounders were matched, residual predictive variation may remain.

Despite these limitations, this study highlights the need for accurate machine-learning evaluation procedures that yield unbiased results. Furthermore, it shows that confounding factors are hard to detect and control across many AI applications.

Journal reference:
  • Coppock, H. et al. (2024) Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers. Nature Machine Intelligence. 1-14. DOI: 10.1038/s42256-023-00773-8, https://www.nature.com/articles/s42256-023-00773-8
Written by

Dr. Priyom Bose

Priyom holds a Ph.D. in Plant Biology and Biotechnology from the University of Madras, India. She is an active researcher and an experienced science writer. Priyom has also co-authored several original research articles that have been published in reputed peer-reviewed journals. She is also an avid reader and an amateur photographer.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Bose, Priyom. (2024, February 08). AI's ability to detect COVID-19 from coughs faces real-world challenges. News-Medical. Retrieved on February 26, 2024 from https://www.news-medical.net/news/20240208/AIs-ability-to-detect-COVID-19-from-coughs-faces-real-world-challenges.aspx.

