In a recent study published in Scientific Reports, researchers evaluate and discuss the limitations of chest X-rays (CXR) shared through smartphone applications.
Against the backdrop of the coronavirus disease 2019 (COVID-19) pandemic, the researchers highlight the benefits of automated clinical diagnostic tools developed using artificial intelligence (AI) models while also elucidating their demerits, especially when analyzing highly compressed images. Multi-task learning (MTL) was also introduced as an approach to overcome current challenges associated with AI models.
Study: Challenges of AI driven diagnosis of chest X-rays transmitted through smart phones: A case study in COVID-19.
AI in COVID-19 diagnosis
Before the development of clinical diagnostic COVID-19 test kits, CXR was the first-line triage assessment of the disease. However, due to the unprecedented spread of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the limited pool of radiologists available worldwide was soon overwhelmed, particularly in low-to-middle-income countries (LMICs) and rural areas.
To address this burden on radiologists, centralized AI-based systems were conceptualized to automate COVID-19 diagnosis from CXR images. Advances in smartphone hardware capabilities and increased smartphone penetration, even in LMICs, made smartphones the ideal medium for implementing these AI models. Recent smartphones include high-resolution and color-sensitive cameras, which research has shown to be sufficient for accurate COVID-19 diagnosis by trained radiologists.
Another benefit of smartphones is their inclusion of media-capable messaging applications, including WhatsApp and Telegram. These messaging platforms allow for sharing images remotely, thus facilitating diagnoses even without a local radiologist. With these smartphone features in mind, several COVID-19 diagnostic AI systems were launched, termed ‘AI-aided Diagnosis of X-ray images through messaging Applications’ (AIDXA).
AIDXA systems were designed to account for low-bandwidth availability in rural areas, with some systems like the Indian XraySetu capable of directly interfacing with WhatsApp. However, a limitation of these applications is data loss due to image compression. Although compression has no noticeable effect on expert radiologists’ diagnoses, limited evidence suggests that it can significantly alter AI diagnostic performance.
About the study
In the present study, researchers first present a case study to define and illustrate the two main limitations of current AIDXA systems in COVID-19 diagnosis. They then develop an in-house COVID-19 image database to quantitatively evaluate the effects of image compression on AIDXA model performance. Finally, they describe, design, and train a novel multi-task learning model aimed at accurate COVID-19 diagnosis, even under conditions of image compression.
Despite the benefits of AIDXA systems in automating COVID-19 diagnosis, thereby partially addressing the global shortage of expert human radiologists, the current study identifies ‘Prediction Instability’ (PIP) and ‘Out of Lung Saliency’ (OLS) as severe limitations of these AI systems.
To evaluate the current model performance, a novel CXR image dataset called ‘WhatsApp CXR’ (WaCXR) was developed. The dataset comprised 6,562 JPEG CXR images from the COVID-Net database, passed through WhatsApp compression, which resulted in 6,562 pairs of visually near-identical compressed and uncompressed images.
Prediction instability is the lack of congruency in model predictions between compressed and uncompressed CXR images. While a model might identify a patient as COVID-19-positive based on uncompressed CXR images, the same model may classify the patient as COVID-19-negative when the same CXR image has been subjected to WhatsApp compression. This lack of congruence in medical applications represents a potentially fatal flaw, rendering predictions unreliable.
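One simple way to quantify this kind of instability is to count how often a model's predicted label changes between the uncompressed and compressed versions of the same image. The sketch below is only an illustration of that idea: the function name, the label encoding, and the sample predictions are hypothetical, and the study's own score may be defined differently.

```python
from typing import Sequence


def prediction_instability(orig_preds: Sequence[int],
                           comp_preds: Sequence[int]) -> float:
    """Percentage of image pairs whose predicted label changes after compression.

    Illustrative definition only; not taken from the study.
    """
    if len(orig_preds) != len(comp_preds):
        raise ValueError("prediction lists must be the same length")
    if not orig_preds:
        raise ValueError("at least one image pair is required")
    # A pair is unstable when the two predicted labels disagree.
    flipped = sum(o != c for o, c in zip(orig_preds, comp_preds))
    return 100.0 * flipped / len(orig_preds)


# Hypothetical labels for five image pairs (1 = COVID-19-positive, 0 = negative).
original = [1, 0, 1, 1, 0]
compressed = [1, 0, 0, 1, 0]
score = prediction_instability(original, compressed)
print(f"Unstable predictions: {score:.1f}%")
```

In this toy example, one of the five pairs changes label, so one in five predictions would be considered unstable.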
Machine learning research suggests that the high predictive performance of deep learning models can partially be attributed to their unintended learning of shortcut strategies. While useful in some AI applications, this presents a significant challenge in the medical field, where explainable and reproducible predictions are imperative.
The current study employs saliency maps, which highlight the regions of an image that contribute most to a model’s prediction, to evaluate current AIDXA models’ pathology predictions. Saliency map results suggest that several state-of-the-art AIDXA models’ COVID-19 predictions are based on CXR image regions outside the lung. This OLS is observed in both uncompressed and compressed images and is exacerbated in the latter.
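To make the idea of out-of-lung saliency concrete, the sketch below computes the share of total saliency that falls outside a lung mask. It is a minimal illustration under stated assumptions: saliency values are non-negative, the lung mask is a binary grid of the same shape, and the function name and toy data are hypothetical rather than taken from the study.

```python
from typing import Sequence


def out_of_lung_saliency(saliency: Sequence[Sequence[float]],
                         lung_mask: Sequence[Sequence[int]]) -> float:
    """Percentage of total saliency located outside the lung region.

    Illustrative definition only; assumes non-negative saliency values
    and a mask where 1 marks lung pixels and 0 marks everything else.
    """
    total = 0.0
    outside = 0.0
    for sal_row, mask_row in zip(saliency, lung_mask):
        if len(sal_row) != len(mask_row):
            raise ValueError("saliency map and mask must have the same shape")
        for value, in_lung in zip(sal_row, mask_row):
            total += value
            if not in_lung:
                outside += value
    if total == 0:
        return 0.0  # no saliency anywhere, so nothing lies outside the lung
    return 100.0 * outside / total


# Hypothetical 2x2 saliency map and lung mask.
toy_saliency = [[0.0, 2.0],
                [1.0, 1.0]]
toy_mask = [[0, 1],
            [1, 0]]
ols = out_of_lung_saliency(toy_saliency, toy_mask)
print(f"Out-of-lung saliency: {ols:.1f}%")
```

A lower value indicates that the model is attending mainly to lung tissue, which is the behavior one would want from a chest X-ray classifier.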
Although PIP and OLS have been identified as challenges in previous research, no metrics to investigate their impacts have been implemented. To address this need, the researchers introduce ‘PI Score’ and ‘OLS Score’ as quantitative measures of state-of-the-art AIDXA models’ performance.
Given the alarmingly high prediction instability and out-of-lung saliency observed in current AIDXA models, a novel multi-task learning (MTL) model called COVIDMT was developed.
“COVIDMT is built on top of a state-of-the-art Deep Learning Network known as a base network. The base network is initialized with Imagenet weights to enable transfer learning, thereby maximizing the performance of COVIDMT model on the target domain.”
PI and OLS Scores were used to evaluate the performance of COVIDMT versus current-generation AI COVID-19 diagnostic models.
The most widely used deep neural network AIDXA systems currently employed in automated COVID-19 diagnosis are ResNet-50, ResNeXt-50, VGG-19, XceptionNet, and COVID-Net. Each of these models was evaluated for PIP and OLS performance; however, COVID-Net is of particular relevance, as it uses the same training dataset as COVIDMT.
WaCXR dataset preparation reduced the total file size from 6.7 GB to 351 MB, a reduction of about 95%. Although the compressed and original images are visually almost indistinguishable, compression produces significant pixel-level changes and, consequently, inconsistencies in the data fed to AI models.
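The reported size reduction can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 GB = 1,000 MB), which the article does not specify; the function name is illustrative.

```python
def size_reduction_percent(original_mb: float, compressed_mb: float) -> float:
    """Percentage by which the compressed size is smaller than the original."""
    if original_mb <= 0:
        raise ValueError("original size must be positive")
    return 100.0 * (1.0 - compressed_mb / original_mb)


# Sizes reported in the article; the GB-to-MB conversion is an assumption.
original_mb = 6.7 * 1000   # 6.7 GB expressed in MB (decimal units)
compressed_mb = 351.0      # 351 MB after WhatsApp compression
reduction = size_reduction_percent(original_mb, compressed_mb)
print(f"Size reduction: {reduction:.1f}%")
```

Under this assumption the reduction works out to roughly 94.8%, consistent with the rounded figure of 95%.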
PI Score results indicate instability between 4.36% and 11.71% for current state-of-the-art models. OLS Scores were similarly poor, with an average out-of-lung saliency of 66% for original images and 70% for compressed images. Notably, COVID-Net showed an out-of-lung saliency of 70% even for uncompressed images, highlighting that current AI models are prone to both prediction instability and out-of-lung saliency.
The COVIDMT results show a 40% improvement in the MT model’s PI Score compared with ResNet-50 and ResNeXt-50. OLS Scores similarly improved by 35% over the corresponding base model.
“In future research, it would be interesting to explore the challenges of PIP and OLS in relation to different abnormalities and imaging modalities. Additionally, investigating the potential of a multi-task learning framework to address these issues could be a promising direction for further exploration.”
- Antony, M., Kakileti, S. T., Shah, R., et al. (2023). Challenges of AI driven diagnosis of chest X-rays transmitted through smart phones: A case study in COVID-19. Scientific Reports 13(1); 1-16. doi:10.1038/s41598-023-44653-y