In a recent study published in The Lancet Regional Health-Southeast Asia, researchers developed an artificial intelligence (AI)-based predictive system (AIPS) model for the early detection of lung cancer by combining radiological, clinical, and genetic data.
Background
Lung cancer prognosis has improved with the identification of molecular targets and the development of targeted treatments, including therapy for somatic epidermal growth factor receptor (EGFR) mutations. However, next-generation sequencing (NGS) is often unavailable in resource-constrained countries such as India. An AI-based pipeline that recognizes lung nodule characteristics in computed tomography (CT) scans and predicts the probability of an EGFR mutation could help bring care and therapy closer to optimal in such settings.
About the study
In the present study, researchers demonstrated the AIPS pipeline for identifying EGFR-mutant lung nodules and predicting their presence from CT scans in resource-constrained settings such as India.
Using EGFR sequencing and CT imaging data from 2,277 lung cancer patients, the researchers created an automated lung nodule identification and characterization algorithm. Unlike prior research, which mainly focused on individuals of white and Chinese descent, they concentrated on Indian patients (Cohorts 1, 2, and 3).
The team collected data on the Indian population from the Rajiv Gandhi Cancer Institute and Research Centre, India, and data on the white population (Cohort 4) from The Cancer Imaging Archive (TCIA). They built the AI prediction system-Nodule model (AIPS-N) using computed tomography imaging data from the Lung Image Database Consortium image collection (LIDC-IDRI; Cohort 5), which included 1,010 patients (represented by 244,527 images).
The researchers pre-processed CT scans with windowing techniques to improve lung visibility, then parsed the image annotations and used automatic lung segmentation to detect and isolate the lung region in each image. They trained the Faster region-based convolutional neural network (Faster R-CNN) model with the parsed image annotations and the corresponding pre-processed images.
The team pre-processed LIDC-IDRI images to assess the anatomy and identify pathologies. They normalized the DICOM intensity values to a specified range and then applied windowing to achieve a consistent intensity range across images from the various cohorts. They passed the training dataset, which included masks, image annotations, and slices, into the pre-trained base model, which extracted features from the input images and generated high-level, multi-scale convolutional feature maps.
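The normalization and windowing step described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, and the window center (-600 HU) and width (1,500 HU) are a commonly used lung window assumed here because the article does not report the settings the researchers used.

```python
def to_hounsfield(raw_values, slope=1.0, intercept=-1024.0):
    """Convert stored DICOM pixel values to Hounsfield units (HU).

    slope and intercept correspond to the DICOM RescaleSlope and
    RescaleIntercept attributes; the defaults here are only typical values.
    """
    return [v * slope + intercept for v in raw_values]


def apply_window(hu_values, center=-600.0, width=1500.0):
    """Clip HU values to a window and rescale them to the range [0, 1].

    The center/width defaults are an assumed lung window, not the study's.
    """
    low = center - width / 2.0
    high = center + width / 2.0
    return [(min(max(v, low), high) - low) / (high - low) for v in hu_values]


# Illustrative intensities: air, lung tissue, water, dense bone (in HU).
pixels_hu = [-1000.0, -700.0, 0.0, 1000.0]
windowed = apply_window(pixels_hu)
```

After windowing, every scan shares the same [0, 1] intensity scale regardless of scanner or cohort, which is the consistency the pre-processing step aims for.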
The researchers used the Region Proposal Network (RPN) on the multi-scale convolutional feature maps to predict region proposals, that is, probable locations in the image that may contain objects of interest. Anchors, which are predefined bounding boxes with varied aspect ratios and sizes, were placed onto the convolutional feature maps, and the predicted anchor deltas refined the anchor boxes to align them more closely with the objects in the image.
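To make the relationship between anchors and anchor deltas concrete, here is a small sketch using the standard Faster R-CNN box parameterization. The function names, anchor sizes, and aspect ratios are illustrative assumptions; the article does not state the values used in the study.

```python
import math


def make_anchors(cx, cy, sizes=(32, 64), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors centered on (cx, cy).

    ratio is height / width; each anchor keeps an area of size ** 2.
    Sizes and ratios are assumed values for illustration only.
    """
    anchors = []
    for size in sizes:
        for ratio in ratios:
            w = size / math.sqrt(ratio)
            h = size * math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def apply_deltas(anchor, deltas):
    """Refine one anchor with predicted deltas (dx, dy, dw, dh).

    dx and dy shift the center in units of the anchor's width and height;
    dw and dh rescale the width and height on a log scale.
    """
    x1, y1, x2, y2 = anchor
    dx, dy, dw, dh = deltas
    wa, ha = x2 - x1, y2 - y1
    xa, ya = x1 + wa / 2, y1 + ha / 2
    cx, cy = xa + dx * wa, ya + dy * ha
    w, h = wa * math.exp(dw), ha * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


anchors = make_anchors(100.0, 100.0)
# Shift the square 32-pixel anchor right by 10% of its width.
refined = apply_deltas(anchors[1], (0.1, 0.0, 0.0, 0.0))
```

Deltas of all zeros leave an anchor unchanged, so the network only has to learn small corrections relative to a nearby predefined box rather than absolute coordinates.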
The AIPS-Nodule model drew a yellow box around the lung region of interest in CT scans to highlight it for further investigation. The automated pulmonary segmentation and object identification models were built separately for each nodule feature, yielding five AIPS-N feature models. The researchers used base models such as the ResNet101-Feature Pyramid Network (R101-FPN) with Faster R-CNN to extract image features and train the model. They used 70%, 15%, and 15% of the data for model training, validation, and testing, respectively.
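A 70%/15%/15% split of this kind might look like the sketch below. It assumes the split is made at the patient level, so that slices from one patient never appear in more than one subset; the article does not say whether the study split by patient or by image, and the function name and seed are hypothetical.

```python
import random


def split_ids(ids, train=0.70, val=0.15, seed=0):
    """Shuffle identifiers and split them into train/validation/test lists.

    Whatever remains after the train and validation shares goes to test.
    The fixed seed makes the shuffle reproducible.
    """
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n_train = int(round(len(ids) * train))
    n_val = int(round(len(ids) * val))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


# 1,010 patients, as in the LIDC-IDRI collection; the IDs are placeholders.
patient_ids = range(1010)
train_ids, val_ids, test_ids = split_ids(patient_ids)
```

Splitting on patient identifiers rather than individual images is a common precaution against near-duplicate slices leaking between the training and test sets.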
Results
Adding the results of the deep learning model (AIPS-N) to the machine learning (ML) models enhanced ML performance. The AIPS-Nodule model could automatically detect and characterize lung nodules, with a mean AP50 score of 70%. The AIPS-Mutation (AIPS-M) system combined the AIPS-Nodule results with clinical parameters to predict EGFR genotype, yielding area under the receiver-operating characteristic curve (AUC) values ranging between 0.6 and 0.9.
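AP50 is the average precision computed when a predicted box counts as correct only if its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below shows that underlying IoU test; the function names and box coordinates are made up for illustration and are not taken from the study.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_hit_at_50(predicted, truth):
    """True if the prediction matches the ground truth at the AP50 threshold."""
    return iou(predicted, truth) >= 0.5


truth = (10.0, 10.0, 30.0, 30.0)       # hypothetical annotated nodule
close_pred = (12.0, 12.0, 32.0, 32.0)  # IoU = 324 / 476, about 0.68
loose_pred = (15.0, 15.0, 35.0, 35.0)  # IoU = 225 / 575, about 0.39
```

Here `is_hit_at_50(close_pred, truth)` is true and `is_hit_at_50(loose_pred, truth)` is false; average precision then summarizes precision across recall levels over all such matched and unmatched detections.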
ML algorithms trained on Cohort 1 had a mean AUC of 0.85 in the validation subgroup. Random Forest, Randomized Search Cross-Validation, XGBoost, and AIPS-Mutation ML models trained on the Indian and white cohorts produced a mean AUC of 0.8 when tested on Cohort 4. When tested on Cohort 4, the mean AUC of the ML models trained on Cohort 1 increased from 0.6 to 0.9 once AIPS-N scores were integrated, demonstrating the positive impact of the AIPS-N outputs on predictive capability.
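For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen EGFR-mutant case receives a higher predicted score than a randomly chosen wild-type case. The sketch below computes it directly from that definition; the labels and scores are invented for illustration and are not study data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels are 1 (mutant) or 0 (wild-type); ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: three mutant and three wild-type cases.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.5]
auc = roc_auc(labels, scores)  # 7 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why a rise from 0.6 to 0.9 is a substantial change.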
The study included a case example of a 71-year-old Indian male smoker diagnosed with squamous cell carcinoma and carrying an EGFR mutation. The AIPS-Nodule model accurately identified the features and location of the detected nodule, assigning spiculation and sphericity to class 1. AIPS-Mutation models trained on Cohort 1 were used to predict the EGFR status of the patient, whose tumor had been clinically determined to be EGFR-mutated. All six machine learning algorithms predicted the EGFR status as mutated, yielding a 'True Positive' result.
The study showed that CT imaging paired with AIPS automated pulmonary nodule analysis might predict EGFR mutation status and identify individuals with EGFR mutations cost-effectively and non-invasively.
The AIPS-Nodule model could detect and describe lung nodules, whereas the AIPS-Mutation model predicted EGFR genotype by integrating AIPS-N findings with clinical variables. The study's findings might help oncologists prioritize patients for tailored medicines, improve patient care, and enhance global healthcare standards.
The AIPS approach could benefit resource-limited settings and adds to previous evidence by broadening the scope of nodule investigation to include thorough characterization and association with the EGFR mutation.