In a recent article published in Scientific Reports, researchers highlighted the importance of tools used to interpret the output of predictive models in type 1 diabetes (T1D) management.
Study: The importance of interpreting machine learning models for blood glucose prediction in diabetes: an analysis using SHAP.
Introduction
To this end, the researchers first selected an ad hoc case study focused on a female patient from the OhioT1DM dataset.
Next, they retrospectively “replayed” patient data to find an appropriate prediction algorithm that could be integrated into a decision support system (DSS) to help make corrective insulin bolus (CIB) suggestions.
They performed their experiments on two ad hoc long short-term memory (LSTM) neural network models, a non-physiological (np)-LSTM and a physiological (p)-LSTM, with similar prediction accuracy but capable of leading to different clinical decisions.
Since LSTMs can learn and retain both long- and short-term dependencies in data, they are well suited to time-series prediction.
While both np-LSTM and p-LSTM relied on the same input features and structure, the latter added a non-learnable pre-processing layer between the input and the hidden LSTM layer, comprising two filters that approximate the physiological decay curves of insulin and carbohydrate (CHO) intake, measured in grams/minute.
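The paper does not include implementation code, but the sketch below shows one way such a fixed pre-processing layer could be wired into a Keras model; the window length, decay time constants, layer names, and hidden size are illustrative assumptions, not the authors' actual design choices.

```python
import numpy as np
from tensorflow.keras import layers, Model

# Illustrative constants: 5-minute CGM step, 1-hour input window, and assumed
# decay time constants (minutes) for insulin action and CHO absorption.
STEP, HISTORY = 5.0, 12
TAU_INS, TAU_CHO = 60.0, 40.0

def decay_kernel(tau):
    """Exponential decay filter approximating a physiological action curve."""
    age = np.arange(HISTORY)[::-1] * STEP          # age of each sample in the window
    k = np.exp(-age / tau)
    return (k / k.sum()).astype("float32").reshape(HISTORY, 1, 1)

# Inputs: past CGM readings, insulin boluses, and CHO intake (grams/minute)
cgm_in = layers.Input(shape=(HISTORY, 1), name="cgm")
ins_in = layers.Input(shape=(HISTORY, 1), name="insulin")
cho_in = layers.Input(shape=(HISTORY, 1), name="cho")

# Non-learnable pre-processing: fixed causal convolutions with decay kernels
ins_filt = layers.Conv1D(1, HISTORY, padding="causal", use_bias=False,
                         trainable=False, name="insulin_decay")
cho_filt = layers.Conv1D(1, HISTORY, padding="causal", use_bias=False,
                         trainable=False, name="cho_decay")

x = layers.Concatenate()([cgm_in, ins_filt(ins_in), cho_filt(cho_in)])
x = layers.LSTM(64)(x)                             # hidden LSTM layer
out = layers.Dense(1, name="bg_prediction")(x)

p_lstm = Model([cgm_in, ins_in, cho_in], out)
p_lstm.get_layer("insulin_decay").set_weights([decay_kernel(TAU_INS)])
p_lstm.get_layer("cho_decay").set_weights([decay_kernel(TAU_CHO)])
p_lstm.compile(optimizer="adam", loss="mse")
```

Omitting the two fixed convolutions (i.e., feeding the raw insulin and CHO channels straight to the LSTM) would yield the np-LSTM counterpart.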
Background
In T1D, glucose homeostasis is impaired; thus, patients typically self-administer insulin and follow restricted diet and exercise routines. Keeping blood glucose (BG) levels within the target range of 70–180 mg/dl reduces the risk of mortality and of other hyperglycemia-related complications.
Technological advancements have made it easier to monitor glucose levels using continuous glucose monitoring (CGM) sensors. These sensors provide one BG measurement every five minutes and generate visual and acoustic alerts when BG crosses predefined thresholds, so patients can take timely corrective actions (e.g., a CIB) to improve their glycemic levels.
While such technology helps avert adverse events, real-time BG prediction also underpins advanced DSSs and artificial pancreas systems (APS): the former aid patients in the clinical decision-making process, whereas the latter enable automated insulin delivery.
Machine learning models with DSS have become popular tools for T1D management. They help forecast BG levels and provide preventive therapeutic suggestions, like CIB.
Since patient safety is paramount, models used in clinical practice must be physiologically sound, achieve high prediction accuracy, and produce interpretable output.
Even though machine-learning models can achieve highly accurate predictions, scientists have raised concerns about the interpretability of their outcomes and the lack of transparency in their internal logic.
In addition, the available T1D datasets contain hidden biases. Consequently, currently used black-box models can misrepresent the effect of the input features on predicted BG levels.
Such a scenario could be dangerous when models are actively used to suggest therapeutic actions in clinical practice, which highlights the need for tools that interpret model outcomes, e.g., SHapley Additive exPlanations (SHAP).
SHAP explains each individual prediction of an algorithm, quantifying how much each input feature contributes to the model's output.
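As a rough illustration of how such per-prediction attributions can be obtained, the sketch below applies the shap library's GradientExplainer to a stand-in Keras LSTM whose input channels are ordered as CGM, insulin, and CHO; the model, data, and background-set size are assumptions for demonstration only.

```python
import numpy as np
import shap
from tensorflow.keras import layers, Model

# Stand-in model and synthetic data purely for illustration: windows of 12
# samples with three channels ordered as [CGM, insulin, CHO].
inp = layers.Input(shape=(12, 3))
model = Model(inp, layers.Dense(1)(layers.LSTM(32)(inp)))
X_train = np.random.rand(500, 12, 3).astype("float32")
X_test = np.random.rand(50, 12, 3).astype("float32")

# A small background set summarises the training distribution for the explainer.
background = X_train[np.random.choice(len(X_train), 100, replace=False)]

# GradientExplainer approximates SHAP values for differentiable models.
explainer = shap.GradientExplainer(model, background)
sv = np.asarray(explainer.shap_values(X_test)).squeeze()   # (samples, timesteps, features)
# `sv` holds one attribution per input value per sample.
```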
About the study
In the present study, researchers selected a female patient who carefully reported her meals and CIB data for 10 weeks, i.e., throughout the study monitoring period.
Her CGM data missed only 3% of measurements, facilitating a fair assessment of the performance of the predictive algorithms and the DSS. Over the whole test dataset, she had an elevated time-above-range (TAR) of ~46% and a time-in-range (TIR) of ~54%.
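TAR and TIR are straightforward fractions of CGM readings falling above or inside the 70–180 mg/dl target range; a minimal sketch of the computation, assuming a NumPy array of CGM values in mg/dl:

```python
import numpy as np

def glycemic_metrics(cgm, low=70, high=180):
    """Return time-in-range, time-above-range, and time-below-range fractions."""
    cgm = np.asarray(cgm, dtype=float)
    cgm = cgm[~np.isnan(cgm)]                   # drop missing CGM samples (~3% here)
    tir = np.mean((cgm >= low) & (cgm <= high))
    tar = np.mean(cgm > high)
    tbr = np.mean(cgm < low)
    return tir, tar, tbr

# Example with a short, hypothetical CGM trace (one reading every five minutes)
tir, tar, tbr = glycemic_metrics([165, 192, 210, 175, 140, 250, 95])
print(f"TIR={tir:.0%}, TAR={tar:.0%}, TBR={tbr:.0%}")
```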
The team used the last 10 days of her data to compute the prediction accuracy of the models and the remaining six weeks of data to train the two LSTMs.
In addition, they used a subset of the test set (an eight-hour-long post-prandial window) to evaluate the insulin corrective actions suggested by the DSS.
ReplayBG, a novel in-silico methodology developed in-house, allowed the researchers to retrospectively evaluate the effectiveness of the corrective actions suggested by the DSSs built on the two LSTM models.
Results
The SHAP summary plot showed the SHAP value of each feature for every sample of the study dataset.
Each row in the summary plot represents a feature, with CGM and insulin being the two most important. The impact of insulin on the model’s output appeared weak, as indicated by the small magnitude of the SHAP values associated with this feature.
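Continuing the earlier stand-in SHAP sketch, a plot of this kind can be produced with shap.summary_plot after collapsing the time axis so that each sample contributes one SHAP value per feature; the aggregation choice here is an assumption.

```python
# Collapse the time axis: one SHAP value and one feature value per sample.
per_feature_shap = sv.sum(axis=1)               # (samples, features)
per_feature_vals = X_test.mean(axis=1)          # used to colour the dots
shap.summary_plot(per_feature_shap, per_feature_vals,
                  feature_names=["CGM", "insulin", "CHO"])
```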
Some values of CHO positively affected BG predictions, while others had a negative impact. This was surprising, as CHO intake is known to increase BG levels in patients with T1D. These findings suggested that the model relied mainly on past CGM readings to predict future BG levels.
The observed SHAP values showed that the collinearity between insulin and CHO in the test dataset made it difficult for the learning algorithm to discriminate their individual effects on the output.
In np-LSTM, insulin contributed positively to the model output at prediction horizons (PH) of 30 and 60 minutes, implying that np-LSTM would forecast a glucose rise after any insulin bolus, even when the patient did not consume CHO.
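This kind of behaviour can also be probed directly, without SHAP, by perturbing the insulin channel of an input window and comparing the predictions; the sketch below is a hypothetical sanity check with an assumed feature ordering, not a procedure from the paper.

```python
import numpy as np

def bolus_what_if(model, window, bolus_units=2.0, insulin_idx=1):
    """Change in predicted BG after adding an insulin bolus with no CHO.

    `window` has shape (timesteps, features); a physiologically sound model
    should not predict a higher BG when only insulin is added.
    """
    perturbed = window.copy()
    perturbed[-1, insulin_idx] += bolus_units           # inject a bolus at the last step
    base = float(model.predict(window[np.newaxis], verbose=0)[0, 0])
    dosed = float(model.predict(perturbed[np.newaxis], verbose=0)[0, 0])
    return dosed - base                                 # positive => predicted BG rises
```

A consistently positive difference would reproduce the np-LSTM behaviour that SHAP flagged.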
Conclusions
To conclude, SHAP elucidated black-box models’ output and showed that only p-LSTM learned the physiological relationship between inputs and glucose prediction.
Only p-LSTM could improve patients’ glycemic control when embedded in the DSS. Thus, the p-LSTM is the most suitable model for any decision-making application.