A recent Scientific Reports study evaluated the performance of different machine learning models in detecting hepatitis among people with diabetes.
Study: Machine learning for predicting hepatitis B or C virus infection in diabetic patients.
Diabetes mellitus (DM) is one of the most prevalent chronic metabolic diseases worldwide. It is categorized into two main types: type 1 (T1DM) and type 2 diabetes mellitus (T2DM).
T1DM is caused by β-cell loss in the pancreas, leading to a shortage of endogenous insulin. In contrast, T2DM has been linked to multifactorial mechanisms that cause insulin resistance, impaired insulin secretion, and overproduction of glucose by the liver.
A combination of environmental and genetic factors can cause a steady decrease in β-cell mass and/or function, which could subsequently manifest as hyperglycemia in T1DM and T2DM. People with any form of diabetes are susceptible to developing multi-organ complications over time.
Recently, many studies have reported a higher prevalence of hepatitis B virus (HBV) and hepatitis C virus (HCV) infections in the DM population.
Compared to people without DM, individuals with DM have a 60% higher risk of contracting HBV infection. Similarly, the prevalence of HCV is higher in the diabetic group than in the non-diabetic group.
Since some diabetic individuals with HBV or HCV infections remain asymptomatic, it is challenging to identify them. There is a need for selective screening methods to identify or predict the risk of contracting hepatitis in people with DM.
Previous studies have reported contradictory results regarding the factors that lead to the development of hepatitis in people with diabetes.
Machine learning has emerged as a potential tool in the healthcare sector as it can extract useful information from imbalanced clinical datasets. Machine learning models can be applied to identify key predictors of hepatitis development in diabetes. This will help clinicians to formulate optimal preventive or treatment strategies.
Previous studies have shown that machine learning models can accurately identify individuals at high risk of hepatitis.
Machine learning models such as random forest (RF) and K-nearest neighbour yielded an overall accuracy of 96% in predicting HCV, while eXtreme Gradient Boosting (XGBoost) predicted HBV with 92% accuracy.
Integrating several machine learning algorithms, an approach known as an ensemble technique, yielded better accuracy than a single machine learning model.
About the study
This study focused on determining which machine learning models can most accurately detect hepatitis in people with DM.
The body measurements, demographics, lipid profiles, and questionnaire data were used to determine the relationship between diabetes and twelve risk factors for hepatitis.
Pre-processed datasets from the National Health and Nutrition Examination Survey (NHANES), between 2013 and 2018, were used in this study.
This study evaluated four machine learning models, namely, RF, support vector machine (SVM), XGBoost, and least absolute shrinkage and selection operator (LASSO), to determine the risk of hepatitis among people with diabetes.
Based on the inclusion criteria, a total of 1,396 diabetic patients were included in this study. The mean age of the participants was 54 years. The study cohort included sixty-four individuals with HBV or HCV; the rest did not have either infection.
It must be noted that the hepatitis group comprised a higher percentage of Asian and non-Hispanic White individuals, while the non-hepatitis group contained a higher number of Mexican and other Hispanic individuals. The majority of the individuals in the hepatitis group were male.
Due to the imbalanced ratio between non-hepatitis and hepatitis patients, the synthetic minority oversampling technique (SMOTE) was used to balance the data. After data normalization, the machine learning models were trained, and their performance was analyzed.
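To see why balancing matters here: with 64 hepatitis cases among 1,396 participants (about 4.6%), a model that always predicts "no hepatitis" would already be about 95% accurate while detecting no one. The sketch below illustrates the core idea of SMOTE, which creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified, standard-library illustration, not the implementation used in the study; the function name `smote`, its parameters, the toy feature values, and the choice to oversample to a 1:1 ratio are all assumptions made for this example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy, made-up feature vectors (two normalized features per person).
hepatitis = [(0.10, 0.80), (0.20, 0.90), (0.15, 0.70), (0.30, 0.85)]
no_hepatitis = [(0.60 + 0.03 * i, 0.20 + 0.02 * i) for i in range(10)]

# Assumption: oversample until both classes have the same size.
new_cases = smote(hepatitis, n_new=len(no_hepatitis) - len(hepatitis), k=3)
balanced_minority = hepatitis + new_cases
```

Because every synthetic point lies on a line segment between two real minority samples, the new cases stay inside the region the minority class already occupies. Oversampling is normally applied to the training data only, so that evaluation still reflects the real class ratio.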
Although all the machine learning models assessed in this study demonstrated improved performance after the hyperparameter tuning process, the highest predictive capacity for the development of HBV or HCV infection in people with diabetes was demonstrated by LASSO.
Hyperparameter optimization enabled the selection of the most suitable parameters that helped to improve the performance of machine learning models.
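As an illustration of what hyperparameter tuning involves, the sketch below fits a small LASSO model for several values of its penalty strength and keeps the value with the lowest error on held-out data. This is a minimal example under stated assumptions, not the study's procedure: it uses a regression toy problem rather than hepatitis classification, a simple hold-out split rather than whatever validation scheme the authors used, and the names `fit_lasso`, `grid_search`, and the `alphas` grid are hypothetical. The model minimizes (1/(2n)) * sum of squared residuals + alpha * sum of |w_j| by coordinate descent.

```python
import random


def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def fit_lasso(X, y, alpha, n_iter=100):
    """Coordinate-descent LASSO without intercept (assumes centered data)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            # correlation of feature j with the residual that ignores feature j
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(d) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w


def mse(X, y, w):
    """Mean squared prediction error of weights w on (X, y)."""
    preds = [sum(xj * wj for xj, wj in zip(xi, w)) for xi in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)


def grid_search(X_train, y_train, X_val, y_val, alphas):
    """Return the alpha whose fitted model has the lowest validation error."""
    scored = [(mse(X_val, y_val, fit_lasso(X_train, y_train, a)), a) for a in alphas]
    return min(scored)[1]


# Toy data: only the first feature influences the outcome.
rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(60)]
y = [2.0 * x0 + rng.gauss(0, 0.1) for x0, _ in X]
X_train, y_train, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

alphas = [0.001, 0.01, 0.1, 1.0]  # hypothetical search grid
best_alpha = grid_search(X_train, y_train, X_val, y_val, alphas)
best_weights = fit_lasso(X_train, y_train, best_alpha)
```

The soft-thresholding step is what lets LASSO act as a selection operator: a large enough penalty sets weak coefficients exactly to zero, which is why the method can also point to the predictors that matter most.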
In line with the findings of this study, a previous study also demonstrated the superior performance of LASSO in predicting hepatocellular carcinoma in patients with chronic HBV infection.
These observations shed light on the application of LASSO in clinical decision-making. After combining high-performing models, the ensemble results indicated that stacking did not always improve performance metrics for the predictions.
Poverty, use of illegal drugs, and race were found to be the major predictors of hepatitis in people with diabetes. Consistent with current study findings, a higher prevalence of hepatitis was observed in people with diabetes compared to the non-diabetic group.
The current study findings indicated that machine learning models, particularly LASSO, could be used to identify the contributing factors responsible for hepatitis infection among people with diabetes.
This approach could be used for early detection of hepatitis in people with DM and could thus aid clinical decision-making. This study provided important insight for developing a screening strategy to identify diabetic people at a higher risk of hepatitis.