Reducing Type 1 Childhood Diabetes in Saudi Arabia by Identifying and Modelling Its Key Performance Indicators


Date

2024-06

Publisher

Royal Melbourne Institute of Technology

Abstract

The rising incidence of type 1 diabetes (T1D) in children is a global health concern. Reducing the incidence of diabetes generally is one of the goals in the World Health Organisation's (WHO) 2030 Agenda for Sustainable Development Goals. With an incidence rate of 31.4 cases per 100,000 children and an estimated 3,800 new cases per year, Saudi Arabia ranks 8th in the world for number of T1D cases and 5th for incidence rate. Despite the remarkable increase in the incidence of childhood T1D in Saudi Arabia, rigorous research on T1D in Saudi children remains scarce compared with developed countries. It is also crucial to recognise the critical gaps in current understanding of diabetes in children, adolescents and young adults, as recent research indicates significant global and sub-national variations in disease incidence. Better knowledge of how T1D develops in children and of its associated factors would help medical practitioners develop intervention plans to prevent complications and address the incidence of T1D. This study employed statistical, machine learning and classification approaches to analyse and model different aspects of childhood T1D using local case and control data. Secondary data from 1,142 individual medical records (359-377 cases and 765 controls), collected from three cities in different regions of Saudi Arabia to represent the country's diverse population, were used in the analysis. Cases and controls were matched by birth year, gender and location to control for confounders and to create a more robust and clinically relevant model. Because genetic and environmental factors are well documented to contribute to childhood T1D, a wide range of potential key performance indicators (KPIs) from the literature was included in this study.
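Matching cases to controls on birth year, gender and location can be pictured as grouping controls by a shared key and drawing from that group for each case. The following is a hypothetical illustration only, not the procedure used in the thesis: the field names (`birth_year`, `gender`, `city`), the greedy first-come assignment and the default of two controls per case are all assumptions, since the abstract does not state how the matching was carried out or at what ratio.

```python
"""Hypothetical sketch: match controls to cases on (birth year, gender, city)."""
from collections import defaultdict


def match_controls(cases, controls, ratio=2):
    """Greedily assign up to `ratio` unused controls sharing a case's matching key.

    `cases` and `controls` are lists of dicts with 'id', 'birth_year', 'gender'
    and 'city' keys. These field names and the default ratio are illustrative
    assumptions, not the thesis's variable names or design.
    Returns a dict {case_id: [control_id, ...]}.
    """
    def key(record):
        return (record["birth_year"], record["gender"], record["city"])

    # Pool the available control ids by their matching key.
    pool = defaultdict(list)
    for control in controls:
        pool[key(control)].append(control["id"])

    matches = {}
    for case in cases:
        bucket = pool[key(case)]  # empty list if no control shares the key
        n_take = min(ratio, len(bucket))
        # pop(0) removes each control from the pool so it is used only once.
        matches[case["id"]] = [bucket.pop(0) for _ in range(n_take)]
    return matches
```

A case for which no control shares the key receives an empty list, which makes unmatched cases easy to spot before modelling.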
The collected data included information on socioeconomic status, potential genetic and environmental factors, and demographic data such as city of residence, gender and birth year. Techniques such as cross-validation, hyperparameter tuning and bootstrapping were used to develop the models. Common statistical metrics (the coefficient of determination or R-squared, root mean squared error and mean absolute error) were used to evaluate the regression models, while accuracy, sensitivity, precision, F score and area under the curve were used as performance measures for the classification models. Multiple linear regression (MLR), artificial neural network (ANN) and random forest (RF) models were developed to predict the age at onset of T1D for all children aged 0-14 years, as well as for the age group in which onset is most common, 5-9 years. To improve the performance of the MLR models, interactions between variables were considered. Risk factors associated with the age at onset of T1D were also identified. The results showed that MLR and RF outperformed ANN, and that the logarithm of age at onset was the most suitable dependent variable. RF outperformed the other models for the 5-9 years age group. Birth weight, current weight and current height influenced the age at onset in both age groups. However, preterm birth was significant only in the 0-14 years cohort, while consanguineous parents and gender were significant in the 5-9 years group. Logistic regression (LR), random forest (RF), support vector machine (SVM), Naive Bayes (NB) and artificial neural network (ANN) models were applied to the case and control data to model the development of childhood T1D and to identify its key performance indicators. Full and reduced models were developed to determine the best model, with each reduced model built from the significant factors identified by the corresponding full model. The full LR model had the highest accuracy.
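To make the evaluation measures named above concrete, here is a minimal, dependency-free sketch of how each is conventionally computed. It is not the thesis's code. It assumes T1D cases are labelled 1 (the positive class) and controls 0, and it computes AUC through its rank (Mann-Whitney) formulation: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counting one half.

```python
"""Sketch of the regression and classification metrics named in the abstract."""
import math


def regression_metrics(y_true, y_pred):
    """Return (R-squared, RMSE, MAE) for paired numeric sequences."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    r2 = 1 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


def classification_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, precision, F1) for 0/1 labels, 1 = T1D case."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall for the case class
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return accuracy, sensitivity, precision, f1


def auc(y_true, scores):
    """Area under the ROC curve via pairwise case-versus-control comparisons."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC loop is quadratic in sample size, which is fine at the scale of roughly a thousand records. scikit-learn's `r2_score`, `mean_squared_error`, `mean_absolute_error`, `accuracy_score`, `recall_score`, `precision_score`, `f1_score` and `roc_auc_score` provide library versions of the same measures.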
The full RF model and the SVM with a linear kernel also performed well. Significant risk factors associated with developing childhood T1D included early exposure to cow's milk, high birth weight, a positive family history of T1D and maternal age over 25 years. Poisson regression (PR), RF, SVM and k-nearest neighbour (KNN) models were then used to model the incidence of childhood T1D from the identified significant risk factors, with interactions between variables considered to enhance model performance. Full and reduced models were created and compared to find the best models with the minimum number of variables. The full Poisson regression and machine learning models outperformed all other models, but reduced models using only two of the three independent variables (early exposure to cow's milk, high birth weight and maternal age over 25 years) also performed relatively well. The study also applied optimisation procedures to the reduced incidence models to develop upper and lower yearly profile limits for childhood T1D incidence that would achieve the United Nations (UN) and Saudi recommended levels of 264 and 339 cases by 2030. These profile limits then allowed us to model optimal yearly values for the number of children weighing more than 3.5 kg at birth, the number of deliveries by older mothers and the number of children introduced early to cow's milk. The results presented in this thesis will guide healthcare providers in collecting data to monitor the most influential KPIs, enabling suitable intervention strategies to be initiated to reduce the disease burden and potentially slow the incidence rate of childhood T1D in Saudi Arabia. The research outcomes lead to recommendations for early intervention strategies, such as educational campaigns and healthy lifestyle programs for mothers, along with child health mentoring during and after pregnancy, to reduce the incidence of childhood T1D.
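The incidence models are described only at a high level. As a hedged illustration of the Poisson regression component, the following sketch fits a log-link Poisson model, log E[y] = b0 + b1*x1 + ..., by Newton-Raphson. For this canonical link, that is equivalent to iteratively reweighted least squares. The data layout is an assumption for illustration: each row of `X` holds one year's values of the risk-factor counts (for example, high-birth-weight births, deliveries by mothers over 25 and early cow's milk exposures), and `y` holds that year's new T1D cases. The sketch omits the interaction terms, any exposure offset and the profile-limit optimisation described in the thesis, and it is not the author's code.

```python
"""Minimal Poisson regression (log link) fitted by Newton-Raphson, stdlib only."""
import math


def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson(X, y, iters=50, tol=1e-10):
    """Fit log E[y] = b0 + b1*x1 + ...; X holds predictor rows without an intercept."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)  # start at the mean rate
    for _ in range(iters):
        mu = [math.exp(sum(b * v for b, v in zip(beta, r))) for r in rows]
        # Score X^T (y - mu) and information X^T diag(mu) X of the log-likelihood.
        grad = [sum((yi - mi) * r[j] for yi, mi, r in zip(y, mu, rows)) for j in range(k)]
        info = [[sum(mi * r[i] * r[j] for mi, r in zip(mu, rows)) for j in range(k)]
                for i in range(k)]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def predict(beta, x):
    """Expected count exp(b0 + b1*x1 + ...) for one predictor row x."""
    return math.exp(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
```

In practice, a library implementation such as statsmodels' `GLM` with `families.Poisson()` would more likely be used, because it also reports standard errors and deviance. Those outputs are useful when comparing full and reduced models.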
This thesis has contributed new knowledge on childhood T1D in Saudi Arabia by:
* developing a predictive model for the age at onset of childhood T1D using statistical and machine learning models.
* predicting the development of T1D in children using matched case-control data and identifying its KPIs using statistical and machine learning approaches.
* modelling the incidence of childhood T1D using its associated significant KPIs.
* developing three optimal profile limits for monitoring the yearly incidence of childhood T1D and its associated significant KPIs.
* providing a list of recommendations for establishing early intervention strategies to reduce the incidence of childhood T1D.

Keywords

Type 1 diabetes, Saudi Arabia, Children, Statistical, Machine learning, Regression, Classification, Modelling, KPIs, Age at onset, Case-Control, Cross-validation, Optimisation, Profile limits, Monitoring, Predicting

Copyright owned by the Saudi Digital Library (SDL) © 2024