Saudi Cultural Missions Theses & Dissertations
Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10
4 results
Search Results
Item Restricted
Risk Factor Analysis and Prediction of Chronic Kidney Disease Using Clinical Data from Indian Patients (Saudi Digital Library, 2025)
Alkhunaizan, Sarah; Claudio, Fronterre
Chronic kidney disease (CKD) is a progressive condition that is frequently underdiagnosed as it is asymptomatic in early stages, creating a need for reliable prediction tools to support earlier identification and intervention. This study aimed to (1) identify key clinical and demographic factors associated with CKD and (2) develop and compare predictive models by applying routinely collected health data. Analysis was conducted using a real-world clinical dataset from Apollo Hospitals in Tamil Nadu, India, made publicly available via the UCI Machine Learning Repository (n = 397; 25 variables; binary outcome: CKD vs non-CKD). To reduce data leakage and focus on disease prediction, direct diagnostic biomarkers (serum creatinine, blood urea, and urine albumin) were excluded. Missingness (10.5%) was assessed, Little’s MCAR test rejected MCAR, and regression findings were consistent with a MAR mechanism; four strategies were compared (complete-case analysis, deterministic, stochastic, and random forest imputation), with random forest imputation selected for subsequent analyses. Exploratory analyses described distributions and associations, and correlated predictors were removed to mitigate multicollinearity. Three models, LASSO logistic regression, decision tree (CART), and XGBoost, were trained using a 70/30 train-test split with 10-fold cross-validation and evaluated using accuracy, sensitivity, specificity, ROC-AUC, and calibration. XGBoost achieved the best discrimination (accuracy 96.6%, AUC 0.991), while the decision tree demonstrated the strongest calibration. Across models, the most influential predictors consistently included red blood cell count, hypertension, diabetes mellitus, sodium, abnormal urinary red blood cells, and appetite.
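The evaluation metrics named in this abstract (accuracy, sensitivity, specificity, ROC-AUC) can be illustrated with a minimal pure-Python sketch. The labels and scores are made-up toy values, not data from the thesis, and the function names are hypothetical. The AUC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score, which is one standard way to define it.

```python
def sensitivity_specificity_accuracy(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # share of true CKD cases that were flagged
    specificity = tn / (tn + fp)   # share of non-CKD cases correctly cleared
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (assumed values): 1 = CKD, 0 = non-CKD
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

metrics = sensitivity_specificity_accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Unlike accuracy, the AUC does not depend on the classification threshold, which is one reason the two are usually reported together.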
These findings support the utility of machine learning models, particularly XGBoost, for early CKD risk prediction using routine clinical data, while highlighting the importance of robust preprocessing and validation to improve clinical applicability.

Item Restricted
Machine Learning Techniques for Financial Loan Default Prediction in UK: A Comparative Analysis of Decision Tree and Random Forest Models (Saudi Digital Library, 2025)
Alrakan, Fahad Abdulaziz; Alwzinani, Faris
This dissertation proposes a comprehensive approach to variable selection and model comparison applied to credit scoring, based on a Lending Club 2016–2018 dataset. The methodology combines an initial manual selection, based on completeness and business logic, followed by an automatic selection via RFECV (Recursive Feature Elimination with Cross-Validation) using a Random Forest. Finally, a permutation importance analysis and an ablation experiment (Top 10 variables) complete the evaluation. The results show that all 21 variables selected are considered relevant by RFECV, but that most of the predictive power is concentrated in a subset of about 15 variables. A comparison of the models highlights the clear superiority of Random Forest (AUC ≈ 0.713; PR-AUC ≈ 0.437) over Decision Tree (AUC ≈ 0.594; PR-AUC ≈ 0.319). Permutation importance analysis confirms business intuition: interest rate, credit sub-grade, and residential status appear to be the main explanatory factors, supplemented by financial indicators (debt ratio, loan amount, FICO score). The ablation experiment shows that these ten main variables are sufficient to preserve almost all of the Random Forest's performance (AUC = 0.708), while reducing training time by approximately 40%.
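The idea behind permutation importance can be sketched in a few lines of pure Python: shuffle one column, re-score the model, and record how much the metric drops. This is an illustration only. The toy rule below stands in for a trained Random Forest, the data are invented, and all names are hypothetical. In practice scikit-learn's `sklearn.inspection.permutation_importance` would typically be used instead.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, col, n_repeats=20, seed=0):
    """Mean drop in accuracy when the values of column `col` are shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with the shuffled value placed in position `col`.
        permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats


# Toy data (assumed): column 0 = interest rate in %, column 1 = unrelated noise
rows = [(5.0, 1), (7.5, 0), (9.0, 1), (12.0, 0),
        (15.5, 1), (18.0, 0), (21.0, 1), (24.5, 0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = default


def toy_model(row):
    """Stand-in for a fitted classifier: flags loans with a high interest rate."""
    return 1 if row[0] > 13.0 else 0


imp_rate = permutation_importance(toy_model, rows, labels, col=0)
imp_noise = permutation_importance(toy_model, rows, labels, col=1)
```

A feature the model never uses, like the noise column here, gets an importance of exactly zero, while shuffling a feature the model relies on degrades the score.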
These results highlight two major points: (i) Random Forest is robust and capable of effectively exploiting a small core of variables, but its performance remains below the standards expected for an industrial model (>0.80 AUC); (ii) the hierarchy of variables reveals both the relevance of expected indicators and the redundancy between certain correlated measures. The limitations identified concern sensitivity to correlations, the temporal restriction of the sample (2016–2018), and the computational cost of certain steps (RFECV). In conclusion, this project validates the feasibility of a robust and parsimonious model based on Random Forest, while opening up prospects for improvement: use of boosting algorithms, calibration of thresholds according to economic issues, temporal robustness tests, and pipeline optimization.

Item Restricted
A CLOUD-BASED AI SYSTEM FOR SKILL GAP ANALYSIS AND TRAINING PATH RECOMMENDATION IN HR DEPARTMENTS (Saudi Digital Library, 2025)
Alanazi, Abdullah Ramadan; AlYamani, Abdulghani
This dissertation presents the development of a cloud-based artificial intelligence (AI) system designed to automate skill gap analysis and provide personalised training recommendations in Human Resource (HR) departments. The system integrates employee profiles, job role requirements, and training histories to identify competency gaps using a decision tree algorithm. The AI model achieved an accuracy of 0.86 and demonstrated strong interpretability and efficiency in recommending relevant training paths. Usability testing with HR professionals confirmed the system’s practicality and reliability in supporting workforce development and data-driven training strategies.
The research contributes to the field of HR analytics by combining Human Capital Theory with Knowledge Discovery in Databases (KDD) to provide an explainable, scalable, and cloud-enabled HR decision-support framework.

Item Restricted
Intelligent Diabetes Screening with Advanced Analytics (University of Birmingham, 2024)
Aldossary, Soha; Smith, Phillip
Diabetes mellitus is a prevalent chronic disease with significant health implications worldwide. This project aimed to mitigate this pressing public health concern by using machine learning techniques and deep learning algorithms. I also established an online platform where patients can enter their test results and health information and receive real-time diabetes detection and dietary recommendations based on their health profiles. Research has illustrated that models such as Gradient Boosting, Random Forest and Decision Trees perform well in diabetes prediction due to their ability to capture complex nonlinear relationships and handle diverse input features. Therefore, this project incorporated these models with others, such as the Support Vector Classifier and AdaBoost. Additionally, deep learning models, including Neural Networks, were utilised to explore intricate relationships within diabetes-related indicators. Notably, the Gradient Boosting model achieved an impressive accuracy of 99%, with 99% precision, 97% recall and 97% F1-score. To implement these solutions, I used Python as the programming language, employing libraries such as scikit-learn, NumPy, Pandas and Matplotlib, while Streamlit served as the app’s framework.
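For readers unfamiliar with the precision, recall and F1 figures quoted in this abstract, the sketch below shows how the three are derived from confusion-matrix counts. The counts are invented for illustration and are not the project's results, and the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # flagged cases that were truly diabetic
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # diabetic cases that were flagged
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1


# Assumed toy counts: 90 true positives, 10 false positives, 30 false negatives
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

Because F1 is a harmonic mean, it always lies between precision and recall and is pulled towards the lower of the two.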
