Saudi Cultural Missions Theses & Dissertations

Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10

Search Results

  • Item (Restricted)
    Risk Factor Analysis and Prediction of Chronic Kidney Disease Using Clinical Data from Indian Patients
    (Saudi Digital Library, 2025) Alkhunaizan, Sarah; Claudio, Fronterre
    Chronic kidney disease (CKD) is a progressive condition that is frequently underdiagnosed because it is asymptomatic in its early stages, creating a need for reliable prediction tools to support earlier identification and intervention. This study aimed to (1) identify key clinical and demographic factors associated with CKD and (2) develop and compare predictive models using routinely collected health data. Analysis was conducted using a real-world clinical dataset from Apollo Hospitals in Tamil Nadu, India, made publicly available via the UCI Machine Learning Repository (n = 397; 25 variables; binary outcome: CKD vs non-CKD). To reduce data leakage and focus on disease prediction, direct diagnostic biomarkers (serum creatinine, blood urea, and urine albumin) were excluded. Missingness (10.5%) was assessed: Little’s MCAR test rejected the missing-completely-at-random (MCAR) hypothesis, and regression findings were consistent with a missing-at-random (MAR) mechanism. Four strategies for handling missing data were compared (complete-case analysis, deterministic imputation, stochastic imputation, and random forest imputation), with random forest imputation selected for subsequent analyses. Exploratory analyses described distributions and associations, and correlated predictors were removed to mitigate multicollinearity. Three models, LASSO logistic regression, decision tree (CART), and XGBoost, were trained using a 70/30 train-test split with 10-fold cross-validation and evaluated using accuracy, sensitivity, specificity, ROC-AUC, and calibration. XGBoost achieved the best discrimination (accuracy 96.6%, AUC 0.991), while the decision tree demonstrated the strongest calibration. Across models, the most influential predictors consistently included red blood cell count, hypertension, diabetes mellitus, sodium, abnormal urinary red blood cells, and appetite. These findings support the utility of machine learning models, particularly XGBoost, for early CKD risk prediction using routine clinical data, while highlighting the importance of robust preprocessing and validation to improve clinical applicability.
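    The discrimination metrics named in the abstract (accuracy, sensitivity, specificity, ROC-AUC) can be illustrated with a minimal sketch. This is not the thesis code: the function names, the 0.5 threshold, and the eight toy labels and scores are all assumptions made up for illustration, and the sketch assumes both classes are present in the labels.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int],
    y_score: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the thesis
) -> Dict[str, float]:
    """Threshold the scores, then derive metrics from the confusion matrix."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # CKD, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-CKD, not flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-CKD, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # CKD, missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "roc_auc": roc_auc(y_true, y_score),
    }


# Hypothetical toy data (1 = CKD, 0 = non-CKD); NOT results from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

metrics = classification_metrics(y_true, y_score)
print(metrics)
```

    Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas ROC-AUC summarizes ranking quality across all thresholds, which is one reason a study might report both kinds of measure alongside calibration.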

Copyright owned by the Saudi Digital Library (SDL) © 2026