AN EXPERIMENTAL STUDY OF SUPERVISED MACHINE LEARNING TECHNIQUES FOR MINOR CLASS PREDICTION UTILIZING KERNEL DENSITY ESTIMATION: FACTORS IMPACTING MODEL PERFORMANCE

Alfarwan, Abdullah

Date

2024-06-29

Authors

Alfarwan, Abdullah

Publisher

Western Michigan University

Abstract

This dissertation examined differences in classification outcomes among four popular individual supervised machine learning (ISML) models (logistic regression, decision tree, support vector machine, and multilayer perceptron) when predicting minor class membership within imbalanced datasets. The study context and the theoretical population sampled concern one aspect of the larger problem of student retention and dropout prediction in higher education (HE): identification. This study differs from the current literature by using an experimental design with simulated student data that closely mirrors HE situational and student data. Specifically, it tested the predictive ability of the four ISML classification models (CLS) under experimentally manipulated conditions: total sample size (TS), minor class proportion (MCP), training-to-testing sample size ratio (TTSS), and the application of bagging techniques during model training (BAG). Using this 4-between, 1-within mixed design, five outcome measures (precision, recall/sensitivity, specificity, F1-score, and AUC) were examined and analyzed individually. For each outcome measure, findings revealed multiple statistically significant interactions among classifier models and design variables. Simple effect analyses of these interactions highlighted how TS, MCP, TTSS, and BAG differentially affect the five measures of classification performance. For instance, the presence of interactions involving MCP underscores the importance of informed modeling of class distribution for enhancing overall model predictive capability and performance. Such insights into how the experimental variables can critically affect different measures of classification success advance our understanding of how these four ISML models might be optimized for the prediction of student-at-risk status within imbalanced datasets.
This dissertation provides a framework for using these or similar ISML models more effectively in HE. It points toward the development of predictive modeling methods that are more useful and perhaps more equitable by demonstrating empirically the impact of one of the most challenging aspects of implementing machine learning in HE: maximizing the accurate identification of the minority class. This work contributes to the use of machine learning in HE and will help inform its use in smaller and larger educational research communities by providing strategies for improving the prediction of student dropout.
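As an illustration of four of the outcome measures named in the abstract, the following minimal Python sketch computes precision, recall/sensitivity, specificity, and F1-score from confusion-matrix counts, treating the minor class as the positive class. The function name and the example counts are hypothetical and are not taken from the dissertation; AUC is omitted because it requires predicted scores rather than counts.

```python
def minority_class_metrics(tp, fp, tn, fn):
    """Compute four classification measures with the minor class as positive.

    tp, fp, tn, fn are confusion-matrix counts (hypothetical inputs).
    Each ratio falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # harmonic mean
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical test set: 1000 students, 100 of them in the minor (at-risk) class,
# i.e. a minor class proportion of 0.10. These counts are illustrative only.
metrics = minority_class_metrics(tp=60, fp=40, tn=860, fn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, specificity is much higher than recall, which shows why the measures are worth examining individually when the classes are imbalanced.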

Keywords

Supervised machine learning model, Imbalanced datasets, Dropout prediction, Higher education, Predictive modeling, Classification performance metrics, Sample size effect, Class proportion, Minority class identification.

URI

https://hdl.handle.net/20.500.14154/72446

Collections

SACM - United States of America
