EXPLAINABLE MACHINE LEARNING FOR EDUCATIONAL DATA

Abstract

Educational repositories hold complex data on students' trajectories and on their universities. Modelling this data would be of great value: it would make it possible to identify students' trajectories, predict their likely future performance, and identify those who need intervention as early as possible. However, understanding the correlations and dependencies among the educational attributes (which can be time-dependent, non-linear relationships) is fundamental to learning robust predictive classifiers. When predicting academic performance, many machine learning algorithms make decisions based on data that can be imbalanced, badly sampled, or biased by historical societal prejudices. In this thesis, I explore, implement, and evaluate temporal predictive classifiers that aim to overcome some of these issues. The approach combines time-series clustering with probabilistic learning, resampling, feature subspace learning, and specialist deep learning methods to learn models that are simultaneously accurate and unbiased. A key technical objective in learning these classifiers is to incorporate different types of temporal performance data collected at different times (at a student's admission to a higher education institution, and in Years 1 and 2 of their studies) for the explicit modelling of cognitive styles. A resampling method is applied with bootstrap aggregating to address the imbalance in the time-series educational datasets, which leads to misclassification of the minority class of high-risk or failing students. An unsupervised subspace learning approach using an autoassociative neural network (autoencoder) is also evaluated; it reconstructs the educational data by maximising variance, for improved performance prediction.
In addition, modelling bias is explored: the types of bias are identified, and it is established whether they account for inflated predictive accuracies. A graphical learning approach with a Bayesian network (BN), which is transparent in how it makes decisions, is compared with three forms of deep multi-label convolutional neural network (CNN) to investigate whether deep learning classifiers can be learned that maximise accuracy and minimise bias. The experimental results reveal that identifying cognitive styles improves both explanation and accuracy, that rebalancing also improves accuracy, and that a combination of probabilistic modelling and a deep 1D multi-label CNN can successfully identify and eliminate many biases when predicting students' academic performance.
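
The abstract pairs resampling with bootstrap aggregating to handle the under-represented class of high-risk students. The sketch below illustrates that general idea only and is not the thesis's implementation. It assumes that each bag is a class-balanced bootstrap sample (every class drawn with replacement down to the minority-class size), uses a nearest-centroid classifier as a stand-in base learner, and runs on invented two-feature toy data. All function names and values are hypothetical.

```python
import random
from collections import Counter


def balanced_bootstrap(X, y, rng):
    """Draw a bootstrap sample with equally many rows from every class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n = min(len(rows) for rows in by_class.values())  # minority-class size
    sample_X, sample_y = [], []
    for label, rows in by_class.items():
        for _ in range(n):
            sample_X.append(rng.choice(rows))  # sampling with replacement
            sample_y.append(label)
    return sample_X, sample_y


def fit_centroids(X, y):
    """Stand-in base learner (an assumption): one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def fit_bagged(X, y, n_bags=25, seed=0):
    """Train one base learner per balanced bootstrap sample."""
    rng = random.Random(seed)
    return [fit_centroids(*balanced_bootstrap(X, y, rng)) for _ in range(n_bags)]


def predict_bagged(models, row):
    """Aggregate the base learners by majority vote."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Invented toy data: two marks per student, eight "pass" and two "at_risk".
X = [[70, 68], [75, 72], [80, 78], [65, 70], [85, 82],
     [72, 75], [78, 80], [68, 66], [35, 40], [42, 38]]
y = ["pass"] * 8 + ["at_risk"] * 2

models = fit_bagged(X, y)
print(predict_bagged(models, [38, 41]))
print(predict_bagged(models, [76, 74]))
```

Balancing inside each bag, rather than once up front, lets every base learner see the minority class at equal weight while the ensemble as a whole still draws on most of the majority-class rows. Other resampling schemes, such as over-sampling the minority class, would fit the same structure.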

Copyright owned by the Saudi Digital Library (SDL) © 2025