Heterogeneous Machine Learning Ensembles for Predicting Train Delays

dc.contributor.advisor: Wang, Wenjia
dc.contributor.author: Al Ghamdi, Mostafa
dc.date.accessioned: 2023-07-11T10:44:18Z
dc.date.available: 2023-07-11T10:44:18Z
dc.date.issued: 2023-06-07
dc.description.abstract: Train delays are a serious problem in the UK and other countries. Much research has gone into developing methods for predicting train delays. Most of these methods use only single models or homogeneous ensembles, and their accuracy and consistency are generally unsatisfactory. We have therefore developed heterogeneous ensembles that use different types of regression models with the aim of improving prediction performance. We first looked at a wide range of base-learner models, including the state-of-the-art methods Random Forest and XGBoost. Overall, our ensembles were more accurate than any of these single models. We developed two methods for model selection when building the ensemble: the first uses accuracy alone, and the second uses both accuracy and diversity. We found that using accuracy alone resulted in the most accurate ensembles. We adapted the Coincident Failure Diversity measure for regression and compared its effectiveness with other diversity measures. While it proved the best overall, we found no relationship between ensemble accuracy and diversity in the regression context. We also investigated the effect of ensemble size. We compared the performance of our ensembles with the deep learning methods CNN and TabNet and found that our ensembles were more accurate. However, ensembles of deep learning models proved to be more accurate than those of single machine learning models. We tested our ensembles using a different set of train delay data and found that they produced more accurate and consistent results, indicating that our methods generalise well to new data.
dc.format.extent: 200
dc.identifier.uri: https://hdl.handle.net/20.500.14154/68571
dc.language.iso: en
dc.subject: Ensemble
dc.subject: Train delay
dc.subject: Heterogeneous ensemble
dc.subject: Diversity
dc.subject: Random Forest
dc.title: Heterogeneous Machine Learning Ensembles for Predicting Train Delays
dc.type: Thesis
sdl.degree.department: School of Computing Sciences
sdl.degree.discipline: Artificial Intelligence
sdl.degree.grantor: University of East Anglia
sdl.degree.name: Doctor of Philosophy
