Heterogeneous Machine Learning Ensembles for Predicting Train Delays

Date

2023-06-07

Abstract

Train delays are a serious problem in the UK and other countries, and much research has gone into methods for predicting them. Most of these methods use only single models or homogeneous ensembles, and their accuracy and consistency are generally unsatisfactory. We therefore developed heterogeneous ensembles that combine different types of regression models, with the aim of improving prediction performance. We first evaluated a wide range of base-learner models, including the state-of-the-art methods Random Forest and XGBoost. Overall, our ensembles were more accurate than any of these single models. We developed two methods for selecting models when building an ensemble: the first uses accuracy alone, and the second uses both accuracy and diversity. Selecting on accuracy alone produced the most accurate ensembles. We adapted the Coincident Failure Diversity measure for regression and compared its effectiveness with that of other diversity measures. Although it proved the best, overall we found no relationship between ensemble accuracy and diversity in the regression context. We also investigated the effect of ensemble size. We compared our ensembles with the deep learning methods CNN and TabNet and found that our ensembles were more accurate. However, ensembles of deep learning models proved to be more accurate than those of single machine learning models. We tested our ensembles on a different set of train delay data and found that they produced more accurate and consistent results, indicating that our methods generalise well to new data.
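
The accuracy-based model selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The model labels, the delay values, the use of RMSE as the accuracy measure, and simple averaging as the combination rule are all assumptions made for illustration, since the abstract does not specify them.

```python
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def select_by_accuracy(val_preds, y_val, k):
    """Return the names of the k models with the lowest validation RMSE."""
    ranked = sorted(val_preds, key=lambda name: rmse(y_val, val_preds[name]))
    return ranked[:k]


def ensemble_predict(preds, selected):
    """Average the selected models' predictions point by point (assumed combiner)."""
    return [mean(values) for values in zip(*(preds[name] for name in selected))]


# Hypothetical validation-set delays (minutes) and predictions from three
# different model types; every number here is invented for illustration.
y_val = [2.0, 5.0, 0.0, 8.0]
val_preds = {
    "random_forest": [2.5, 4.5, 0.5, 7.0],
    "xgboost": [1.5, 5.5, 0.0, 9.0],
    "linear": [4.0, 4.0, 4.0, 4.0],
}

selected = select_by_accuracy(val_preds, y_val, k=2)
combined = ensemble_predict(val_preds, selected)
print(selected, combined, rmse(y_val, combined))
```

In this toy data the two selected models err in opposite directions on most points, so their average has a lower RMSE than either model alone. Real base learners will not cancel this neatly, and the ensemble size `k` would itself need tuning.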

Keywords

Ensemble, Train delay, Heterogeneous ensemble, Diversity, Random Forest
