In this dissertation, we take a deep look at ensemble learning, a machine learning technique developed over the last few decades that has led to remarkable advances in supervised learning. We explore the ensemble technique known as gradient boosting in detail and examine its performance on real datasets using one of its most popular software implementations, the XGBoost Python library. The first part of the project is a thorough literature review, while the second part applies the theoretical background to several real supervised learning problems. First, we survey the theoretical background of the supervised learning problem, motivating ensemble learning through the bias-variance trade-off. We then take a closer look at the two dominant paradigms in ensemble learning, parallel and sequential ensembling, before exploring the sequential paradigm in depth through a detailed treatment of gradient boosting. We then review the history and evolution of ensemble learning, focusing on the first and for a time most successful boosting algorithm, AdaBoost, which we examine in detail. Next, we look at one of the most successful software implementations of gradient boosting, XGBoost, which achieves impressive results on large datasets thanks to several novel algorithmic and engineering innovations. We then briefly introduce another dominant machine learning paradigm, deep learning, and compare the XGBoost library against deep learning models on two well-known supervised learning tasks from the machine learning competition website Kaggle: the Adult Income and Fashion MNIST datasets. The Adult Income dataset poses a binary classification task on mixed data types, while Fashion MNIST poses a multiclass image classification task.
In particular, we present evidence that gradient boosting may perform better on structured tabular data, while deep learning may perform better on unstructured data. Finally, we examine more advanced ensembling techniques, in particular combination methods, and close by considering the current state of the art in ensemble learning.
Ensemble Learning, Machine Learning, Data Science and Analysis