Ensemble Learning
dc.contributor.advisor | Balinsky, Alexander
dc.contributor.author | Alasmari, Manal Jaber
dc.date.accessioned | 2023-07-17T12:51:38Z
dc.date.available | 2023-07-17T12:51:38Z
dc.date.issued | 2022-09-26
dc.description.abstract | In this dissertation, we take a deep look at ensemble learning, a machine learning technique developed over the last few decades that has led to remarkable advances in supervised learning. We explore the ensemble technique known as gradient boosting in detail, and examine its performance on real datasets using one of its most popular software implementations, the XGBoost Python library. The first part of the project is a thorough literature review, while the second part applies the theoretical background to several actual supervised learning problems. First, we survey the theory of the supervised learning problem, motivating ensemble learning as a response to the bias-variance trade-off. We then take a closer look at the two dominant paradigms in ensemble learning, parallel and sequential ensembling, before exploring the sequential paradigm in depth through a detailed treatment of gradient boosting. We then review the history and evolution of ensemble learning, with a primary focus on AdaBoost, the first and for a time the most successful boosting algorithm, which we examine in detail. We then turn to one of the most successful software implementations of gradient boosting, XGBoost, which achieves impressive results on large datasets thanks to several novel algorithmic and engineering innovations. Next, we briefly introduce another dominant machine learning paradigm, deep learning, and compare the XGBoost library against deep learning on two well-known supervised learning tasks from the machine learning competition website Kaggle: the Adult Income and Fashion MNIST datasets. The Adult Income dataset poses a binary classification task on mixed data types, while Fashion MNIST poses a multiclass image classification task. In particular, we present evidence that gradient boosting may perform better on structured tabular data, while deep learning may perform better on unstructured data. Finally, we look at more advanced ensembling techniques, in particular combination methods, and close by considering the current state of the art in ensemble learning.
dc.format.extent | 71
dc.identifier.uri | https://hdl.handle.net/20.500.14154/68640
dc.language.iso | en
dc.subject | Ensemble Learning
dc.subject | Machine Learning
dc.subject | Data Science and Analysis
dc.title | Ensemble Learning
dc.type | Thesis
sdl.degree.department | School of Mathematics
sdl.degree.discipline | Data Science and Analysis
sdl.degree.grantor | Cardiff University
sdl.degree.name | Master of Mathematics
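
As a concrete illustration of the Adult Income experiment described in the abstract, the following is a minimal sketch of training an XGBoost binary classifier on that task. It is not taken from the dissertation: the OpenML dataset name, the hyperparameter values, and the label-encoding shortcut for categorical features are assumptions chosen for demonstration only.

```python
# Illustrative sketch (not from the thesis): XGBoost on the Adult Income
# binary classification task. Assumes xgboost and scikit-learn are
# installed; dataset name, hyperparameters, and preprocessing are
# demonstration choices, not the dissertation's actual setup.
import xgboost as xgb
from sklearn.datasets import fetch_openml
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Fetch the Adult dataset from OpenML as a pandas DataFrame.
X, y = fetch_openml("adult", version=2, as_frame=True, return_X_y=True)

# Crude preprocessing for brevity: integer-encode categorical columns
# (XGBoost's trees can split on these codes) and binarize the target.
X = X.apply(lambda col: col.cat.codes if col.dtype.name == "category" else col)
y = (y == ">50K").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Gradient-boosted trees: each new tree is fit to the gradient of the
# loss with respect to the current ensemble's predictions.
model = xgb.XGBClassifier(
    n_estimators=200, max_depth=6, learning_rate=0.1, eval_metric="logloss"
)
model.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

A deep learning baseline of the kind the abstract compares against, e.g. a convolutional network on Fashion MNIST, would follow the same train/evaluate pattern with the boosted-tree model swapped out for a neural network.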