Saudi Cultural Missions Theses & Dissertations

Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10

Search Results

Now showing 1 - 5 of 5
  • Restricted access
    Towards Cost-Effective Noise-Resilient Machine Learning Solutions
    (University of Georgia, 2026-06-04) Gharawi, Abdulrahman Ahmed; Ramaswamy, Lakshmish
    Machine learning models have demonstrated exceptional performance in various applications as a result of the emergence of large labeled datasets. Although many datasets are available, acquiring high-quality labeled data is challenging because it requires extensive human supervision or expert annotation, both of which are labor-intensive and time-consuming. The problem is magnified by the considerable label noise present in real-world datasets, which significantly degrades the accuracy of machine learning models. Acquiring high-quality datasets free of label noise is therefore a critical problem. However, it is difficult to substantially reduce label noise in real-world datasets without hiring expensive expert annotators. Through extensive experimentation, this dissertation examines the impact of different levels of label noise on the accuracy of machine learning models. It also investigates ways to reduce labeling costs without sacrificing the required accuracy. Finally, to improve the robustness of machine learning models and mitigate the pervasive issue of label noise, we present a novel, cost-effective approach called Self Enhanced Supervised Training (SEST).
  • Restricted access
    Ensemble Machine Learning in Space Weather Analytics
    (New Jersey Institute of Technology, 2024) Alobaid, Khalid; Wang, Jason
    This dissertation addresses several important space weather problems using ensemble learning techniques. An ensemble method, which combines multiple machine learning models, is often more accurate than the individual models that form it, and there are several techniques for constructing an ensemble. Through in-depth case studies, the dissertation demonstrates the usefulness and effectiveness of ensemble machine learning for space weather analytics, especially for predicting extreme space weather events such as coronal mass ejections (CMEs). The dissertation begins with an ensemble method for predicting the arrival time of CMEs from the Sun to Earth. The proposed method, named CMETNet, combines classical machine learning algorithms (support vector regression, random forests, XGBoost, and Gaussian process regression) with a deep convolutional neural network (CNN) to perform multimodal learning. The classical algorithms learn latent patterns from CME features and background solar wind parameters, while the deep CNN learns patterns hidden in CME images; the learned patterns are then used jointly to make predictions. Experimental results show that CMETNet outperforms existing models, both machine-learning-based and physics-based. Finally, the dissertation presents a fusion method, named DeepCME, to estimate two important properties of CMEs: CME mass and kinetic energy. DeepCME fuses three deep learning models, namely ResNet, InceptionNet, and InceptionResNet. The fusion model extracts features from Large Angle and Spectrometric Coronagraph (LASCO) C2 images, combining the learning capabilities of the three component models to jointly estimate the mass and kinetic energy of CMEs. To the best of the author's knowledge, this is the first time deep learning has been used to estimate CME mass and kinetic energy. DeepCME can help scientists better understand CME dynamics. In conclusion, the dissertation showcases many applications of learning techniques, including ensemble learning, deep learning, transfer learning, and multimodal learning, in space weather analytics. The tools and methods developed in the dissertation will contribute to the understanding and forecasting of CME dynamics and CME geoeffectiveness.
  • Restricted access
    Flight Crew’s Cognitive States Detection Using Psychophysiological Measurements and Machine Learning Techniques
    (Cranfield University, 2024-02-29) Alreshidi, Ibrahim; Moulitsas, Irene; Jenkins, Karl W.
    In the ever-evolving landscape of aviation safety, accurately assessing pilots' mental states is of paramount importance. This thesis examines the critical role of electroencephalogram (EEG) data in understanding pilots' cognitive conditions. The dataset, which concerns attention-related human performance limiting states, was obtained from NASA's publicly available open data portal and includes EEG, electrocardiogram, galvanic skin response, and respiration data. The initial analyses addressed the challenges posed by noise in EEG recordings. Rigorous testing showed that prevalent preprocessing techniques, specifically band-pass filtering combined with Independent Component Analysis (ICA), were not always effective. This shortcoming underscored the need for more advanced methods to optimize machine learning outcomes. In response, subsequent stages of the research proposed a hybrid ensemble learning approach that integrates advanced automated EEG preprocessing with Riemannian geometry. Experimentation and validation showed that this refined preprocessing significantly improved the accuracy and reliability of EEG data interpretation. As the inquiry advanced, a more integrative approach was adopted that combines EEG with other physiological data, and a novel methodology combining one-dimensional Convolutional Neural Networks (1D CNNs) with Long Short-Term Memory (LSTM) architectures was proposed. The impact of methods for handling data imbalance on machine learning performance was also examined. In its final phase, the research focused on model interpretability: SHapley Additive exPlanations (SHAP) values were used to relate complex model predictions to human understanding and to identify the most important features for distinct cognitive states. In summary, this thesis offers a detailed examination of EEG data processing, machine learning, and deep learning models, and proposes a blueprint for improving aviation safety through in-depth cognitive state evaluation.
  • Restricted access
    Ensemble Learning
    (2022-09-26) Alasmari, Manal Jaber; Balinsky, Alexander
    In this dissertation, we take a deep look at ensemble learning, an advanced machine learning technique developed over the last few decades that has led to remarkable advances in supervised learning. We explore in detail the ensemble technique known as gradient boosting and examine its performance on real datasets using one of its most popular software implementations, the XGBoost Python library. The first part of the project is a thorough literature review, while the second applies the theoretical background to several real supervised learning problems. We first review the theoretical background of the supervised learning problem, motivating ensemble learning through the bias-variance trade-off. We then take a closer look at the two dominant paradigms in ensemble learning, parallel and sequential ensembling, before exploring the sequential paradigm in depth through a thorough treatment of gradient boosting. We then review the history and evolution of ensemble learning. Our primary focus is AdaBoost, one of the earliest and, for a time, the most successful boosting algorithms, which we examine in detail. Next, we look at XGBoost, one of the most successful software implementations of gradient boosting, which performs strongly on large datasets thanks to several novel algorithmic and engineering innovations. We then take a short look at another dominant machine learning paradigm, deep learning, and compare the XGBoost library with deep learning on two well-known supervised learning tasks available on the machine learning competition website Kaggle: the Adult Income and Fashion-MNIST datasets. The Adult Income dataset poses a binary classification task over mixed data types, while Fashion-MNIST is a multiclass image classification task. In particular, we present evidence that gradient boosting may perform better on structured tabular datasets, while deep learning may perform better on unstructured data. Finally, we look at more advanced ensembling techniques, in particular the use of combination methods. We close by considering the current state of the art in ensemble learning.
  • Restricted access
    Deep Discourse Analysis for Early Prediction of Multi-Type Dementia
    (Saudi Digital Library, 2023-06-12) Alkenani, Ahmed Hassan A; Li, Yuefeng
    Ageing populations are a worldwide phenomenon. Although it is not an inevitable consequence of biological ageing, dementia is strongly associated with increasing age and is therefore expected to pose enormous future challenges to public health systems and aged care providers. While dementia affects patients first and foremost, it is also associated with poorer mental and physical health among caregivers. Dementia is characterized by the irreversible, gradual impairment of nerve cells that control cognitive, behavioural, and language processes, causing speech and language deterioration even in preclinical stages. Early prediction enables interventions that can significantly alleviate dementia symptoms and, in some cases, even curtail cognitive decline. However, diagnosis is currently challenging because it usually begins with traditional clinical screening tests. Such tests are typically interpreted manually and may require further tests and physical examinations, making the process time-consuming, expensive, and invasive. Consequently, many researchers have turned to speech and language analysis to facilitate and automate initial pre-screening. Although recent studies have proposed promising methods and models, there is still room for improvement, without which automated pre-screening remains impracticable. There is currently limited empirical literature on modelling the discourse ability (that is, spoken and written conversation and communication) of people in the prodromal stages of different dementia types. Specifically, few researchers have investigated the nature of lexical and syntactic structures in spontaneous discourse produced by patients with dementia under different conditions for automated diagnostic modelling. In addition, most previous work has focused on modelling and improving the diagnosis of Alzheimer’s disease (AD), the most common dementia pathology, while neglecting other types of dementia. Further, currently proposed models suffer from poor performance, a lack of generalizability, and low interpretability. This thesis therefore explores lexical and syntactic presentations in the written and spoken narratives of people with different dementia syndromes in order to develop high-performing diagnostic models using fusions of lexical and syntactic (i.e., lexicosyntactic) features as well as language models. Multiple novel diagnostic frameworks are proposed and developed based on the “wisdom of crowds” theory, in which different mathematical and statistical methods are investigated and integrated into ensemble approaches to optimize overall performance and improve the inferences of the diagnostic models. First, syntactic- and lexical-level components are explored and extracted from the only two disparate data sources available for this study: spoken narratives retrieved from the well-known DementiaBank dataset, and written narratives from a blog-based corpus collected as part of this research. Because the two sources differ substantially, each was analysed and processed independently for exploratory data analysis and feature extraction. A common problem in this context is ensuring that a suitable feature space is generated for machine learning modelling. We address this problem by proposing multiple ensemble-based feature selection pipelines to reveal optimal lexicosyntactic features. Second, we explore language vocabulary spaces (i.e., n-grams), given their proven ability to enhance modelling performance, with the overall aim of establishing two-level feature fusions that combine optimal lexicosyntactic features and vocabulary spaces. These fusions are then used with single and ensemble learning algorithms to build individual diagnostic models for the dementia syndromes in question, including AD, Mild Cognitive Impairment (MCI), Possible AD (PoAD), Frontotemporal Dementia (FTD), Lewy Body Dementia (LBD), and Mixed Dementia (PwD). A comprehensive empirical study and series of experiments were conducted for each proposed approach on these two real-world datasets to verify the frameworks. Evaluation used multiple classification metrics, and the results not only demonstrate the effectiveness of the proposed frameworks but also show that they outperform current state-of-the-art baselines. In summary, this research makes a substantial contribution to effective dementia classification, which is needed to develop automated initial pre-screening of multiple dementia syndromes through language analysis. The lexicosyntactic features presented and discussed across dementia syndromes may contribute substantially to our understanding of language processing in these pathologies. Given the current scarcity of related datasets, it is also hoped that the collected writing-based blog corpus will facilitate future analytical and diagnostic studies. Furthermore, since this study addresses problems commonly faced in this research area and frequently discussed in the academic literature, its outcomes could help in developing better classification models, not only for dementia but also for other linguistic pathologies.
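The Gharawi and Ramaswamy abstract studies how different levels of label noise affect model accuracy but does not say how the noise is introduced. A common experimental setup, assumed here, is symmetric (uniform) label noise: each label is flipped, with probability equal to the noise rate, to a different class chosen uniformly at random. The Python sketch below implements only that assumption; the function names are hypothetical, and it is not the SEST method, which the abstract does not describe.

```python
import random


def inject_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Return a copy of `labels` in which each label is, with probability
    `noise_rate`, replaced by a different class drawn uniformly at random.

    Assumption: symmetric (uniform) label noise; labels are ints in [0, num_classes).
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded so experiments are reproducible
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            other_classes = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(other_classes))
        else:
            noisy.append(y)
    return noisy


def observed_noise_rate(clean, noisy):
    """Fraction of labels that differ between the clean and noisy versions."""
    return sum(a != b for a, b in zip(clean, noisy)) / len(clean)
```

Training the same model on labels corrupted at several rates (for example 0.0, 0.2, and 0.4) and comparing accuracy on a clean test set is one way to run the kind of noise-level study the abstract describes.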
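The Alobaid abstract describes CMETNet, which combines support vector regression, random forests, XGBoost, Gaussian process regression, and a CNN, and DeepCME, which fuses three deep networks, but it does not specify how the component outputs are combined. A simple and widely used combination rule, assumed here, is to average (optionally with weights) the predictions of base regressors trained on the same data. The Python sketch below uses two deliberately tiny stand-in base models, a one-feature least-squares line and a k-nearest-neighbour regressor, rather than the models named in the abstract; all class names are hypothetical.

```python
from statistics import fmean


class LinearRegressor1D:
    """Ordinary least squares on a single input feature (stand-in base model)."""

    def fit(self, xs, ys):
        mx, my = fmean(xs), fmean(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = my - self.slope * mx
        return self

    def predict(self, xs):
        return [self.slope * x + self.intercept for x in xs]


class KNNRegressor1D:
    """Predicts the mean target of the k nearest training points (stand-in base model)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, xs, ys):
        self.train = list(zip(xs, ys))
        return self

    def predict(self, xs):
        preds = []
        for x in xs:
            nearest = sorted(self.train, key=lambda p: abs(p[0] - x))[: self.k]
            preds.append(fmean(y for _, y in nearest))
        return preds


class AveragingEnsemble:
    """Fits every base model on the same data and returns their weighted mean prediction."""

    def __init__(self, models, weights=None):
        self.models = models
        self.weights = weights if weights is not None else [1.0] * len(models)

    def fit(self, xs, ys):
        for model in self.models:
            model.fit(xs, ys)
        return self

    def predict(self, xs):
        all_preds = [model.predict(xs) for model in self.models]
        total = sum(self.weights)
        return [
            sum(w * preds[i] for w, preds in zip(self.weights, all_preds)) / total
            for i in range(len(xs))
        ]
```

Averaging helps most when the base models make partly independent errors, which is the usual motivation for mixing different model families. With scikit-learn available, `sklearn.ensemble.VotingRegressor` implements the same averaging rule (with an optional `weights` parameter) over real estimators such as `SVR`, `RandomForestRegressor`, and `GaussianProcessRegressor`.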
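The Alreshidi abstract uses SHapley Additive exPlanations (SHAP) values to identify the most important features for each cognitive state. The idea underneath SHAP is the Shapley value: a feature's attribution is its weighted average marginal contribution over all subsets S of the other features, phi_i = sum over S in N \ {i} of |S|! (n - |S| - 1)! / n! * [v(S + {i}) - v(S)]. For a handful of features this can be computed exactly by enumeration, as in the Python sketch below. It assumes that a feature outside the coalition is replaced by a fixed baseline value, which is one of several conventions; the function name is hypothetical, and this is not the approximation algorithm used by the `shap` library.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of model f at input x.

    Assumption: a feature outside the coalition takes its value from `baseline`.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        # Model output with features in `coalition` taken from x, the rest from baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi += weight * (value(coalition | {i}) - value(coalition))
        phis.append(phi)
    return phis
```

The resulting values sum to f(x) - f(baseline), and for a linear model they reduce to each weight times the feature's deviation from the baseline. Because the sum runs over 2^(n-1) subsets per feature, larger models need approximations such as `shap.KernelExplainer` or, for tree ensembles, `shap.TreeExplainer`.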
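The Alreshidi abstract also examines methods for handling data imbalance without naming them. Random oversampling is one of the simplest such methods and is assumed here purely as an illustration: minority-class samples are duplicated, drawn with replacement, until every class matches the size of the largest class. The function name is hypothetical.

```python
import random


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples (drawn with replacement) until all classes
    have as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels
```

Oversampling should be applied to the training split only, after the train/test split, so that duplicated samples do not leak into the evaluation data.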
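The Alasmari abstract examines AdaBoost in detail without giving its formulation. The Python sketch below follows the standard discrete AdaBoost for binary labels in {-1, +1}, using one-level decision trees (stumps) as weak learners: each round fits the stump with the lowest weighted error e, gives it the weight alpha = 0.5 * ln((1 - e) / e), and multiplies each sample weight by exp(-alpha * y * h(x)) before renormalizing, so misclassified samples gain weight. Function names are hypothetical, and this is not the XGBoost algorithm.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """A decision stump: predicts `polarity` when row[feature] >= threshold, else -polarity."""
    return polarity if row[feature] >= threshold else -polarity


def fit_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted training error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                err = sum(
                    wi
                    for row, label, wi in zip(X, y, w)
                    if stump_predict(row, feature, threshold, polarity) != label
                )
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best


def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost with stumps; labels must be -1 or +1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, feature, threshold, polarity = fit_stump(X, y, w)
        if err >= 0.5:
            break  # no stump beats chance on the current weights
        safe_err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - safe_err) / safe_err)
        ensemble.append((alpha, feature, threshold, polarity))
        if err == 0:
            break  # this stump alone classifies every training sample correctly
        # Up-weight misclassified samples (y * h = -1), down-weight correct ones.
        w = [
            wi * math.exp(-alpha * label * stump_predict(row, feature, threshold, polarity))
            for row, label, wi in zip(X, y, w)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps (ties go to +1)."""
    score = sum(
        alpha * stump_predict(row, feature, threshold, polarity)
        for alpha, feature, threshold, polarity in ensemble
    )
    return 1 if score >= 0 else -1
```

A single stump cannot represent an interval such as "positive only for 2 <= x <= 4", but a weighted vote of a few stumps can, which is the core intuition behind boosting weak learners. XGBoost, by contrast, fits each new tree to first- and second-order gradient statistics of a regularized, differentiable loss; its scikit-learn-style interface is `xgboost.XGBClassifier`.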
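The Alkenani abstract proposes ensemble-based feature selection pipelines grounded in the “wisdom of crowds” idea but does not detail how the individual selectors are combined. One simple option, assumed here, is to aggregate the feature rankings produced by several selectors by mean rank (a Borda-style count) and keep the top k. The function name and the example feature names (such as `ttr` for type-token ratio) are hypothetical.

```python
def aggregate_rankings(rankings, k):
    """Combine several feature rankings (best first) by mean rank and return the top k.

    A feature missing from a ranking is given rank len(longest ranking), i.e. ranked
    after every listed feature; ties are broken alphabetically for determinism.
    """
    features = set().union(*rankings)
    worst = max(len(r) for r in rankings)

    def mean_rank(feature):
        return sum(r.index(feature) if feature in r else worst for r in rankings) / len(rankings)

    return sorted(features, key=lambda f: (mean_rank(f), f))[:k]
```

Each input ranking could come from a different selector (for example a statistical test, mutual information, or a model's feature importances); features that several selectors agree on rise to the top even if no single selector ranks them first.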
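The Alkenani abstract also describes two-level fusions of lexicosyntactic features and n-gram vocabulary spaces. The Python sketch below illustrates only the mechanics of such a fusion on a single narrative: a few simple lexical measures (token count, type-token ratio, mean word length) are merged with relative word-bigram frequencies into one feature dictionary. The specific measures, the regular-expression tokenizer, and the function names are assumptions for illustration; syntactic features, which would require a parser, are omitted.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; a deliberately simple stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_features(tokens):
    """Illustrative lexical measures (not the thesis's feature set)."""
    n = len(tokens)
    return {
        "n_tokens": float(n),
        "type_token_ratio": len(set(tokens)) / n if n else 0.0,
        "mean_word_length": sum(len(t) for t in tokens) / n if n else 0.0,
    }


def ngram_features(tokens, n=2):
    """Relative frequencies of word n-grams, keyed as '<n>gram:<words>'."""
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {f"{n}gram:{g}": c / total for g, c in counts.items()}


def fuse_features(text, n=2):
    """Feature-level fusion: merge lexical measures and n-gram frequencies into one dict."""
    tokens = tokenize(text)
    fused = lexical_features(tokens)
    fused.update(ngram_features(tokens, n))
    return fused
```

Dictionaries like this can be turned into fixed-length vectors over a shared n-gram vocabulary for a whole corpus before being passed to a single or ensemble classifier.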

Copyright owned by the Saudi Digital Library (SDL) © 2024