Saudi Cultural Missions Theses & Dissertations

Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10


Search Results

Now showing 1 - 6 of 6
  • ItemRestricted
    Exploring Advanced Deep Learning, Foundation, and Hybrid Models for Medical Image Classification
    (University of Surrey, 2024-09) Kutbi, Jad; Carneiro, Gustavo
    This dissertation explores the use of advanced deep learning architectures, foundation models, and hybrid models for medical image classification. Medical imaging plays a critical role in the healthcare industry, and deep learning models have demonstrated significant potential in improving the accuracy and efficiency of diagnostic processes. This work focuses on three MedMNISTv2 datasets, RetinaMNIST, BreastMNIST, and FractureMNIST3D, each representing a different imaging modality and classification task. The significance of this work lies in its comprehensive evaluation of state-of-the-art models, including ResNet, Vision Transformers (ViT), ConvNeXt, and Swin Transformers, and their effectiveness in handling complex medical images. The primary contributions of this research are the implementation and benchmarking of modern architectures on these datasets, as well as the investigation of hyperparameter optimization using Optuna. Pretrained models and hybrid architectures such as CNN-ViT, SwinConvNeXt, and CNN-LSTM were explored to enhance performance. Key results demonstrate that models such as ConvNeXt-tiny (pretrained) and CLIP achieved high accuracy and AUC scores, particularly on the BreastMNIST and RetinaMNIST datasets, setting new performance benchmarks. The combination of Swin and ConvNeXt using feature fusion was shown to improve model robustness, especially when handling multi-class and 3D data. (An illustrative Optuna tuning sketch follows this entry.)
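A minimal, hypothetical sketch of the kind of Optuna hyperparameter search mentioned above, assuming a generic PyTorch classifier; the tiny CNN, the random stand-in data, and the searched ranges are illustrative assumptions rather than the dissertation's actual MedMNIST pipeline:

```python
# Hypothetical sketch: Optuna search over learning rate and dropout for a small
# image classifier, standing in for the MedMNIST benchmarking described above.
import optuna
import torch
import torch.nn as nn


def build_model(dropout: float) -> nn.Module:
    # Tiny CNN stand-in for the ConvNeXt/ViT/Swin backbones used in the dissertation.
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Dropout(dropout), nn.Linear(16, 5),  # 5 classes, e.g. RetinaMNIST
    )


def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    model = build_model(dropout)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    # Random tensors stand in for a real DataLoader over MedMNIST images.
    x = torch.randn(64, 3, 28, 28)
    y = torch.randint(0, 5, (64,))
    for _ in range(20):  # a few quick optimisation steps per trial
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    model.eval()
    with torch.no_grad():
        # Toy objective: training accuracy on the stand-in batch.
        return (model(x).argmax(dim=1) == y).float().mean().item()


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=10)
print(study.best_params)
```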
  • ItemRestricted
    Efficient Processing of Convolutional Neural Networks on the Edge: A Hybrid Approach Using Hardware Acceleration and Dual-Teacher Compression
    (University of Central Florida, 2024-07-05) Alhussain, Azzam; Lin, Mingjie
    This dissertation addresses the challenge of accelerating Convolutional Neural Networks (CNNs) for edge computing in computer vision applications by developing specialized hardware solutions that maintain high accuracy and perform real-time inference. Driven by open-source hardware design frameworks such as FINN and HLS4ML, this research focuses on hardware acceleration, model compression, and efficient implementation of CNN algorithms on AMD SoC-FPGAs using High-Level Synthesis (HLS) to optimize resource utilization and improve the throughput per watt of FPGA-based AI accelerators compared to traditional fixed-logic chips such as CPUs, GPUs, and other edge accelerators. The dissertation introduces a novel CNN compression technique, "Two-Teachers Net," which utilizes PyTorch FX-graph mode to train an 8-bit quantized student model using knowledge distillation from two teacher models, improving the accuracy of the compressed model by 1%-2% compared to existing solutions for edge platforms. This method can be applied to any CNN model and dataset for image classification and seamlessly integrated into existing AI hardware and software optimization toolchains, including Vitis-AI, OpenVINO, TensorRT, and ONNX, without architectural adjustments. This provides a scalable solution for deploying high-accuracy CNNs on low-power edge devices across various applications, such as autonomous vehicles, surveillance systems, robotics, healthcare, and smart cities. (An illustrative distillation-loss sketch follows this entry.)
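A hedged sketch of a dual-teacher knowledge-distillation objective, loosely modelled on the "Two-Teachers Net" idea summarised above; the equal teacher weighting, the temperature, and the loss mixing factor are assumptions, not the dissertation's exact recipe, and the FX-graph quantization step is omitted:

```python
# Illustrative dual-teacher knowledge distillation: cross-entropy on the labels
# plus KL divergence from the student to the averaged soft targets of two teachers.
import torch
import torch.nn.functional as F


def dual_teacher_kd_loss(student_logits, teacher1_logits, teacher2_logits,
                         labels, temperature=4.0, alpha=0.7):
    ce = F.cross_entropy(student_logits, labels)
    # Soften and average the two teachers' predictions (one simple fusion choice).
    soft_targets = 0.5 * (
        F.softmax(teacher1_logits / temperature, dim=1)
        + F.softmax(teacher2_logits / temperature, dim=1)
    )
    log_student = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
    return alpha * kd + (1.0 - alpha) * ce


# Toy usage with random logits for a 10-class problem.
student_logits = torch.randn(8, 10, requires_grad=True)
t1, t2 = torch.randn(8, 10), torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = dual_teacher_kd_loss(student_logits, t1, t2, labels)
loss.backward()
print(loss.item())
```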
  • ItemRestricted
    Utilising Technical Analysis, Commodities Data, and Market Indices to Predict Stock Price Movements with Deep Learning
    (Cardiff University, 2024) Aloraini, Osama Mohammed A; Sun, Xianfang
    This study investigates the efficacy of deep learning models, specifically Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models, for forecasting stock price movements in the U.S. stock market. The dataset includes 133 stocks across 19 different sectors and covers the period from 2010 to 2023. To enrich the dataset, eleven technical indicators and their corresponding trading strategies, represented as vectors, were integrated along with market indices and commodities data, and various experiments were conducted to assess the effectiveness of different feature combinations. The findings reveal that the CNN model outperforms the LSTM model in both accuracy and profitability, achieving the highest accuracy of 59.7%. Furthermore, both models demonstrated an ability to identify significant trend-changing points in stock price movements. Another finding is that translating trading strategies into vector form plays a critical role in enhancing the performance of both models. However, incorporating external features such as market indices and commodities data led to model overfitting, while relying only on stock-specific features introduced a risk of underfitting. (An illustrative indicator-window sketch follows this entry.)
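The idea of feeding windows of technical-indicator vectors to a convolutional classifier can be sketched as follows; the window length, layer sizes, and the two-class up/down target are assumptions for illustration, not the study's actual configuration:

```python
# Illustrative 1D-CNN over a look-back window of technical-indicator vectors,
# predicting next-day up/down movement for a single stock.
import torch
import torch.nn as nn

N_INDICATORS = 11   # eleven technical indicators, as in the study
WINDOW = 30         # assumed look-back window in trading days


class IndicatorCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(N_INDICATORS, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 2)  # two classes: price up / price down

    def forward(self, x):             # x: (batch, N_INDICATORS, WINDOW)
        return self.head(self.features(x))


model = IndicatorCNN()
batch = torch.randn(16, N_INDICATORS, WINDOW)   # random stand-in data
print(model(batch).shape)                       # torch.Size([16, 2])
```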
  • ItemRestricted
    Activation Functions In Deep Learning For Aerial Image Segmentation
    (Saudi Digital Library, 2023-11-01) Alamri, Raghad Jaza; Morley, Terence
    In remote sensing, deep learning models have been widely proposed and evaluated, especially for scene classification using Convolutional Neural Networks (CNNs) or semantic segmentation through Fully Convolutional Networks (FCNs). There is still a research gap in studying the impact of activation functions on semantic segmentation performance in FCNs, particularly when applied to aerial images. This dissertation attempts to bridge this gap by comprehensively examining the impact of nine activation functions on FCN models. The study presents intensive experiments on two FCN architectures, UNet and FPN: UNet is a simple and straightforward architecture, while FPN is very deep and complex. Two datasets were used: a small dataset with only five classes and images from a single country, and a more diverse dataset with nine classes and images of various resolutions and complexity from all over the world. The experiment consists of two phases. The first phase involves establishing four baseline models for integrating diverse activation functions through a systematic method of hyperparameter tuning. Afterwards, each baseline model was implemented across ten different activation function variations; in total, forty distinct models were trained and evaluated. Based on these experiments, it is evident that the choice of activation function has a significant impact on training stability and convergence speed. Additionally, the activation functions play a crucial role in the overall and within-class performance of the models. However, the behaviour of each activation function is highly affected by the combination of architecture and dataset used. (An illustrative activation-swapping sketch follows this entry.)
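A minimal sketch of the experimental idea of swapping the activation function while holding the architecture fixed; the double-convolution block and the particular activations listed are generic simplifications, not the dissertation's exact UNet or FPN implementations:

```python
# Parameterise the activation function of a UNet-style double-conv block so the
# same segmentation architecture can be retrained with different activations.
import torch
import torch.nn as nn


def double_conv(in_ch, out_ch, activation):
    """Two 3x3 convolutions, each followed by the chosen activation."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), activation(),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), activation(),
    )


# Build one block per candidate activation, keeping everything else fixed.
for act in (nn.ReLU, nn.GELU, nn.SiLU, nn.LeakyReLU, nn.ELU):
    block = double_conv(3, 16, act)
    out = block(torch.randn(1, 3, 64, 64))
    print(act.__name__, tuple(out.shape))   # e.g. ReLU (1, 16, 64, 64)
```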
  • ItemRestricted
    Deep Learning-Based Digital Human Modeling And Applications
    (Saudi Digital Library, 2023-12-14) Ali, Ayman; Wang, Pu
    Recent advancements in deep learning have engendered remarkable progress across numerous computer vision tasks. Notably, there has been a burgeoning interest in recovering three-dimensional (3D) human models from monocular images, attributable to the extensive practical applications that necessitate 3D human models, including but not limited to gaming, human-computer interaction, virtual systems, and digital twins. The focus of this dissertation is to conceptualize and develop a suite of deep learning-based models with the primary objective of enabling the expeditious and high-fidelity digitalization of human subjects, and thereby facilitating a multitude of downstream applications that leverage digital 3D human models.
    Estimating a 3D human mesh from a monocular image necessitates intricate deep-learning models for enhanced feature extraction, albeit at the expense of heightened computational requirements. As an alternative, researchers have explored a skeleton-based modality, a lightweight abstraction of human pose aimed at mitigating the computational intensity. However, this approach omits significant visual cues, particularly shape information, which cannot be entirely derived from the 3D skeletal representation alone. To harness the advantages of both paradigms, a hybrid methodology that integrates 3D human mesh and skeletal information offers a promising avenue. Over the past decade, substantial strides have been made in estimating two-dimensional (2D) joint coordinates from monocular images, while Convolutional Neural Networks (CNNs) have demonstrated their prowess at extracting intricate visual features. This progress motivates our investigation into a hybrid architectural framework that combines CNNs with a lightweight graph transformer-based approach, designed to lift the 2D joint pose to a comprehensive 3D representation and recover the visual cues essential for precise estimation of pose and shape parameters.
    While state-of-the-art (SOTA) results in 3D Human Pose Estimation (HPE) are important, they do not guarantee the accuracy and plausibility required for biomechanical analysis. Our two-stage deep learning model is designed to efficiently estimate 3D human poses and associated kinematic attributes from monocular videos, with a primary focus on mobile device deployment. The paramount significance of this contribution lies in its ability to provide not only accurate 3D pose estimates but also biomechanically plausible results, which are essential for accurate biomechanical analyses and thereby advance applications including motion tracking, gesture recognition, and ergonomic assessments. This work contributes to our understanding of human movement and its interaction with the environment, ultimately impacting a wide range of biomechanics-related studies and applications.
    In the realm of human movement analysis, one prominent downstream task is the recognition of human actions based on skeletal data, known as Skeleton-based Human Action Recognition (HAR). This domain has garnered substantial attention within the computer vision community, primarily due to its distinctive attributes, such as computational efficiency, the innate representational power of features, and robustness to variations in illumination. In this context, our research demonstrates that, by representing 3D pose sequences as RGB images, conventional Convolutional Neural Network (CNN) architectures, exemplified by ResNet-50, when complemented by astute training strategies and diverse augmentation techniques, can attain state-of-the-art accuracy, surpassing the widely adopted Graph Neural Network models. (An illustrative pose-to-image sketch follows this entry.)
    Radar-based sensing, rooted in the transmission and reception of radio waves, offers a non-intrusive and versatile means to monitor human movements, gestures, and vital signs. However, despite its vast potential, the lack of comprehensive radar datasets has hindered the broader implementation of deep learning in radar-based human sensing. In response, synthetic data offers a crucial advantage for deep learning training: synthetic datasets provide an expansive and practically limitless resource, enabling models to adapt and generalize proficiently by exposing them to diverse scenarios that transcend the limitations of real-world data. As part of this research's trajectory, a novel computational framework known as "virtual radar" is introduced, leveraging 3D pose-driven, physics-informed principles. This paradigm allows for the generation of high-fidelity synthetic radar data by merging 3D human models with the Physical Optics (PO) approximation for radar cross-section modeling. The introduction of virtual radar marks a groundbreaking path towards foundational models focused on the nuanced understanding of human behavior through privacy-preserving radar-based methodologies.
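The pose-to-image encoding mentioned in the abstract can be sketched as follows; the joint count, normalisation, resizing, and the 60-class ResNet-50 head are illustrative assumptions, not the dissertation's exact pipeline:

```python
# Encode a 3D pose sequence as a pseudo-RGB image (xyz -> channels, joints x frames
# spatial grid) so an ordinary image CNN such as ResNet-50 can classify the action.
import torch
import torch.nn.functional as F
import torchvision.models as models


def pose_sequence_to_image(poses: torch.Tensor) -> torch.Tensor:
    """poses: (frames, joints, 3) -> pseudo-image (3, joints, frames) scaled to [0, 1]."""
    img = poses.permute(2, 1, 0)                       # xyz becomes the channel axis
    return (img - img.min()) / (img.max() - img.min() + 1e-8)


frames, joints = 64, 25                                # e.g. a Kinect-style skeleton
sequence = torch.randn(frames, joints, 3)              # random stand-in pose data
image = pose_sequence_to_image(sequence).unsqueeze(0)  # add a batch dimension

resnet = models.resnet50(weights=None, num_classes=60)   # e.g. 60 action classes
logits = resnet(F.interpolate(image, size=(224, 224)))   # resize to the CNN's input size
print(logits.shape)                                      # torch.Size([1, 60])
```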
  • ItemRestricted
    Leveraging Machine Learning for Enhanced Detection and Classification of Brain Pathologies Using EEG
    (Saudi Digital Library, 2023-11-09) Albaqami, Hezam; Hassan, Ghulam Mubashar; Datta, Amitava
    Maintaining brain health is vital because the brain controls all body functions. This thesis introduces novel methods for automated brain diagnostic tasks using the electroencephalogram (EEG). Several contributions are made, including wavelet-based feature extraction methods and novel deep-learning architectures for detecting and classifying brain pathologies. Additionally, novel methods for feature dimensionality reduction, data fusion, and data augmentation are proposed. The proposed solutions are rigorously assessed on extensive EEG datasets covering patients from a wide demographic range to evaluate their generalization capabilities. This thesis offers significant contributions to biomedical signal processing for diagnostic tasks. (An illustrative wavelet feature-extraction sketch follows this entry.)
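A hedged sketch of the kind of wavelet-based EEG feature extraction the thesis builds on; the db4 wavelet, the decomposition level, and the per-subband statistics are assumptions for illustration, not the thesis's actual feature set:

```python
# Discrete wavelet decomposition of each EEG channel followed by simple
# per-subband statistics (energy and standard deviation) as a feature vector.
import numpy as np
import pywt


def wavelet_features(eeg: np.ndarray, wavelet: str = "db4", level: int = 4) -> np.ndarray:
    """eeg: (channels, samples) -> concatenated per-subband energies and std devs."""
    feats = []
    for channel in eeg:
        coeffs = pywt.wavedec(channel, wavelet, level=level)  # [cA4, cD4, ..., cD1]
        for c in coeffs:
            feats.extend([float(np.sum(c ** 2)), float(np.std(c))])
    return np.asarray(feats)


# Toy usage: 21 EEG channels, 10 s at 256 Hz of random data standing in for a recording.
segment = np.random.randn(21, 2560)
print(wavelet_features(segment).shape)   # (channels * (level + 1) * 2,) = (210,)
```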

Copyright owned by the Saudi Digital Library (SDL) © 2024