Features Selection Strategies for Classifying Heterogeneous Cardiovascular Disease Data

dc.contributor.advisorCoenen, Frans
dc.contributor.advisorZheng, Yalin
dc.contributor.authorAldosari, Hanadi
dc.date.accessioned2025-06-18T08:26:08Z
dc.date.issued2025-05
dc.description.abstractThe exponential growth of data across diverse domains has presented significant challenges, particularly in the integration and analysis of multi-modal datasets. This thesis addresses these challenges by proposing a Homogeneous Feature Vector Representation (HFVR) framework, designed to unify disparate data formats and facilitate holistic machine learning. The primary application of this research is the classification of Cardiovascular Disease (CVD), using data from multiple sources, including time series, images, video, and clinical records. The main research question guiding this thesis is: How can multiple heterogeneous data sources be effectively combined to support comprehensive and integrated machine learning analysis? The contributions of this work are twofold: (i) the development of technical methodologies for feature extraction, and (ii) the application of these methods within a medical context to enhance CVD diagnosis and prognosis.
Five feature extraction techniques are presented to address the complexities of multi-modal data integration:
1. 1D Motifs and Discords (1D-MD): This technique uses matrix profiles to extract recurring patterns (motifs) and anomalies (discords) from ECG time series data. It serves as a benchmark for classification models.
2. 2D Motifs and Discords (2D-MD): This technique operates directly on ECG images to extract spatial motifs and discords, offering improvements in classification performance over 1D-MD.
3. 2D Convolutional Neural Networks (2D-CNN): Pre-trained CNNs like ResNet-50 and VGG16 are used to extract hierarchical features from ECG images, significantly improving classification accuracy when integrated into the HFVR framework.
4. Multi-Frame 2D CNN (MF2D-CNN): Designed for video data, this method processes frames using CNNs and applies temporal aggregation to capture dynamic patterns while maintaining computational efficiency.
5. Spatio-Temporal 3D CNN (ST3D-CNN): Building on MF2D-CNN, this approach uses 3D convolutions to jointly analyze spatial and temporal dynamics in Echo video data.
The research also includes the development of a bespoke, multi-modal dataset in collaboration with the Liverpool Heart and Chest Hospital (LHCH), combining ECG, Echo, and clinical data. This dataset was used to evaluate the proposed methods and demonstrate their real-world applicability. The HFVR framework outperformed single-modality approaches by integrating features from multiple data sources. Extensive experiments were conducted on public datasets (CPSC, GHS, GAF, eCAN, and dCAN) and the LHCH dataset. Evaluation metrics such as Accuracy, Precision, Recall, F1-score, and AUC were used, with ten-fold cross-validation and stratified sampling ensuring robustness. Traditional classifiers like Support Vector Machines (SVMs) and k-Nearest Neighbour (kNN) were also used to validate the HFVR framework. Results showed that combining MF2D-CNN with clinical and 2D-CNN features achieved the highest AUC of 93.3%, significantly outperforming baseline methods. Statistical analysis confirmed the robustness and scalability of the techniques.
Overall, this thesis advances multi-modal data integration by presenting a unified framework for feature extraction and fusion. The HFVR framework paves the way for holistic machine learning and improved predictive accuracy. While focused on CVD classification, the techniques are generalizable to other domains. Future work will explore real-time implementation, enhanced extraction methods, and expansion to additional data types and domains, representing a major step toward scalable, integrated machine learning systems.
dc.format.extent281
dc.identifier.urihttps://hdl.handle.net/20.500.14154/75599
dc.language.isoen
dc.publisherUniversity of Liverpool
dc.subjectmulti-modal integration
dc.subjectfeature extraction
dc.subjectHFVR
dc.subjectCVD classification
dc.subjectdeep learning
dc.subjectECG
dc.subjectEcho video
dc.subject1D motifs
dc.subject2D CNN
dc.subject3D CNN
dc.titleFeatures Selection Strategies for Classifying Heterogeneous Cardiovascular Disease Data
dc.typeThesis
sdl.degree.departmentComputer Science
sdl.degree.disciplineArtificial Intelligence and Data Science
sdl.degree.grantorUniversity of Liverpool
sdl.degree.namePhD
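
The abstract states that the 1D-MD technique uses matrix profiles to find motifs (recurring patterns) and discords (anomalies) in ECG time series. As an illustration only, the following is a minimal brute-force sketch of that general idea in plain Python. It is not the thesis implementation: the function names, the window length `m`, the exclusion-zone size, and the toy signal are all assumptions made for this example, and the quadratic loop would be too slow for real ECG recordings.

```python
import math


def znorm(window):
    """Z-normalise a window; a constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def matrix_profile(series, m):
    """Brute-force matrix profile (hypothetical helper).

    For every length-m subsequence, return the z-normalised Euclidean
    distance to its nearest neighbour outside an exclusion zone.
    """
    n_sub = len(series) - m + 1
    if n_sub < 2:
        raise ValueError("series too short for window length m")
    windows = [znorm(series[i:i + m]) for i in range(n_sub)]
    excl = max(1, m // 2)  # assumed exclusion zone to skip trivial matches
    profile = [math.inf] * n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue
            d = math.dist(windows[i], windows[j])
            if d < profile[i]:
                profile[i] = d
    return profile


def motif_and_discord(profile):
    """Return (motif_index, discord_index).

    The motif is the subsequence with the smallest profile value (it has
    a close match elsewhere); the discord is the one with the largest
    (its nearest neighbour is still far away).
    """
    finite = [(d, i) for i, d in enumerate(profile) if math.isfinite(d)]
    return min(finite)[1], max(finite)[1]


# Toy periodic signal with one injected spike standing in for an anomaly.
signal = [0.0, 1.0, 0.0, -1.0] * 6
signal[13] = 5.0
profile = matrix_profile(signal, m=4)
motif_idx, discord_idx = motif_and_discord(profile)
print("motif starts at", motif_idx, "discord starts at", discord_idx)
```

In a feature-extraction setting, the subsequences at the motif and discord positions (or their profile values) could then be collected into a fixed-length vector for a classifier; how the thesis does this is not specified in this record.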

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 8.27 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission

Copyright owned by the Saudi Digital Library (SDL) © 2025