Feature extraction for high dimensional healthcare data

Date

2024-02-19

Publisher

University of Surrey

Abstract

In the current digital era, the healthcare sector holds an abundance of very large databases, driven largely by the widespread adoption of machine learning and data mining. The complexity of these datasets presents notable obstacles, including the 'curse of dimensionality'. This project addresses these issues by developing methods for the automated extraction of features from complex, high-dimensional Intensive Care Unit (ICU) data. The ultimate aim is to use these methods to predict the likelihood of in-hospital death following admission to the ICU. The study uses a variety of feature extraction methods, both linear and nonlinear: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), t-Distributed Stochastic Neighbour Embedding (t-SNE), and Autoencoders. These methods are applied to the MIMIC III dataset, which contains data on around fifty-one thousand patients, each identified by a distinct admission identification number. The primary objective of this study is to assess methods for automated feature extraction that can subsequently be employed in healthcare applications. The study additionally investigates the potential of more sophisticated machine learning models, such as deep learning models, to capture intricate patterns and relationships in high-dimensional data. Further work could explore the practical application of the extracted features in real-world healthcare contexts, potentially leading to more precise and efficient predictive models and improved patient outcomes.
This study contributes to machine learning in the healthcare sector, with a specific focus on the automated extraction of features from complex datasets to predict in-hospital mortality. Its results have the potential to advance data-driven solutions in healthcare.
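
As an illustration of the simplest linear method named in the abstract, PCA, the following is a minimal sketch and not the thesis's own implementation. It assumes NumPy and uses a synthetic matrix as a stand-in for a patient-by-feature table; the function name `pca_extract`, the matrix sizes, and the noise level are all hypothetical choices made for the example, and no MIMIC III data is involved.

```python
import numpy as np


def pca_extract(X, n_components):
    """Project the rows of X onto its top principal components.

    Hypothetical helper for illustration only.
    Returns the projected data and the explained-variance ratio
    of each retained component.
    """
    # Centre each feature so the SVD describes variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained_ratio[:n_components]


# Synthetic stand-in for high-dimensional records (assumption):
# 200 rows, 50 observed features driven by 3 hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 50))

Z, ratio = pca_extract(X, n_components=3)
print("reduced shape:", Z.shape)
print("variance retained:", ratio.sum())
```

The reduced matrix `Z` could then be passed to any downstream classifier in place of the original features. Nonlinear methods such as t-SNE or an autoencoder would replace the linear projection step while keeping the same overall shape of the pipeline.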

Keywords

Feature extraction, high dimensional, data

Copyright owned by the Saudi Digital Library (SDL) © 2025