Feature Selection for High Dimensional Healthcare Data

Alayed, Abdulrahman

Files

SACM-Dissertation.pdf (1.23 MB)

Date

2024-01

Authors

Alayed, Abdulrahman

Publisher

University of Surrey

Abstract

In today’s digital landscape, researchers frequently face the complexity of handling high-dimensional datasets. Data mining and machine learning methods can struggle when confronted with very large datasets, leading to inefficiencies. Raw data with numerous features can degrade machine learning algorithms by reducing accuracy, increasing overfitting, and adding complexity. This is primarily due to redundant and irrelevant features, which hamper the learning process. Feature selection techniques can effectively address these challenges. By retaining only the relevant features, these techniques enable machine learning algorithms to operate more efficiently: they speed up training, reduce model complexity, improve accuracy, and mitigate overfitting. The primary objective of this project is to create an automatic variable selection pipeline that chooses the best features from among various feature selection techniques. The pipeline incorporates four categories of variable selection methods: filter, wrapper, embedded, and hybrid methods. The variable selection techniques are applied to the MIMIC-III (Medical Information Mart for Intensive Care) dataset, which is available at no cost. This database is well suited to the project's goals, as it is a centralized database containing details about patients admitted to the critical care unit of a large regional hospital. The dataset is particularly useful for predicting the likelihood of in-hospital death following ICU admission. To achieve this goal, the project employs six classification techniques: Logistic Regression (LR), K-Nearest Neighbours (KNN), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN). The project systematically evaluates and compares the models' performance using various assessment metrics.

Keywords

Feature Selection, Data, Artificial Intelligence, Machine Learning, Deep Learning, AI

URI

https://hdl.handle.net/20.500.14154/73158

Collections

SACM - United Kingdom
