Feature Selection for High Dimensional Healthcare Data
Date
2024-01
Publisher
University of Surrey
Abstract
Researchers increasingly have to work with high-dimensional datasets. Data mining and
machine learning methods can struggle with such data: a large number of raw features can
reduce accuracy, increase overfitting, and add computational complexity. This is primarily
due to redundant and irrelevant features, which hamper the learning process. However,
employing feature selection techniques can effectively address these challenges. By selectively
choosing relevant features, these techniques enable machine learning algorithms to operate
more efficiently. They contribute to faster training, reduce model complexity, enhance
accuracy, and mitigate overfitting. The primary objective of this project is to build an
automatic variable selection pipeline that chooses the best features by comparing several
feature selection techniques. The pipeline covers four categories of variable selection
methods: filter, wrapper, embedded, and hybrid methods. The
variable selection techniques are applied to the MIMIC-III (Medical Information Mart for
Intensive Care) dataset, which is freely available. The database suits the
project's goals because it is a centralized database containing details about patients admitted to the
critical care unit of a large regional hospital. The dataset is particularly useful for predicting
the likelihood of in-hospital death after ICU admission. To achieve this goal, the project
employs six classification techniques: Logistic Regression (LR), K-Nearest Neighbours
(KNN), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and
Artificial Neural Network (ANN). The project systematically evaluates and compares the
models' performance using several evaluation metrics.
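
To make the method categories named in the abstract concrete, the following is a minimal sketch of one common hybrid arrangement: a filter step shortlists features by their correlation with the outcome, and a wrapper step then refines the shortlist by greedy forward selection against a classifier's validation accuracy. It is not the project's pipeline. The synthetic data, the nearest-centroid classifier (standing in for the six classifiers listed above), and every function name are assumptions made for illustration only.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def filter_select(X, y, k):
    """Filter step: keep the k features most correlated with the label."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def centroid_accuracy(train, val, features):
    """Accuracy of a nearest-centroid classifier restricted to `features`."""
    X_tr, y_tr = train
    X_va, y_va = val
    centroids = {}
    for label in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
    correct = 0
    for x, t in zip(X_va, y_va):
        point = [x[j] for j in features]
        pred = min(centroids, key=lambda c: math.dist(point, centroids[c]))
        correct += pred == t
    return correct / len(y_va)


def wrapper_select(train, val, candidates):
    """Wrapper step: greedy forward selection driven by validation accuracy."""
    selected, best = [], 0.0
    remaining = list(candidates)
    while remaining:
        scored = [(centroid_accuracy(train, val, selected + [j]), j) for j in remaining]
        acc, j = max(scored)
        if acc <= best:
            break  # no remaining candidate improves the score
        selected.append(j)
        remaining.remove(j)
        best = acc
    return sorted(selected), best


def make_data(n, n_features, informative, seed):
    """Synthetic binary-outcome data; only `informative` columns depend on the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label if j in informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y


# Hypothetical stand-in for a tabular clinical dataset with a binary outcome.
X, y = make_data(n=400, n_features=10, informative={0, 1}, seed=0)
train, val = (X[:300], y[:300]), (X[300:], y[300:])

shortlist = filter_select(*train, k=4)                   # cheap ranking first
chosen, val_acc = wrapper_select(train, val, shortlist)  # then model-driven refinement
print("filter shortlist:", shortlist)
print("wrapper choice:", chosen, "validation accuracy:", round(val_acc, 3))
```

The filter step never trains a model, so it scales to many features, while the wrapper step is costlier but scores subsets by the metric that actually matters. Embedded methods, which select features as a side effect of model fitting (for example through L1-regularised coefficients or tree-based importances), are not shown here.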
Keywords
Feature Selection, Data, Artificial Intelligence, Machine Learning, Deep Learning, AI