Feature Selection for High Dimensional Healthcare Data

dc.contributor.advisor: Kouchaki, Samaneh
dc.contributor.author: Alayed, Abdulrahman
dc.date.accessioned: 2024-10-03T17:37:21Z
dc.date.issued: 2024-01
dc.description.abstract: In today's digital landscape, researchers frequently face the complexity of handling high-dimensional datasets. Data mining and machine learning methods can struggle with very large datasets, leading to inefficiency. Raw data with many features can harm machine learning algorithms by reducing accuracy, increasing overfitting, and adding complexity, primarily because redundant and irrelevant features hamper the learning process. Feature selection techniques address these challenges: by keeping only the relevant features, they let machine learning algorithms train faster, reduce model complexity, improve accuracy, and mitigate overfitting. The primary objective of this project is to create an automatic variable selection pipeline that chooses the best features from among several feature selection techniques. The pipeline incorporates four categories of variable selection methods: filter, wrapper, embedded, and hybrid methods. The techniques are applied to the MIMIC-III (Medical Information Mart for Intensive Care) dataset, which is available at no cost. This database suits the project's goals because it is a centralized database containing details about patients admitted to the critical care unit of a large regional hospital. The dataset is particularly useful for predicting the likelihood of death during the hospital stay following ICU admission. To this end, the project employs six classification techniques: Logistic Regression (LR), K-Nearest Neighbours (KNN), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN). The project systematically evaluates and compares the models' performance using several assessment metrics.
dc.format.extent: 62
dc.identifier.uri: https://hdl.handle.net/20.500.14154/73158
dc.language.iso: en
dc.publisher: University of Surrey
dc.subject: Feature Selection
dc.subject: Data
dc.subject: Artificial Intelligence
dc.subject: Machine Learning
dc.subject: Deep Learning
dc.subject: AI
dc.title: Feature Selection for High Dimensional Healthcare Data
dc.type: Postgraduate Projects
sdl.degree.department: Department of Electronic Engineering
sdl.degree.discipline: Artificial Intelligence
sdl.degree.grantor: University of Surrey
sdl.degree.name: Master of Science
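
The abstract names filter methods as one of the categories in the selection pipeline. As a rough illustration of that category only, the sketch below scores each feature by its absolute Pearson correlation with a binary label and keeps the top k, without involving any classifier. It is not the dissertation's implementation and does not use MIMIC-III: the function names (pearson, select_top_k), the choice of correlation as the scoring rule, and the synthetic toy data are all assumptions made for this example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(rows, labels, k):
    """Filter-style selection: score every column on its own, keep the k best.

    Returns the sorted indices of the kept columns and the per-column scores.
    """
    n_features = len(rows[0])
    scores = [
        abs(pearson([row[j] for row in rows], labels))
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Synthetic toy data (hypothetical): column 1 tracks the label,
# column 3 is constant, and the remaining columns are pure noise.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
rows = []
for y in labels:
    informative = y + rng.gauss(0, 0.3)
    noise = [rng.gauss(0, 1) for _ in range(4)]
    rows.append([noise[0], informative, noise[1], 1.0, noise[2], noise[3]])

selected, scores = select_top_k(rows, labels, k=1)
print("selected feature indices:", selected)
```

Because the scoring never trains a model, this kind of filter is cheap to run on wide tables; wrapper and embedded methods, by contrast, evaluate features through a learning algorithm and are not shown here.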

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 1.23 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission

Copyright owned by the Saudi Digital Library (SDL) © 2024