Saudi Cultural Missions Theses & Dissertations

Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10

Search Results

  • Item Restricted
    Feature Selection for High Dimensional Healthcare Data
    (University of Surrey, 2024-01) Alayed, Abdulrahman; Kouchaki, Samaneh
    In today’s digital landscape, researchers frequently face the complexity of handling high-dimensional datasets. Data mining and machine learning methods can struggle with very large datasets, leading to inefficiencies. Extensive raw data with numerous features can degrade machine learning algorithms by reducing accuracy, increasing overfitting, and adding complexity, primarily because redundant and irrelevant features hamper the learning process. Feature selection techniques address these challenges: by retaining only relevant features, they speed up training, reduce model complexity, improve accuracy, and mitigate overfitting. The primary objective of this project is to create an automatic variable selection pipeline that chooses the best features from among various innovative feature selection techniques. The pipeline incorporates four categories of variable selection methods: filter, wrapper, embedded, and hybrid methods. The techniques are applied to the MIMIC-III (Medical Information Mart for Intensive Care) dataset, which is available at no cost. This database is well suited to the project's goals, as it is a centralized database containing details about patients admitted to the critical care unit of a large regional hospital. The dataset is particularly useful for predicting the likelihood of death during the hospital stay following ICU admission. To this end, the project employs six classification techniques: Logistic Regression (LR), K-Nearest Neighbours (KNN), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN). The project systematically evaluates and compares the models' performance using various assessment metrics.
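To make the idea of a filter method concrete, the following is a minimal sketch and not the thesis's pipeline. It assumes a simple correlation-based ranking, uses only the Python standard library, and runs on synthetic stand-in data rather than MIMIC-III. The names `pearson`, `filter_select`, and `make_synthetic` are hypothetical and introduced here for illustration only.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return cov / (var_x * var_y) ** 0.5


def filter_select(rows, labels, k):
    """Filter method: keep the k features with the largest |correlation| to the label.

    Each feature is scored independently of any classifier, which is what
    distinguishes filter methods from wrapper and embedded methods.
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])


def make_synthetic(n_rows=500, n_noise=8, seed=0):
    """Synthetic stand-in data (assumption): features 0 and 1 depend on the
    binary label, the remaining columns are pure Gaussian noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_rows):
        label = rng.randint(0, 1)
        informative = [label + rng.gauss(0, 0.5), -label + rng.gauss(0, 0.5)]
        noise = [rng.gauss(0, 1) for _ in range(n_noise)]
        rows.append(informative + noise)
        labels.append(label)
    return rows, labels


rows, labels = make_synthetic()
selected = filter_select(rows, labels, k=2)
print(selected)
```

On this synthetic data the two label-dependent columns (indices 0 and 1) should be the ones selected. A wrapper method would instead score candidate feature subsets by training a classifier on each, and an embedded method would read feature importance from the fitted model itself.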

Copyright owned by the Saudi Digital Library (SDL) © 2024