On Dimension Reduction and Variable Selection for High Dimensional Data
Date
2024-06-06
Publisher
The University of Alabama
Abstract
Due to recent advancements in technology and computing power, collecting and storing data with thousands of features per observation has become commonplace, resulting in what is known as high-dimensional data. One major consequence of high-dimensional data is the phenomenon known as the "curse of dimensionality," and extracting useful information from such data is a challenging task. Sufficient dimension reduction (SDR) techniques and variable selection methods have become crucial tools in parametric and nonparametric modeling in recent years; they aim to reduce the complexity of the data to facilitate decision-making tools such as visualization, statistical modeling, and inference. In this dissertation, we propose two methods, one for variable selection and one for sufficient dimension reduction.
First, we develop a shrinkage estimator for varying-coefficient models for panel data with separable and non-separable fixed effects. We modify the Kernel Least Absolute Shrinkage and Selection Operator (KLASSO) [53] so that the proposed method selects the relevant features together with their gradients while simultaneously identifying the correct non-separable individual fixed effects alongside the separable ones. Simulation studies show that the proposed estimation method achieves a high accuracy rate.
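To illustrate the general idea of kernel-based shrinkage selection for varying-coefficient models (not the dissertation's estimator itself), the following is a minimal two-step sketch: kernel-weighted least squares estimates each coefficient function on a grid, and a group soft-threshold on each function's empirical norm zeroes out covariates that appear irrelevant. The function name, kernel, and thresholding rule are illustrative assumptions.

```python
import numpy as np

def varying_coef_select(x, u, y, grid, h=0.2, tau=0.1):
    """Illustrative sketch in the spirit of KLASSO-type selection:
    (1) local-constant kernel estimates of each coefficient function
        beta_j(u) at the grid points;
    (2) a group soft-threshold on each function's empirical L2 norm,
        which sets apparently irrelevant coefficient functions to zero.
    All names and tuning choices here are illustrative, not the
    dissertation's method."""
    n, p = x.shape
    B = np.zeros((len(grid), p))
    for k, u0 in enumerate(grid):
        w = np.exp(-0.5 * ((u - u0) / h) ** 2)      # Gaussian kernel weights
        Xw = x * w[:, None]
        # kernel-weighted least squares at u0 (ridge jitter for stability)
        B[k] = np.linalg.solve(Xw.T @ x + 1e-8 * np.eye(p), Xw.T @ y)
    norms = np.sqrt((B ** 2).mean(0))               # empirical norm per function
    shrink = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return B * shrink, norms                        # group-shrunk estimates
```

On simulated data with one truly varying coefficient and two irrelevant covariates, the irrelevant coefficient functions are shrunk exactly to zero while the relevant one survives thresholding.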
Second, motivated by [54] and the novel work of [27], which paved the way for new SDR methods, we embed the elastic net penalty in the Principal Support Vector Machine (PSVM) for dimension reduction. The proposed method can select and shrink the coefficients along the principal axes and then find a projection onto a lower-dimensional subspace while preserving the useful information about the central subspace S_{Y|X} of the regression model. Finite-sample studies show significant improvements over the PSVM.
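The PSVM idea can be sketched compactly: dichotomize the response at several slice points, fit a penalized linear hinge-loss classifier for each slice, and recover the estimated directions from the span of the classifier normals. The sketch below uses a plain subgradient solver with an elastic net penalty; the solver, function names, and tuning constants are illustrative assumptions, not the dissertation's algorithm.

```python
import numpy as np

def _svm_elasticnet(X, z, lam1=0.01, lam2=0.01, lr=0.1, epochs=200):
    """Linear hinge-loss classifier with an elastic net penalty,
    fit by plain subgradient descent (illustrative, not optimized)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(epochs):
        margin = z * (X @ w + b)
        active = margin < 1                        # points inside the margin
        gw = -(X[active] * z[active, None]).sum(0) / n
        gb = -z[active].sum() / n
        gw += lam2 * w + lam1 * np.sign(w)         # elastic net subgradient
        w -= lr * gw
        b -= lr * gb
    return w

def psvm_elasticnet(X, y, n_slices=3, d=1, lam1=0.01, lam2=0.01):
    """Sketch of an elastic-net-penalized PSVM: one classifier per
    response slice; the leading right-singular vectors of the stacked
    normals estimate a basis of the reduction subspace."""
    Xs = (X - X.mean(0)) / X.std(0)                # standardize predictors
    cuts = np.quantile(y, np.linspace(0, 1, n_slices + 1)[1:-1])
    normals = [_svm_elasticnet(Xs, np.where(y <= c, -1.0, 1.0), lam1, lam2)
               for c in cuts]
    M = np.vstack(normals)                         # stack slice-wise normals
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[:d].T                                # p x d basis estimate
```

For a single-index model y = f(x'b) + error, the recovered direction should align closely with b; the elastic net penalty additionally shrinks coordinates of irrelevant predictors toward zero.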
Keywords
Sufficient Dimension Reduction (SDR), Variable Selection (VS), Support Vector Machine (SVM)