On Dimension Reduction and Variable Selection for High Dimensional Data
Date
2024-06-06
Publisher
The University of Alabama
Abstract
Due to recent advancements in technology and computing power, collecting and storing data with thousands of features per observation has become commonplace, resulting in what is known as high-dimensional data. One major consequence of high-dimensional data is the phenomenon known as the "curse of dimensionality," and extracting useful information from such data is a challenging task. Sufficient dimension reduction (SDR) techniques and variable selection methods have become crucial tools in parametric and nonparametric modeling in recent years; they aim to reduce the complexity of the data to facilitate decision-making tools such as visualization, statistical modeling, and inference. In this dissertation, we propose two methods, one for variable selection and one for sufficient dimension reduction.
First, we develop a shrinkage estimator for varying-coefficient models for panel data with separable and non-separable fixed effects. We modify the Kernel Least Absolute Shrinkage and Selection Operator (KLASSO) [53] so that the proposed method selects the relevant features together with their gradients while simultaneously identifying the correct non-separable individual fixed effects alongside the separable ones. Simulation studies show that the proposed estimation method achieves a high accuracy rate.
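To illustrate the general idea of kernel-based shrinkage selection for varying-coefficient models (not the dissertation's estimator itself), the following is a minimal two-step sketch: kernel-weighted least squares estimates each coefficient function on a grid, and a group soft-threshold on each function's empirical norm zeroes out covariates that appear irrelevant. The function name, kernel, and thresholding rule are illustrative assumptions.

```python
import numpy as np

def varying_coef_select(x, u, y, grid, h=0.2, tau=0.1):
    """Illustrative sketch in the spirit of KLASSO-type selection:
    (1) local-constant kernel estimates of each coefficient function
        beta_j(u) at the grid points;
    (2) a group soft-threshold on each function's empirical L2 norm,
        which sets apparently irrelevant coefficient functions to zero.
    All names and tuning choices here are illustrative, not the
    dissertation's method."""
    n, p = x.shape
    B = np.zeros((len(grid), p))
    for k, u0 in enumerate(grid):
        w = np.exp(-0.5 * ((u - u0) / h) ** 2)      # Gaussian kernel weights
        Xw = x * w[:, None]
        # kernel-weighted least squares at u0 (ridge jitter for stability)
        B[k] = np.linalg.solve(Xw.T @ x + 1e-8 * np.eye(p), Xw.T @ y)
    norms = np.sqrt((B ** 2).mean(0))               # empirical norm per function
    shrink = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return B * shrink, norms                        # group-shrunk estimates
```

On simulated data with one truly varying coefficient and two irrelevant covariates, the irrelevant coefficient functions are shrunk exactly to zero while the relevant one survives thresholding.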
Second, motivated by [54] and the novel work of [27], which paved the way for new SDR methods, we embed the elastic net penalty in the Principal Support Vector Machine (PSVM) for dimension reduction. The proposed method can select and shrink the coefficients along the principal axes and then find a projection onto a lower-dimensional subspace while preserving the useful information about the central subspace S_{Y|X} of the regression model. Finite-sample studies show significant improvements over the PSVM.
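The PSVM idea can be sketched compactly: dichotomize the response at several slice points, fit a penalized linear hinge-loss classifier for each slice, and recover the estimated directions from the span of the classifier normals. The sketch below uses a plain subgradient solver with an elastic net penalty; the solver, function names, and tuning constants are illustrative assumptions, not the dissertation's algorithm.

```python
import numpy as np

def _svm_elasticnet(X, z, lam1=0.01, lam2=0.01, lr=0.1, epochs=200):
    """Linear hinge-loss classifier with an elastic net penalty,
    fit by plain subgradient descent (illustrative, not optimized)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(epochs):
        margin = z * (X @ w + b)
        active = margin < 1                        # points inside the margin
        gw = -(X[active] * z[active, None]).sum(0) / n
        gb = -z[active].sum() / n
        gw += lam2 * w + lam1 * np.sign(w)         # elastic net subgradient
        w -= lr * gw
        b -= lr * gb
    return w

def psvm_elasticnet(X, y, n_slices=3, d=1, lam1=0.01, lam2=0.01):
    """Sketch of an elastic-net-penalized PSVM: one classifier per
    response slice; the leading right-singular vectors of the stacked
    normals estimate a basis of the reduction subspace."""
    Xs = (X - X.mean(0)) / X.std(0)                # standardize predictors
    cuts = np.quantile(y, np.linspace(0, 1, n_slices + 1)[1:-1])
    normals = [_svm_elasticnet(Xs, np.where(y <= c, -1.0, 1.0), lam1, lam2)
               for c in cuts]
    M = np.vstack(normals)                         # stack slice-wise normals
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[:d].T                                # p x d basis estimate
```

For a single-index model y = f(x'b) + error, the recovered direction should align closely with b; the elastic net penalty additionally shrinks coordinates of irrelevant predictors toward zero.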
Keywords
Sufficient Dimension Reduction (SDR), Variable Selection (VS), Support Vector Machine (SVM)