Exploring Ridgeless Regression in High-Dimensional Data: A Numerical Investigation into Predictive Accuracy
Date
2024
Authors
Publisher
University of Nottingham
Abstract
The rise of high-dimensional datasets, where the number of predictors p exceeds the number
of observations n, poses significant challenges for linear models fitted by Ordinary
Least Squares (OLS), because the OLS solution is no longer unique. This report investigates
ridgeless regression, the minimum-norm OLS solution, in such high-dimensional settings,
particularly when p ≫ n, and compares its predictive accuracy against that of ridge
regression.
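To make the term concrete, the following is a minimal NumPy sketch of the minimum-norm OLS (ridgeless) estimator, not code from the report. It assumes the ridge objective ||y - Xβ||² + λ||β||² and uses synthetic Gaussian data; the sizes n and p, the helper name `ridge_dual`, and the random seed are illustrative assumptions. It checks three standard properties: the estimator interpolates the training data when p > n, it has a smaller norm than another interpolating solution, and ridge regression approaches it as λ shrinks to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 200  # illustrative sizes with p >> n (assumption)
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Minimum-norm OLS ("ridgeless") solution via the Moore-Penrose pseudoinverse.
beta_mn = np.linalg.pinv(X) @ y


def ridge_dual(X, y, lam):
    """Ridge estimate in dual form: X^T (X X^T + lam * I_n)^{-1} y.

    Hypothetical helper; algebraically equal to (X^T X + lam * I_p)^{-1} X^T y
    for lam > 0, but only needs an n x n solve, which is cheaper when p >> n.
    """
    n_obs = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n_obs), y)


# Ridge with a very small penalty should be close to the ridgeless solution.
beta_ridge_small = ridge_dual(X, y, 1e-8)

# Build a different interpolating solution by adding a null-space component of X.
z = rng.standard_normal(p)
null_part = z - np.linalg.pinv(X) @ (X @ z)  # projection of z onto null(X)
beta_other = beta_mn + null_part

print("ridgeless interpolates:", np.allclose(X @ beta_mn, y))
print("other solution interpolates:", np.allclose(X @ beta_other, y))
print("ridgeless has smaller norm:",
      np.linalg.norm(beta_mn) < np.linalg.norm(beta_other))
print("max |ridge(1e-8) - ridgeless|:",
      np.max(np.abs(beta_ridge_small - beta_mn)))
```

The dual form is used here only because it stays cheap when p is much larger than n; any equivalent ridge solver would serve for this comparison.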
Using simulation studies on the spiked covariance model, this report shows that the
minimum-norm OLS can outperform ridge regression on certain high-dimensional datasets
where p ≫ n, contradicting the traditional assumption that explicit regularization is
necessary in high-dimensional settings. Moreover, this report shows that the optimal regularization
parameter λ in ridge regression can be negative in such cases, challenging the
conventional belief that λ must always be positive. This is attributed to the
inherent structure of the data, which may provide sufficient implicit regularization, making
additional penalization unnecessary or even counterproductive.
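A simulation of this kind can be sketched as follows. This is an illustrative setup, not the report's experimental design: the single-spike covariance Σ = I + s·vvᵀ, the spike strength, the noise level, the true coefficient vector aligned with the spike, and the λ grid are all assumptions, and `ridge_dual` and `simulate_risks` are hypothetical helper names. The sketch relies on the fact that the dual-form ridge estimate stays well defined for negative λ as long as λ is greater than minus the smallest eigenvalue of XXᵀ, so the grid can extend below zero.

```python
import numpy as np


def ridge_dual(X, y, lam):
    """Ridge estimate X^T (X X^T + lam * I_n)^{-1} y; lam = 0 gives ridgeless."""
    n_obs = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n_obs), y)


def simulate_risks(n=50, p=500, spike=50.0, noise_sd=0.5, seed=0):
    """Excess prediction risk of ridge over a grid of lam, including negatives.

    All default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Spiked covariance with one spike: Sigma = I_p + spike * v v^T.
    v = np.zeros(p)
    v[0] = 1.0
    Sigma = np.eye(p) + spike * np.outer(v, v)

    # Rows x = z + sqrt(spike) * g * v with z ~ N(0, I_p), g ~ N(0, 1),
    # which gives Cov(x) = Sigma.
    g = rng.standard_normal((n, 1))
    X = rng.standard_normal((n, p)) + np.sqrt(spike) * g * v

    beta_true = v.copy()  # signal aligned with the spike (assumption)
    y = X @ beta_true + noise_sd * rng.standard_normal(n)

    # X X^T + lam * I stays positive definite for lam > -lam_min.
    lam_min = np.linalg.eigvalsh(X @ X.T)[0]
    lams = lam_min * np.array([-0.5, -0.25, 0.0, 0.25, 0.5, 1.0, 2.0])

    risks = []
    for lam in lams:
        d = ridge_dual(X, y, lam) - beta_true
        risks.append(d @ Sigma @ d)  # E[(x^T d)^2] for a fresh x ~ N(0, Sigma)
    return lams, np.array(risks), lam_min


lams, risks, lam_min = simulate_risks()
for lam, risk in zip(lams, risks):
    print(f"lam = {lam:10.3f}   excess risk = {risk:.4f}")
```

Which value of λ minimises the risk depends on the spike strength, the noise level, and how the true coefficients align with the spike, so a single run like this is only a starting point; averaging over many seeds would be needed before drawing any conclusion.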
The implications of these findings extend to practical applications in fields such as genomics
and finance, where high-dimensional data are common. The conclusions drawn from this work
highlight the potential of ridgeless regression as a viable alternative to ridge regression for
high-dimensional data, especially when traditional methods encounter issues like overfitting. The
report contributes to the ongoing discussion in statistical machine learning by providing new
insights into when and why ridgeless regression may be preferred.
Description
Keywords
machine learning, high-dimensional data, ridge regression, ridgeless regression, Ordinary Least Squares, minimum-norm solution