Preston, Simon
Alderaan, Saad
2024-11-17
2024
https://hdl.handle.net/20.500.14154/73620

The rise of high-dimensional datasets, where the number of predictors p exceeds the number of observations n, poses significant challenges for linear models fitted by Ordinary Least Squares (OLS). This report investigates ridgeless regression, the minimum-norm OLS solution, in such high-dimensional settings, particularly when p ≫ n. Minimum-norm OLS is compared with ridge regression in terms of predictive accuracy. Using simulation studies on the spiked covariance model, this report shows that minimum-norm OLS can outperform ridge regression on certain high-dimensional datasets where p ≫ n, contradicting the traditional assumption that regularization is necessary in high-dimensional settings. Moreover, this report shows that the optimal regularization parameter λ in ridge regression can be negative in such cases, challenging the conventional belief that λ is always positive. This can happen because the structure of the data itself may provide sufficient implicit regularization, making additional penalization unnecessary or even counterproductive. The implications of these findings extend to practical applications in fields such as genomics and finance, where high-dimensional data is common. The conclusions highlight the potential of ridgeless regression as a viable alternative to ridge regression for high-dimensional data, especially when traditional methods encounter issues such as overfitting.
The report contributes to the ongoing discussion in statistical machine learning by providing new insights into when and why ridgeless regression may be preferred.

60
en
machine learning; high-dimensional data; ridge regression; ridgeless regression; Ordinary Least Squares; minimum-norm solution
Exploring Ridgeless Regression in High-Dimensional Data: A Numerical Investigation into Predictive Accuracy
Thesis
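
The comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's own code: the function names (`min_norm_ols`, `ridge`, `simulate`), the single-spike covariance, the alignment of the true coefficients with the spike, and every numeric setting (n, p, spike strength, noise level, the grid of λ values) are assumptions chosen only for illustration. The sketch computes the minimum-norm OLS solution with the pseudoinverse and the ridge estimate in its dual form, which stays well defined for small negative λ as long as X Xᵀ + λI remains invertible.

```python
import numpy as np

def min_norm_ols(X, y):
    """Minimum-norm least-squares solution via the Moore-Penrose pseudoinverse."""
    return np.linalg.pinv(X) @ y

def ridge(X, y, lam):
    """Ridge estimate in dual form: X^T (X X^T + lam I)^{-1} y.

    For p > n this equals (X^T X + lam I)^{-1} X^T y when lam > 0, and it
    remains computable for negative lam while X X^T + lam I is invertible.
    """
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

def simulate(n=50, p=500, spike=25.0, noise=0.5, n_test=1000, seed=0):
    """Draw train/test data from a one-spike covariance model (illustrative)."""
    rng = np.random.default_rng(seed)
    v = np.ones(p) / np.sqrt(p)          # spike direction (assumed)
    beta = v.copy()                      # true signal aligned with the spike (assumed)

    def draw(m):
        z = rng.standard_normal((m, p))
        s = rng.standard_normal((m, 1))
        return z + np.sqrt(spike) * s * v  # rows have covariance I + spike * v v^T

    X_train, X_test = draw(n), draw(n_test)
    y_train = X_train @ beta + noise * rng.standard_normal(n)
    y_test = X_test @ beta + noise * rng.standard_normal(n_test)
    return X_train, y_train, X_test, y_test

X_train, y_train, X_test, y_test = simulate()

# Test error of the ridgeless (minimum-norm) fit
b0 = min_norm_ols(X_train, y_train)
print("min-norm OLS test MSE:", np.mean((X_test @ b0 - y_test) ** 2))

# Test error of ridge over a small grid that includes negative lambda values
for lam in (-50.0, -10.0, 1.0, 10.0, 100.0):
    b = ridge(X_train, y_train, lam)
    print(f"ridge lambda={lam:7.1f} test MSE:", np.mean((X_test @ b - y_test) ** 2))
```

Which estimator gives the lowest test error depends on the assumed settings; the sketch only shows how the two estimators can be put side by side, and makes no claim about reproducing the thesis's results.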