Distance Based Statistical Methods and Outlier Detection in Large-Scale Electrophysiology Data

Maia, Pedro
Asiri, Zahra
2025-12-31
2025
https://hdl.handle.net/20.500.14154/77762
Large-scale electrophysiology experiments produce high-dimensional local field potential (LFP) datasets whose size and heterogeneity challenge classical analysis methods. This thesis develops a unified and scalable computational framework for comparing multichannel rodent LFP recordings collected under formalin injection and electrical stimulation. We begin by outlining the biological context and formalizing the core research questions in precise mathematical terms. Building on this foundation, we introduce three complementary methodological contributions. First, a window-based fusion framework enables scalable column-wise comparison of large matrices by replacing quadratic-distance computations with segmented, statistically fused evidence. Second, a landmark-based clustering approach provides efficient approximations to pairwise Euclidean distances, with explicit operation-count models and a practical scaling rule that generalizes across synthetic and real data. Third, a row-wise analysis framework based on Elastic-Net PCA and CCA yields low-variance geometric embeddings that support reliable statistical comparison between baseline and post-treatment recordings.
To detect perturbation-induced anomalies, we develop the Combined Outlier Score (COS), an ensemble of nine unsupervised detectors that integrates geometric, probabilistic, and density-based signals into a unified anomaly measure.
Applied to the rodent migraine recovery dataset, the full framework identifies rest intervals that statistically match baseline structure, quantifies deviations following stimulation, and reveals interpretable temporal recovery patterns. The results demonstrate that segmentation-based fusion, landmark approximation, row-wise embeddings, and ensemble outlier detection together form a robust and computationally efficient toolkit for analyzing high-dimensional neural data.
This thesis provides a coherent methodological foundation for scalable similarity assessment and anomaly detection in large electrophysiology datasets, with applicability to a broad range of big-data time-series domains.
164
en
Large-scale electrophysiology; Local Field Potentials (LFP); High-dimensional data; Multichannel recordings; Rodent neural data; Formalin injection; Electrical stimulation; Scalable computational framework; Similarity assessment; Window-based fusion; Segmentation-based analysis; Statistical evidence fusion; Landmark-based clustering; Euclidean distance approximation; Computational scalability; Operation-count modeling; Elastic-Net PCA; Canonical Correlation Analysis (CCA); Row-wise analysis; Low-variance embeddings; Baseline vs post-treatment comparison; Anomaly detection; Combined Outlier Score (COS); Ensemble unsupervised detectors; Geometric methods; Probabilistic methods; Density-based methods; Migraine recovery dataset; Temporal recovery patterns; Big-data time series; Neural signal analysis
Thesis