Distance Based Statistical Methods and Outlier Detection in Large-Scale Electrophysiology Data
Date
2025
Publisher
Saudi Digital Library
Abstract
Large-scale electrophysiology experiments produce high-dimensional local field potential (LFP) datasets whose size and heterogeneity challenge classical analysis methods. This thesis develops a unified and scalable computational framework for comparing multichannel rodent LFP recordings collected under formalin injection and electrical stimulation.
We begin by outlining the biological context and formalizing the core research questions in precise mathematical terms. Building on this foundation, we introduce three complementary methodological contributions. First, a window-based fusion framework enables scalable column-wise comparison of large matrices by replacing quadratic-distance computations with segmented, statistically fused evidence. Second, a landmark-based clustering approach provides efficient approximations to pairwise Euclidean distances, with explicit operation-count models and a practical scaling rule that generalizes across synthetic and real data. Third, a row-wise analysis framework based on Elastic-Net PCA and CCA yields low-variance geometric embeddings that support reliable statistical comparison between baseline and post-treatment recordings.
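
The abstract does not spell out how the landmark approximation is constructed, so the following is only a minimal sketch of one common landmark idea, not the thesis's algorithm. It assumes landmarks are drawn at random from the data and uses the triangle inequality to bound any pairwise Euclidean distance from an n-by-k table of point-to-landmark distances. The names `euclid`, `landmark_table`, and `bounds` are hypothetical.

```python
import math
import random


def euclid(a, b):
    """Exact Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def landmark_table(points, n_landmarks, seed=0):
    """Distances from every point to a few randomly chosen landmarks.

    Costs n * k distance evaluations instead of the n * (n - 1) / 2
    needed for the full pairwise matrix. Random selection is an
    assumption made for this sketch.
    """
    rng = random.Random(seed)
    landmarks = rng.sample(points, n_landmarks)
    return [[euclid(p, lm) for lm in landmarks] for p in points]


def bounds(table, i, j):
    """Lower and upper bounds on d(points[i], points[j]).

    For every landmark l the triangle inequality gives
    |d(x, l) - d(y, l)| <= d(x, y) <= d(x, l) + d(y, l),
    so the tightest bounds are the max and min over landmarks.
    """
    lower = max(abs(a - b) for a, b in zip(table[i], table[j]))
    upper = min(a + b for a, b in zip(table[i], table[j]))
    return lower, upper


# Synthetic stand-in data: 40 points in 5 dimensions.
rng = random.Random(1)
points = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
table = landmark_table(points, n_landmarks=6)

lo, hi = bounds(table, 3, 17)
exact = euclid(points[3], points[17])
print(f"lower={lo:.3f} exact={exact:.3f} upper={hi:.3f}")
```

With k landmarks the table needs n * k distance evaluations, which is the kind of quantity an operation-count model would track. How tight the bounds are depends on the number and placement of landmarks.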
To detect perturbation-induced anomalies, we develop the Combined Outlier Score (COS), an ensemble of nine unsupervised detectors that integrates geometric, probabilistic, and density-based signals into a unified anomaly measure.
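
The nine detectors and the fusion rule behind COS are not named in the abstract. As a hedged illustration of the general ensemble idea only, the sketch below combines three stand-in detectors (distance to the coordinate-wise median, largest per-feature z-score, and distance to the k-th nearest neighbour) by rank-scaling each to [0, 1] and averaging. The detector choice, the rank-averaging rule, and all function names are assumptions, not the thesis's definition of COS.

```python
import math
import random
import statistics


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def median_distance(points):
    """Geometric stand-in: distance to the coordinate-wise median."""
    center = [statistics.median(col) for col in zip(*points)]
    return [euclid(p, center) for p in points]


def max_abs_zscore(points):
    """Probabilistic stand-in: largest per-feature absolute z-score."""
    cols = list(zip(*points))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard zero spread
    return [max(abs(x - m) / s for x, m, s in zip(p, means, sds))
            for p in points]


def knn_distance(points, k=5):
    """Density stand-in: distance to the k-th nearest other point."""
    out = []
    for i, p in enumerate(points):
        dists = sorted(euclid(p, q) for j, q in enumerate(points) if j != i)
        out.append(dists[k - 1])
    return out


def rank_scale(scores):
    """Map raw scores to [0, 1] by rank so detectors are comparable."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    scaled = [0.0] * len(scores)
    for rank, idx in enumerate(order):
        scaled[idx] = rank / (len(scores) - 1)
    return scaled


def combined_score(points, detectors):
    """Average of rank-scaled detector outputs (assumed fusion rule)."""
    scaled = [rank_scale(det(points)) for det in detectors]
    return [statistics.fmean(col) for col in zip(*scaled)]


# Synthetic stand-in data: 60 Gaussian points plus one planted outlier.
rng = random.Random(0)
points = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
points.append([8.0, 8.0, 8.0, 8.0])

scores = combined_score(points, [median_distance, max_abs_zscore, knn_distance])
top = max(range(len(scores)), key=scores.__getitem__)
print("highest combined score at index", top)
```

Rank scaling is used here because the raw outputs of the detectors live on different scales. Other fusion rules, such as averaging standardized scores or taking the maximum, would fit the same skeleton.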
Applied to the rodent migraine recovery dataset, the full framework identifies rest intervals that statistically match baseline structure, quantifies deviations following stimulation, and reveals interpretable temporal recovery patterns. The results demonstrate that segmentation-based fusion, landmark approximation, row-wise embeddings, and ensemble outlier detection together form a robust and computationally efficient toolkit for analyzing high-dimensional neural data.
This thesis provides a coherent methodological foundation for scalable similarity assessment and anomaly detection in large electrophysiology datasets, with applicability to a broad range of big-data time-series domains.
Keywords
Large-scale electrophysiology, Local Field Potentials (LFP), High-dimensional data, Multichannel recordings, Rodent neural data, Formalin injection, Electrical stimulation, Scalable computational framework, Similarity assessment, Window-based fusion, Segmentation-based analysis, Statistical evidence fusion, Landmark-based clustering, Euclidean distance approximation, Computational scalability, Operation-count modeling, Elastic-Net PCA, Canonical Correlation Analysis (CCA), Row-wise analysis, Low-variance embeddings, Baseline vs post-treatment comparison, Anomaly detection, Combined Outlier Score (COS), Ensemble unsupervised detectors, Geometric methods, Probabilistic methods, Density-based methods, Migraine recovery dataset, Temporal recovery patterns, Big-data time series, Neural signal analysis
