Distance Based Statistical Methods and Outlier Detection in Large-Scale Electrophysiology Data

Date

2025

Publisher

Saudi Digital Library

Abstract

Large-scale electrophysiology experiments produce high-dimensional local field potential (LFP) datasets whose size and heterogeneity challenge classical analysis methods. This thesis develops a unified and scalable computational framework for comparing multichannel rodent LFP recordings collected under formalin injection and electrical stimulation. We begin by outlining the biological context and formalizing the core research questions in precise mathematical terms. Building on this foundation, we introduce three complementary methodological contributions. First, a window-based fusion framework enables scalable column-wise comparison of large matrices by replacing quadratic-distance computations with segmented, statistically fused evidence. Second, a landmark-based clustering approach provides efficient approximations to pairwise Euclidean distances, with explicit operation-count models and a practical scaling rule that generalizes across synthetic and real data. Third, a row-wise analysis framework based on Elastic-Net PCA and CCA yields low-variance geometric embeddings that support reliable statistical comparison between baseline and post-treatment recordings. To detect perturbation-induced anomalies, we develop the Combined Outlier Score (COS), an ensemble of nine unsupervised detectors that integrates geometric, probabilistic, and density-based signals into a unified anomaly measure. Applied to the rodent migraine recovery dataset, the full framework identifies rest intervals that statistically match baseline structure, quantifies deviations following stimulation, and reveals interpretable temporal recovery patterns. The results demonstrate that segmentation-based fusion, landmark approximation, row-wise embeddings, and ensemble outlier detection together form a robust and computationally efficient toolkit for analyzing high-dimensional neural data.
This thesis provides a coherent methodological foundation for scalable similarity assessment and anomaly detection in large electrophysiology datasets, with applicability to a broad range of big-data time-series domains.
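
As a rough illustration of why landmarks can stand in for a full pairwise distance matrix, the sketch below uses only the triangle inequality; it is not the thesis's algorithm, and the function names (`landmark_distances`, `pair_bounds`), the random landmark selection, and the parameter values are assumptions made for this example. Storing distances from n points to k landmarks costs n·k distance evaluations instead of n(n-1)/2, and any pairwise Euclidean distance can then be bracketed from those stored values.

```python
import numpy as np


def landmark_distances(X, n_landmarks=8, seed=0):
    """Return landmark indices and the (n, k) matrix of point-to-landmark distances.

    Landmarks are chosen uniformly at random here (an assumption for this sketch).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    k = min(n_landmarks, n)
    idx = rng.choice(n, size=k, replace=False)
    # Broadcast to (n, k, d), then take the Euclidean norm over the feature axis.
    D = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2)
    return idx, D


def pair_bounds(D, i, j):
    """Bracket the Euclidean distance between points i and j using landmarks only.

    For every landmark l, the triangle inequality gives
    |d(i, l) - d(j, l)| <= d(i, j) <= d(i, l) + d(j, l).
    """
    lower = np.max(np.abs(D[i] - D[j]))
    upper = np.min(D[i] + D[j])
    return lower, upper


if __name__ == "__main__":
    # Synthetic stand-in data: 200 points in 16 dimensions (not real LFP recordings).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 16))
    idx, D = landmark_distances(X, n_landmarks=8)
    lower, upper = pair_bounds(D, 0, 1)
    exact = np.linalg.norm(X[0] - X[1])
    print(f"lower={lower:.3f}  exact={exact:.3f}  upper={upper:.3f}")
```

How tight the bracket is depends on the number and placement of landmarks; the operation-count models and scaling rule mentioned in the abstract address that trade-off in the thesis itself.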

Keywords

Large-scale electrophysiology, Local Field Potentials (LFP), High-dimensional data, Multichannel recordings, Rodent neural data, Formalin injection, Electrical stimulation, Scalable computational framework, Similarity assessment, Window-based fusion, Segmentation-based analysis, Statistical evidence fusion, Landmark-based clustering, Euclidean distance approximation, Computational scalability, Operation-count modeling, Elastic-Net PCA, Canonical Correlation Analysis (CCA), Row-wise analysis, Low-variance embeddings, Baseline vs post-treatment comparison, Anomaly detection, Combined Outlier Score (COS), Ensemble unsupervised detectors, Geometric methods, Probabilistic methods, Density-based methods, Migraine recovery dataset, Temporal recovery patterns, Big-data time series, Neural signal analysis

Copyright owned by the Saudi Digital Library (SDL) © 2026