Wei, Guo-Wei
Suwayyid, Faisal
2025-12-02
2025
https://hdl.handle.net/20.500.14154/77283

This dissertation advances algebraic and topological methods for data science through four lines of work. The first part introduces the path Dirac and hypergraph Dirac operators together with their persistent counterparts, and investigates their ability to capture harmonic and non-harmonic spectra while revealing informative subcomplex structure. Their sensitivity to filtration is analyzed, showing how these operators adapt to topological changes, and their behavior is illustrated across diverse examples. A central application is to molecular science: strict preorders derived from molecular structure generate graphs and digraphs with rich path architecture, and the resulting path complexes encode information depth that varies with the underlying preorder classes. The second part develops Mayer Dirac operators on \(N\)-chain complexes. These operators link an alternating sequence of Mayer Laplacians and generalize the classical identity \(D^{2}=L\). An explicit Laplacian for \(N\)-chain complexes induced by vertex sequences on finite sets is derived, and weighted Mayer Laplacian and Dirac operators are introduced to capture physical attributes more effectively. A generalized factorization of Laplacians as the product of an operator with its adjoint is also established. Persistent Mayer Dirac operators and their extensions are applied to biological and chemical data, demonstrating their practical utility. The third part establishes a persistent Stanley–Reisner theory that connects commutative algebra with combinatorial algebraic topology, machine learning, and data science. The framework defines persistent \(h\)-vectors, persistent \(f\)-vectors, persistent graded Betti numbers, persistent facet ideals, and facet persistence modules.
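Two of the ingredients named above, the classical identity \(D^{2}=L\) and the \(f\)- and \(h\)-vectors tracked by the Stanley–Reisner framework, can be made concrete on a toy complex. The following is a minimal sketch under stated assumptions, not the dissertation's implementation: it uses an ordinary simplicial complex (the usual boundary with \(\partial^{2}=0\)) rather than a path, hypergraph, or Mayer \(N\)-chain complex, and it simply recomputes \(f\)- and \(h\)-vectors at each stage of a hand-made filtration, a simplified stand-in for the persistent invariants defined in the thesis. All function names are hypothetical. The \(h\)-vector uses the standard conversion \(h_k=\sum_{i=0}^{k}(-1)^{k-i}\binom{d-i}{k-i}f_{i-1}\).

```python
from math import comb

import numpy as np


def boundary(k_faces, km1_faces):
    """Signed boundary matrix from k-simplices to (k-1)-simplices.

    Simplices are sorted vertex tuples; deleting the i-th vertex has sign (-1)**i.
    """
    row = {f: r for r, f in enumerate(km1_faces)}
    B = np.zeros((len(km1_faces), len(k_faces)))
    for c, f in enumerate(k_faces):
        for i in range(len(f)):
            B[row[f[:i] + f[i + 1:]], c] = (-1) ** i
    return B


def dirac(V, E, T):
    """Dirac operator on C_0 + C_1 + C_2, returned with B1 and B2."""
    B1, B2 = boundary(E, V), boundary(T, E)
    n0, n1 = len(V), len(E)
    n = n0 + n1 + len(T)
    D = np.zeros((n, n))
    D[:n0, n0:n0 + n1] = B1          # C_1 -> C_0 block
    D[n0:n0 + n1, n0 + n1:] = B2     # C_2 -> C_1 block
    return D + D.T, B1, B2           # symmetrizing adds the adjoint blocks


def block_diag(*blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out


def d_squared_is_laplacian(V, E, T):
    """Check D @ D == diag(L0, L1, L2); this relies on B1 @ B2 == 0."""
    D, B1, B2 = dirac(V, E, T)
    L0 = B1 @ B1.T
    L1 = B1.T @ B1 + B2 @ B2.T
    L2 = B2.T @ B2
    return np.allclose(D @ D, block_diag(L0, L1, L2))


def harmonic_count(V, E, T, tol=1e-9):
    """dim ker D, which equals the sum of Betti numbers because ker D = ker D^2."""
    D, _, _ = dirac(V, E, T)
    return int(np.sum(np.abs(np.linalg.eigvalsh(D)) < tol))


def f_vector(K):
    """(f_{-1}, f_0, ..., f_{d-1}) of a complex given as a list of simplices."""
    d = max(len(s) for s in K)
    return [1] + [sum(1 for s in K if len(s) == i) for i in range(1, d + 1)]


def h_vector(f):
    """h_k = sum_{i<=k} (-1)^(k-i) C(d-i, k-i) f_{i-1}, stored with f[i] = f_{i-1}."""
    d = len(f) - 1
    return [sum((-1) ** (k - i) * comb(d - i, k - i) * f[i] for i in range(k + 1))
            for k in range(d + 1)]


V = [(0,), (1,), (2,)]
E = [(0, 1), (0, 2), (1, 2)]
T = [(0, 1, 2)]

# Toy three-step filtration: points, then the hollow triangle, then the filled one.
filtration = {0.0: V, 1.0: V + E, 2.0: V + E + T}
for t, K in filtration.items():
    print(t, f_vector(K), h_vector(f_vector(K)))

print(d_squared_is_laplacian(V, E, []), harmonic_count(V, E, []))  # hollow triangle
print(d_squared_is_laplacian(V, E, T), harmonic_count(V, E, T))    # filled triangle
```

Along the filtration the \(h\)-vector moves from \([1, 2]\) to \([1, 1, 1]\) to \([1, 0, 0, 0]\). In both triangles \(D^{2}\) agrees with \(\operatorname{diag}(L_0, L_1, L_2)\), and \(D\) has two zero eigenvalues for the hollow triangle (\(\beta_0=\beta_1=1\)) but only one once the 2-simplex fills the loop.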
Stability analysis confirms that the persistent Stanley–Reisner invariants are robust under geometric perturbations, and their predictive value is demonstrated on molecular datasets. The final part proposes Commutative Algebra k-mer Learning (CAKL), a nonlinear algebraic framework for comparative genomics that builds on persistent Stanley–Reisner theory. CAKL integrates commutative algebra, algebraic topology, combinatorics, and machine learning to address genetic variant identification, phylogenetic tree inference, and viral genome classification. Across eleven datasets, CAKL outperforms five state-of-the-art sequence analysis methods, particularly in viral classification, and maintains stable predictive accuracy as dataset size increases, indicating scalability and robustness. Collectively, these contributions provide new operators, invariants, and learning paradigms that unify algebraic, topological, and combinatorial perspectives on discrete structures and real-world data, and they deliver strong results in molecular science and genomics.

198
en-US
Persistent Theory
Topological Data Analysis
ASPECTS OF COMBINATORIAL SPECTRAL THEORY AND COMMUTATIVE ALGEBRA
Thesis