Mahmood, Ausif
Alharthi, Musleh
2025-11-16
2025
https://hdl.handle.net/20.500.14154/77000
Time series forecasting (TSF) has long been a challenging area in the field of Artificial Intelligence, with various approaches such as linear neural networks, recurrent neural networks, convolutional neural networks, and transformers being explored. Despite their remarkable success in Natural Language Processing, transformers have had mixed outcomes in the time series domain, particularly in long-term time series forecasting (LTSF). Recent works have demonstrated that simple linear models, such as LTSF-Linear, often outperform transformer-based architectures, prompting a re-examination of the transformer's effectiveness in this area. In this paper, we investigate this paradox by comparing linear neural network and transformer-based designs, offering insights into why certain models may excel in specific problem settings. Additionally, we enhance a simple linear neural network architecture using dual pipelines with batch normalization and reversible instance normalization, surpassing all existing models on the most popular benchmarks. Furthermore, we introduce an adaptation of the extended LSTM (xLSTM) architecture, named xLSTMTime, which incorporates exponential gating and a revised memory structure to handle multivariate LTSF more effectively. Our empirical evaluations demonstrate that xLSTMTime achieves superior performance compared to various state-of-the-art models, suggesting that refined recurrent architectures may present a competitive alternative to transformer-based designs for LTSF tasks. More recently, TimeLLM demonstrated even better results by reprogramming, i.e., repurposing, a Large Language Model (LLM) for the TSF domain. A follow-up paper, however, challenged this by demonstrating that removing the LLM component, or replacing it with a basic attention layer, in fact yields better performance. One of the challenges in forecasting is that TSF data favors the more recent past and is sometimes subject to unpredictable events. Based on these recent insights in TSF, we propose a Mixture of Experts (MoE) framework. Our method combines state-of-the-art (SOTA) models, including xLSTM, an enhanced linear model, PatchTST, and minGRU, among others. This set of complementary and diverse TSF models is integrated into a transformer-based MoE model. Our results on standard TSF benchmarks surpass all current TSF models, including those based on recent MoE frameworks.
92
en-US
Deep Learning
USING AI DEEP LEARNING MODELS FOR IMPROVED LONG-SHORT TERM TIME SERIES FORECASTING
Thesis