Interactive Visualization Adopting Dimensionality Reduction Techniques for Pattern Recognition in Large Temporal Datasets
Abstract
In this thesis, we focus on time-series data, which experts in many domains commonly use
to explore and understand phenomena or behaviors under consideration, assisting them in making
decisions, predicting the future or solving problems. Utilizing sensor devices is one of the common ways of
collecting time-series data. These devices collect large volumes of raw data, including multi-dimensional
time-series data, and each value is associated with the time-stamp corresponding to when it was recorded.
However, finding interesting patterns or behaviors in a large amount of data is not simple due to the nature
of the data and other challenges related to its size and scalability, high dimensionality, complexity,
representation, and unique structure.
Researchers tend to use time-series charts, which are usually unsuitable for large datasets because limited
screen resolution cannot accommodate the volume of data. Hence, occlusion and overplotting
issues occur, limiting or complicating the exploration and analysis tasks. Another challenge concerns the
labeling of patterns in large time-series data, which is time-consuming and requires a great deal of expert
knowledge.
These issues are addressed in this thesis to improve the exploration, analysis and presentation of time-series data and enable users to gain insights into large and multi-dimensional time-series datasets using a
combination of dimensionality reduction techniques and interactive visual methods. The provided solutions
will help researchers from various domains who deal with large and multi-dimensional time-series data to
explore and analyze such data efficiently, with less effort and in less time.
Initially, we survey the integration of machine learning algorithms with interactive
visualization techniques for exploring and understanding time-series data, specifically looking at clustering
and classification for time-series data in visual analytics. This survey is intended as a guide for
both new researchers and experts in the emerging field of integrating machine learning algorithms into
visual analytics.
Next, we present a novel approach that aims to explore, analyze, and present large temporal datasets
through one image. The proposed approach uses a sliding window and dimensionality reduction techniques
to depict large time-series data as points in a 2D scatter plot. The approach provides novel solutions to
many pattern discovery issues and can deal with both univariate and multivariate time-series data.
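The sliding-window projection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the abstract does not name a specific dimensionality reduction technique at this point, so PCA (computed with NumPy's SVD) is used here as a stand-in, and the function names `sliding_windows` and `project_2d`, as well as the `window` and `stride` values, are hypothetical.

```python
import numpy as np


def sliding_windows(series, window, stride=1):
    """Cut a (T,) or (T, D) series into overlapping windows.

    Returns one flattened row per window plus each window's start index.
    Names and parameters are illustrative assumptions.
    """
    x = np.asarray(series, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a univariate series as a single channel
    if window > len(x):
        raise ValueError("window is longer than the series")
    starts = np.arange(0, len(x) - window + 1, stride)
    windows = np.stack([x[s:s + window].ravel() for s in starts])
    return windows, starts


def project_2d(windows):
    """Project window vectors onto their first two principal components.

    PCA is a stand-in here; any technique that maps each window to a
    2D point could be substituted.
    """
    centered = windows - windows.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Synthetic univariate example: a periodic signal of 2000 samples.
t = np.arange(2000)
signal = np.sin(2 * np.pi * t / 50)

windows, starts = sliding_windows(signal, window=50, stride=5)
points = project_2d(windows)  # one 2D point per window
```

Each row of `points` can be drawn as one dot in a scatter plot, and `starts` maps every dot back to its position in the original series. Because the projection in this sketch is linear, windows with similar shapes land close together, which is what makes repeated patterns appear as groups of nearby points.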
Following this, our proposed approach is combined with both visualization and interaction techniques into
one system called TimeCluster, which is a visual analytics tool allowing users to visualize, explore and
interact with large time-series data. The system addresses several tasks in one place: anomaly detection, the
discovery of frequent patterns, and the labeling of interesting patterns in large time-series data.
We apply our system to different time-series datasets and report real-world case studies of its
utility.
Later, the linkage between the 1D view (time-series chart) and the 2D view (the 2D embedding of the time-series data), together with parallel interactions such as selection and labeling, is used to examine
the effectiveness of recent developments in machine learning and dimensionality reduction in the context of
time-series data exploration. We design a user study to evaluate and validate the effectiveness of the linkage
between the 1D and 2D visualizations and their suitability for projecting time-series data.
Within our experimental setting, different dimensionality reduction techniques are examined, evaluated and
compared.
Lastly, we summarize our findings and outline possible areas for future work.