A Genetic Algorithm-based Local Outlier Factor for Efficient Big Data Stream Processing
Abstract
Interest in outlier detection methods is increasing because detecting outliers is an important operation
in many applications, such as detecting fraudulent credit card transactions, network intrusion detection,
and data analysis in various domains. We are now in the big data era, and data streams are an important
type of big data. As the need to analyze high-velocity data streams grows, older outlier detection methods
become difficult to apply efficiently. Local Outlier Factor (LOF) is a well-known outlier detection
algorithm. A major challenge of LOF is that it requires the entire dataset and the distance values to be
stored in memory. Another issue is that LOF must be recalculated from scratch whenever the dataset
changes. This research proposes a novel local outlier detection algorithm for data streams, called the
Genetic-based Incremental Local Outlier Factor (GILOF). We further improve the performance of GILOF on
data streams by proposing a new method for calculating LOF, called Local Outlier Factor by Reachability
distance (LOFR). The resulting algorithm for local outlier detection in data streams is called the
Genetic-based Incremental Local Outlier Factor by Reachability distance (GILOFR). GILOF and GILOFR require
no prior knowledge of the data distribution, and both can run in limited memory. Experiments on several
real-world datasets show that the proposed algorithms perform well in both execution time and outlier
detection accuracy.
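To make the two LOF limitations mentioned in the abstract concrete, the following is a minimal sketch of standard batch LOF as originally defined by Breunig et al. (2000). It is not GILOF, LOFR, or GILOFR, whose details are not given here. The function name `lof_scores` is hypothetical, and the sketch assumes Euclidean distance, exactly `k` neighbours per point (ties are broken by index, whereas the original definition includes all tied points), and distinct input points. It keeps a full pairwise distance matrix in memory and recomputes every score from the whole dataset on each call.

```python
import math
from typing import List, Sequence


def lof_scores(points: Sequence[Sequence[float]], k: int) -> List[float]:
    """Hypothetical batch LOF sketch: every call recomputes all scores from scratch."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(points)")

    # Full n x n Euclidean distance matrix: O(n^2) memory, the cost of keeping
    # the entire dataset and its distance values in memory.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # k nearest neighbours of each point (itself excluded); ties broken by index.
    neighbours = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))
        neighbours.append(order[:k])

    # k-distance: distance from a point to its k-th nearest neighbour.
    k_dist = [dist[i][neighbours[i][-1]] for i in range(n)]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    # Assumes distinct points, so the mean reachability distance is never zero.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[o], dist[i][o]) for o in neighbours[i]]
        lrd.append(1.0 / (sum(reach) / k))

    # LOF: mean neighbour density divided by the point's own density.
    # Values near 1 indicate an inlier; values well above 1 suggest an outlier.
    return [sum(lrd[o] for o in neighbours[i]) / k / lrd[i] for i in range(n)]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(lof_scores(data, k=2))
```

For these five points, the four corners of the unit square each receive a score of 1.0, while `(10, 10)` scores roughly 13. Adding one new point to `data` would require calling `lof_scores` again on the full list, because every neighbourhood, k-distance, and density may change. This is the recalculation cost that incremental, memory-bounded stream methods aim to avoid.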