A Genetic Algorithm-based Local Outlier Factor for Efficient Big Data Stream Processing

Abstract

Interest in outlier detection methods is increasing because detecting outliers is an important operation in many applications, such as detecting fraudulent credit card transactions, network intrusion detection, and data analysis in different domains. We are now in the big data era, and the data stream is an important type of big data. As the need to analyze high-velocity data streams grows, it becomes difficult to apply older outlier detection methods efficiently. Local Outlier Factor (LOF) is a well-known outlier detection algorithm. A major challenge of LOF is that it requires the entire dataset and the distance values to be stored in memory. Another issue is that LOF must be recalculated from the beginning whenever the dataset changes. This research proposes a novel local outlier detection algorithm for data streams, called Genetic-based Incremental Local Outlier Factor (GILOF). We further improve the performance of GILOF on data streams by proposing a new calculation method for LOF, called Local Outlier Factor by Reachability distance (LOFR). The improved algorithm for local outlier detection in data streams is called Genetic-based Incremental Local Outlier Factor by Reachability distance (GILOFR). The GILOF and GILOFR algorithms work without any prior knowledge of the data distribution, and they can execute in limited memory. Our experiments with various real-world datasets demonstrate that the proposed algorithms perform well in both execution time and outlier detection accuracy.
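
To make the memory cost of LOF concrete, the following is a minimal sketch of the standard batch LOF computation (k-distance, reachability distance, local reachability density), not of GILOF, LOFR, or GILOFR, whose details the abstract does not give. The function name `lof_scores` and the toy data are illustrative assumptions. The sketch assumes distinct points and takes exactly k neighbors per point, ignoring ties at the k-distance.

```python
import math


def lof_scores(points, k):
    """Batch LOF for a small list of points (illustrative sketch only)."""
    n = len(points)
    # Full n x n distance matrix: this is the in-memory cost that makes
    # batch LOF hard to apply to unbounded data streams.
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors = []   # indices of the k nearest neighbors of each point
    k_distance = []  # distance from each point to its k-th nearest neighbor
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, j) = max(k-distance(j), d(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(1.0 / (sum(reach) / k))

    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


# Toy data (an assumption): four clustered points and one distant point.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = lof_scores(data, k=2)
```

In this toy run the four clustered points score 1, while the distant point scores well above 1. Adding or removing a single point would require recomputing the whole distance matrix and every score, which is the recalculation problem the abstract describes.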

Copyright owned by the Saudi Digital Library (SDL) © 2025