Real-Time IoT Data Cleaning and Anomaly Detection Using Context-Aware Frameworks and Large Language Models
Date
2026
Publisher
Saudi Digital Library
Abstract
The Internet of Things (IoT) has delivered significant benefits to various domains
such as healthcare, business, and industry by generating vast amounts of data in
real time. However, IoT-generated data often suffers from quality issues that can
significantly distort data analysis results and lead to inaccurate decision
making.
Improving the quality of real-time data streams is difficult because the
characteristics of IoT data complicate anomaly detection, which is crucial for
informed decisions. Traditional IoT data cleaning techniques rely primarily on
batch processing methods, which introduce latency and fail to handle real-time
streaming IoT data effectively. Many studies have proposed techniques to overcome
these challenges, such as cleaning data in real time; however, no comprehensive
data cleaning framework has been proposed.
This thesis proposes a comprehensive streaming data cleaning framework aimed
at improving the quality of real-time data streams. Central to this framework is a
real-time anomaly detection model for structured IoT data streams. The model
detects multiple types of anomalies and classifies them as either significant events
or errors. Additionally, the proposed method incorporates context-awareness to
further enhance detection reliability. Building upon this detection capability, the
framework includes an automated repair system that addresses detected anomalies
via multiple repair techniques (delete, replace, or keep), using statistical
measurements and machine learning based on the anomaly classification.
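The detect, classify, and repair loop described above can be illustrated with a minimal sketch. It is not the thesis's model: the z-score rule, the `StreamCleaner` class, the `CONTEXT_Z_LIMITS` thresholds, the `VALID_RANGE` bounds, and the heart-rate scenario are all assumptions chosen for illustration. Readings outside the valid range are treated as errors and replaced by the window mean (or deleted when no history exists), while in-range readings that are unusual for the current context are labeled as events and kept.

```python
from collections import deque
from statistics import mean, stdev

# Hypothetical values for illustration only; none come from the thesis.
CONTEXT_Z_LIMITS = {"resting": 2.5, "exercising": 4.0}  # z-score limit per context
VALID_RANGE = (20.0, 250.0)  # assumed plausible heart-rate range, in bpm


class StreamCleaner:
    """Toy cleaner that handles one reading at a time over a sliding window."""

    def __init__(self, window=20, min_history=5):
        self.history = deque(maxlen=window)  # most recent emitted values
        self.min_history = min_history       # readings needed before z-scoring

    def process(self, value, context):
        """Return (label, action, output) for a single reading."""
        low, high = VALID_RANGE
        if not (low <= value <= high):
            # Outside the valid range: classify as an error and repair it.
            if self.history:
                repaired = mean(self.history)  # statistical replacement
                self.history.append(repaired)
                return ("error", "replace", repaired)
            return ("error", "delete", None)   # nothing to replace it with

        label = "normal"
        # stdev needs at least two points, hence the max(..., 2) guard.
        if len(self.history) >= max(self.min_history, 2):
            mu, sigma = mean(self.history), stdev(self.history)
            if sigma > 0:
                z = abs(value - mu) / sigma
            else:
                z = 0.0 if value == mu else float("inf")
            if z > CONTEXT_Z_LIMITS[context]:
                label = "event"  # plausible value, but unusual for this context
        self.history.append(value)
        return (label, "keep", value)


if __name__ == "__main__":
    cleaner = StreamCleaner()
    for reading in [60.0, 70.0, 80.0, 65.0, 75.0, 300.0, 95.0]:
        print(reading, cleaner.process(reading, "resting"))
```

In this sketch the same reading can be labeled differently depending on context: a value that counts as an event while "resting" may be normal while "exercising", which is one simple way to read the role of context-awareness in detection.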
To enhance user decision-making, the framework integrates large language models
for data stream cleaning, providing context-aware recommendations and sensitivity
assessments. The large language models run locally, helping users to
dynamically refine contexts and sensitivity levels based on real-time interaction
streams across diverse applications.
Overall, this thesis highlights the proposed framework’s effectiveness in guiding
users by providing a clear picture of the data, thereby improving decision-making
accuracy in real-time environments and enabling confident, timely responses to
genuine anomalies.
Keywords
real-time, data streams, anomaly detection, context-awareness, rule-based, data cleaning, data repairing, data anomaly, machine learning, healthcare, near real-time, context-awareness recommendation, data sensitivity level, large language model (LLM)
