Eric, Pardede; Sarath, Tomy; Alotaibi, Obaid Haylan B
2026-03-29
2026
https://hdl.handle.net/20.500.14154/78505
The Internet of Things (IoT) has delivered significant benefits to domains such as healthcare, business, and industry by generating vast amounts of data in real time. However, IoT-generated data often suffers from quality issues that can significantly affect data analysis results and lead to inaccurate decision making. Enhancing the quality of real-time data streams is difficult because the characteristics of IoT data make anomaly detection, which is crucial for informed decisions, particularly challenging. Traditional IoT data cleaning techniques primarily rely on batch processing methods, which introduce latency and fail to handle real-time streaming IoT data effectively. Many studies have proposed techniques to overcome these challenges, such as cleaning data in real time; however, no comprehensive data cleaning framework has been proposed. This thesis proposes a comprehensive streaming data cleaning framework aimed at improving the quality of real-time data streams. Central to this framework is a real-time anomaly detection model for structured IoT data streams. The model detects multiple types of anomalies and classifies them as either significant events or errors. Additionally, the proposed method incorporates context-awareness to further enhance detection reliability. Building upon this detection capability, the framework includes an automated repair system that addresses detected anomalies via multiple repair techniques (delete, replace, or keep), using statistical measurements and machine learning based on the anomaly classification. To enhance user decision-making, the framework integrates large language models for data stream cleaning, providing context-aware recommendations and sensitivity assessments.
Large language models operate locally, assisting users to dynamically refine contexts and sensitivity levels based on real-time interaction streams across diverse applications. Overall, this thesis highlights the proposed framework's effectiveness in guiding users by providing a clear picture, thereby enhancing decision-making accuracy in real-time environments and enabling confident, real-time responses to genuine anomalies.
158
en
real-time; data streams; anomaly detection; context-awareness; rule based; data cleaning; data repairing; data anomaly; machine learning; healthcare; Near real-time; Context awareness recommendation; Data sensitivity level; Large language model (LLM)
Real-Time IoT Data Cleaning and Anomaly Detection Using Context-Aware Frameworks and Large Language Models
Thesis