Automatic Detection and Verification System for Arabic Rumor News on Twitter

Chin-Teng, Lin
Karali, Sami
2024-12-03
2026-04
Karali, S. (2024). Automatic detection and verification system for Arabic rumor news on Twitter (PhD thesis). University of Technology Sydney.
https://hdl.handle.net/20.500.14154/73978

Language models have been extensively studied and applied in various fields in recent years. However, the majority of language models are designed for English and perform significantly better in English than in other languages, such as Arabic. The differences between English and Arabic in grammar, writing, and word-formation structures pose significant challenges in applying English-based language models to Arabic content. Therefore, there is a critical need to develop and refine models and methodologies that can effectively process Arabic content. This research aims to address the gaps in Arabic language models by developing innovative machine learning (ML) and natural language processing (NLP) methodologies. We apply the developed model to Arabic rumor detection on Twitter to test its effectiveness. To achieve this, the research is divided into three fundamental phases: 1) efficiently collecting and pre-processing a comprehensive dataset of Arabic news tweets; 2) refining ML models through an enhanced Convolutional Neural Network (ECNN) equipped with N-gram feature maps for accurate rumor identification; 3) improving decision-making precision in rumor verification through ensemble learning techniques. In the first phase, the research develops a methodology for collecting and pre-processing Arabic news tweets, aiming to establish a dataset optimized for rumor detection analysis. Using a blend of automated and manual processes, the research handles the intricacies of the Arabic language, enhancing the dataset’s quality for ML applications.
This foundational phase removes irrelevant data and normalizes text, laying the groundwork for accuracy in subsequent detection tasks. The second phase develops an Enhanced Convolutional Neural Network (ECNN) model, which incorporates N-gram feature maps for a deeper linguistic analysis of tweets. This ECNN model, designed specifically for the Arabic language, departs from traditional rumor detection models by combining spatial feature extraction with the contextual insights provided by N-gram analysis. Empirical results show the ECNN model’s superior performance, with a marked improvement in the accuracy and efficiency of detecting and classifying rumors. The final phase explores the efficacy of ensemble learning methods in enhancing the robustness and accuracy of rumor detection systems. By combining the ECNN model with Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), and Gated Recurrent Unit (GRU) networks within a stacked ensemble framework, the research develops a composite approach that significantly exceeds the capabilities of single models. The result is a state-of-the-art rumor verification system with improved accuracy in identifying rumors, as demonstrated by empirical testing and analysis. This research contributes to bridging the gap between English-centric language models and Arabic language processing, demonstrating the importance of tailored approaches for different languages in the field of ML and NLP.
These contributions represent a major step forward in Arabic NLP and ML and offer practical solutions for the real-world challenge of rumor proliferation on social media platforms, ultimately fostering a more reliable digital environment for Arabic-speaking communities.

104
en
Arabic News; NLP; Deep Learning; Machine Learning; Twitter
Thesis
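
The abstract states that the first phase removes irrelevant data and normalizes text, but it does not list the actual rules. As a rough illustration only, the sketch below applies cleaning steps that are commonly used for Arabic tweets (dropping URLs and mentions, stripping diacritics and tatweel, unifying alef variants). These steps and the function name `normalize_arabic_tweet` are assumptions made for this example, not the thesis's documented pipeline.

```python
import re

# All rules below are assumed for illustration; the abstract does not specify them.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Arabic diacritics (tashkeel), U+064B..U+0652, plus superscript alef U+0670.
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0670]")
# Tatweel (kashida), the elongation character used for visual stretching.
TATWEEL = "\u0640"
# Alef with madda / hamza above / hamza below, mapped to bare alef U+0627.
ALEF_VARIANTS_RE = re.compile(r"[\u0622\u0623\u0625]")


def normalize_arabic_tweet(text: str) -> str:
    """Hypothetical helper: return a cleaned, normalized copy of a tweet."""
    text = URL_RE.sub(" ", text)          # drop links
    text = MENTION_RE.sub(" ", text)      # drop @user mentions
    # Keep hashtag words but remove the '#' and split on underscores.
    text = text.replace("#", " ").replace("_", " ")
    text = DIACRITICS_RE.sub("", text)    # strip short-vowel marks
    text = text.replace(TATWEEL, "")      # remove elongation
    text = ALEF_VARIANTS_RE.sub("\u0627", text)  # unify alef forms
    return " ".join(text.split())         # collapse whitespace
```

Whether to unify alef forms or strip diacritics is a design choice that trades some lexical distinctions for a smaller vocabulary; a real pipeline would choose these rules based on the downstream model.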