DYNAMIC REFINEMENT OF SOCKPUPPET DETECTION MODELS WITH HUMAN-IN-THE-LOOP PROCESSES GUIDED BY MACHINE LEARNING ENGINEERING RULES
| dc.contributor.advisor | Boicu, Mihai | |
| dc.contributor.author | Baamer, Rafeef Abdullah B | |
| dc.date.accessioned | 2026-01-21T07:36:37Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | In recent years, people have increasingly relied on Online Social Networks (OSNs) for various aspects of their daily lives, including communication, information sharing, and entertainment. Although these platforms provide many benefits, their massive and continuous use has also given rise to negative behaviors and malicious activities. One of the most critical challenges is the growing presence of malicious accounts that undermine the trustworthiness and integrity of online interactions and communication. Such accounts include personal spammers, impersonators, and cyborgs; however, one of the most harmful and complex types is the sockpuppet account. Sockpuppet accounts are accounts created by an individual or a coordinated group for deceptive or manipulative purposes, such as spreading misinformation or promoting specific agendas. The term encompasses several subtypes: impersonation, fake-profile, promotional or antagonistic, misinformation, troll, and spam sockpuppets. These accounts harm OSNs in multiple ways: they reduce the authenticity and integrity of online communication, degrade information quality by disseminating false or biased content, manipulate public opinion by supporting certain agendas or campaigns, and contribute to community disruption and toxicity through hate speech or coordinated harassment. While prior studies have achieved promising results in sockpuppet account detection, several limitations and research gaps remain. First, most existing approaches focus on identifying a specific type of sockpuppet account, such as spammers or fake reviewers, which limits the generalizability of their models. Second, only a few studies have explored or implemented hybrid detection techniques, as most rely on a single methodological approach. Third, many models are tested on a single platform or dataset, which restricts their scalability and cross-platform applicability. Moreover, no prior research has proposed a detection model specifically designed for Arabic sockpuppet accounts. Finally, there has been limited involvement of human expertise and underutilization of Human-in-the-Loop (HITL) analysis in refining and validating detection outcomes. To address these limitations, this dissertation presents three major experiments conducted across the Wikipedia, Reddit, and X/Twitter platforms, targeting different categories of sockpuppets: general, troll, and spammer accounts. In these experiments, various detection approaches were employed, including individual machine learning classifiers, ensemble voting, deep learning, and transformer-based (AraBERT) models, to detect and classify sockpuppet accounts across multiple platforms (see the illustrative ensemble sketch following this record). These models were subsequently integrated into a Human-in-the-Loop analysis framework to enhance their performance through multiple refinement cycles, identifying and applying machine learning engineering (MLE) rules, e.g., mixed-initiative feature optimization, data improvement, and hyperparameter tuning of classifiers. The process involved iterative model tuning and evaluation, resulting in the formulation of MLE rules derived from both model insights and human feedback. 
This research yields several contributions. First, it developed generalizable hybrid detection techniques that improved performance in sockpuppet account detection, as measured by accuracy, precision, recall, and F-score. Second, it introduced a validation process for sockpuppet datasets that combines a transformer-based model for post labeling with Human-in-the-Loop analysis and review, producing the first labeled Arabic sockpuppet-account dataset and addressing a major linguistic and cultural gap in existing research. Third, it established a systematic approach for identifying borderline cases that require human review and for translating these insights into model-refinement and MLE rules that enhance overall detection performance and generalizability (a sketch of one such refinement cycle also follows this record). Finally, it developed a Human-in-the-Loop process that supports analysts in model development and dynamic refinement, tested across multiple datasets representing diverse online platforms and different types of sockpuppet accounts. | |
| dc.format.extent | 329 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.14154/77991 | |
| dc.language.iso | en_US | |
| dc.publisher | Saudi Digital Library | |
| dc.subject | sockpuppet | |
| dc.subject | troll | |
| dc.subject | spam | |
| dc.subject | machine learning | |
| dc.subject | human-in-the-loop | |
| dc.subject | transformer-based model | |
| dc.subject | DNN | |
| dc.subject | Wikipedia | |
| dc.title | DYNAMIC REFINEMENT OF SOCKPUPPET DETECTION MODELS WITH HUMAN-IN-THE-LOOP PROCESSES GUIDED BY MACHINE LEARNING ENGINEERING RULES | |
| dc.type | Thesis | |
| sdl.degree.department | Information Technology - Cyber Security | |
| sdl.degree.discipline | Machine Learning in Cyber Security | |
| sdl.degree.grantor | George Mason University | |
| sdl.degree.name | Doctor of Philosophy |
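The abstract describes hybrid detection built from individual machine learning classifiers combined through ensemble voting. The following is a minimal, illustrative sketch of such a soft-voting ensemble in scikit-learn; the feature set, synthetic data, and choice of base models are placeholder assumptions, not the dissertation's actual pipeline.

```python
# Minimal sketch (not the dissertation's exact pipeline): a soft-voting ensemble
# of classic classifiers for sockpuppet detection. Features and labels below are
# hypothetical, synthetic stand-ins for real account-level data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical account-level features, e.g., posting rate, account age (days),
# fraction of duplicated posts, mean reply delay.
X = rng.random((500, 4))
y = rng.integers(0, 2, size=500)  # 1 = sockpuppet, 0 = legitimate (synthetic labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    voting="soft",  # average the base classifiers' predicted probabilities
)
ensemble.fit(X_tr, y_tr)
print("held-out accuracy:", ensemble.score(X_te, y_te))
```

Soft voting averages the base classifiers' predicted probabilities, which is one common way to combine heterogeneous models into a single hybrid detector.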

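The contributions also mention identifying borderline cases for human review and applying MLE rules such as data improvement and hyperparameter tuning. Below is a hedged sketch of one such refinement cycle under assumed details: the 0.65 confidence threshold, the borderline_indices helper, and the stand-in "human" labels are all illustrative assumptions, not the author's actual procedure.

```python
# Minimal sketch (assumed workflow): flag "borderline" accounts whose ensemble
# confidence falls below a threshold, route them for human review, fold the
# reviewed labels back into the training data, and re-tune a classifier --
# one refinement cycle of a human-in-the-loop process.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

def borderline_indices(model, X, threshold=0.65):
    """Return indices of samples whose top-class probability is below threshold."""
    proba = model.predict_proba(X)
    return np.where(proba.max(axis=1) < threshold)[0]

# Synthetic stand-ins for a labeled training set and an unlabeled account pool.
rng = np.random.default_rng(1)
X_labeled, y_labeled = rng.random((300, 4)), rng.integers(0, 2, 300)
X_pool = rng.random((200, 4))

model = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_labeled, y_labeled)

# 1) Identify borderline cases that require human review.
review_idx = borderline_indices(model, X_pool)

# 2) Placeholder for analyst feedback; in practice these labels come from review.
human_labels = rng.integers(0, 2, size=review_idx.size)

# 3) Data improvement: merge the reviewed cases into the training set.
X_new = np.vstack([X_labeled, X_pool[review_idx]])
y_new = np.concatenate([y_labeled, human_labels])

# 4) Hyperparameter tuning on the improved data (one example of an applied MLE rule).
search = GridSearchCV(
    RandomForestClassifier(random_state=1),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    cv=3,
)
search.fit(X_new, y_new)
print("cases routed to review:", review_idx.size, "| best params:", search.best_params_)
```

In an actual deployment the reviewed labels would come from analysts rather than a random generator, and the cycle would repeat until the evaluation metrics (accuracy, precision, recall, F-score) stop improving.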