Adversary-Aware, Machine Learning-based Detection of Spam in Twitter Hashtags
Abstract
Concerns about the vulnerability of Machine Learning (ML) to adversarial examples in cybersecurity systems have grown in recent years. These systems operate in adversarial environments, so any solution must account for the presence of adversaries and evolve over time in the face of emerging threats. However, most existing ML-based models designed for cybersecurity systems, such as spam detection in Online Social Networks (OSNs), are either adversary-agnostic or focus on only one aspect of adversarial environments.
The goal of this work is to design adversary-aware ML-based detectors of spam in Twitter that address three key points: robustness to adversarial examples, adaptability to evolving attacks, and interpretability for security analysts. Throughout the thesis, we used health-related spam campaigns in Twitter Arabic hashtags as a case study. The analysis of these campaigns helped us to identify three adversarial attacks and to develop three adversary-aware ML- and Deep Learning (DL)-based detectors. The first contribution of this thesis is a taxonomy of potential adversarial attack scenarios in Twitter. We then developed an adversary-aware spam detector, built on the observation that the targeted campaigns used unique hijacked accounts to fool the deployed spam detectors. We designed a new feature that is faster to compute than the features used in the literature and that improves the accuracy of detecting the identified hijacked accounts by 73%. Additionally, we proposed an approach for designing adversary-aware spam image detectors. The key novelty is that our approach improves robustness through adversarial training and uses black/white lists with a human-in-the-loop (HITL) approach to ensure that the detectors can evolve over time. The developed adversary-aware Optical Character Recognition (OCR)-based detector outperforms two state-of-the-art (SOTA) OCR systems in recognizing Arabic and English text embedded in Twitter spam images. We further propose an OCR post-correction algorithm, which improves the robustness of OCR-based detectors by at least 10% against the generated Adversarial Text Images.