Identity Cloning Detection in Social Sensor-Clouds

Date

2023

Publisher

Saudi Digital Library

Abstract

Social sensing is a paradigm that enables the crowdsourcing of data from humans and devices. This sensed data (e.g., social network posts) can be hosted in social-sensor clouds (i.e., social networks) and delivered as social-sensor cloud services (SocSen services). These services can be identified by their providers' social network accounts. Attackers breach the security of social-sensor clouds by cloning the user profiles of SocSen service providers to deceive the users of SocSen services. Existing research on cloned identity detection still has several major limitations. First, prior studies relied mainly on both privacy-sensitive user profile data (e.g., full name, date of birth) and non-privacy-sensitive user profile data (e.g., account description, posts). However, third-party applications cannot access privacy-sensitive user profile data via Application Programming Interfaces (APIs) or other means because of privacy safeguards. Second, current approaches apply either simple feature similarity methods or supervised machine learning to detect cloned identities. Simple feature similarity usually relies on human-defined metrics of profile-attribute similarity that consider only word frequency or the distance between characters. Supervised machine-learning approaches, on the other hand, require labelled data samples to train a predictive model. Third, existing identity cloning detection approaches rely on complete SocSen service provider (i.e., social media user) profile data, so their performance often depends on social media profiles containing comprehensive information. Finally, prior research on cloned identity detection has focused only on detecting similar identities and assumes that all of them are cloned. Consequently, to date, no solution has been proposed for identifying duplicate accounts.
In this thesis, we address these limitations and make the following contributions.

First, we propose a novel approach for the unsupervised detection of SocSen service provider identity cloning. The approach is intended for third-party applications and websites that can access only non-privacy-sensitive user profile data via social media APIs. We devise a multi-view account representation model that generates three categories of views for each account in an account pair: 1) a post view, 2) a network view and 3) a profile attribute view. We then adopt Weighted Generalized Canonical Correlation Analysis (wGCCA) to learn a single embedding from the generated views of each account. Finally, we calculate the cosine similarity between the embeddings of the two accounts in the pair. The evaluation shows that the proposed approach performs well on a real-world Twitter dataset.

Second, we propose two prediction models: 1) a supervised and 2) a weakly-supervised machine learning model. Both models leverage non-privacy-sensitive user profile data gathered from social networks and a powerful deep learning model to detect cloned identities. The weakly-supervised model is trained using the labels predicted by the supervised model. Using a real-world dataset, we evaluated the performance of our models and compared them with state-of-the-art detection techniques and other popular models used to detect identity deception. The results show that our approach significantly outperforms these techniques and models in terms of Precision and F1-score.

Third, we propose a novel approach for detecting SocSen service provider identity cloning when profile data is incomplete. The approach is designed to detect cloned identities based on non-privacy-sensitive profiles. It extracts profile features and wGCCA-based features, either of which may contain missing values. To counter the impact of these missing values, a missing-value imputer then fills them in. Next, the approach extracts two categories of augmented features for each account pair: 1) similarity-based and 2) difference-based features. Finally, these features are concatenated and fed into a Light Gradient Boosting Machine (LightGBM) classifier to detect identity cloning. The experimental results show that the proposed approach outperforms state-of-the-art approaches and models in terms of Precision, Recall and F1-score.

Finally, we design a cryptography-based authentication protocol to determine whether a pair of identified similar accounts contains a cloned account. For authentication, the protocol asks the newer account in the pair to decrypt two random messages, one encrypted with the newer account's public key and the other with the older account's public key. This is expected to identify the most common source of noise in cloned-account detection: duplicate accounts that a user creates on the same social media platform.

In summary, this thesis makes several contributions to the field of identity cloning detection in social sensor-clouds. The results show that the proposed approaches significantly outperform state-of-the-art identity cloning detection techniques.
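The pair-level steps described for the incomplete-profile approach (filling in missing feature values, then deriving similarity-based and difference-based features, including the kind of cosine similarity used to compare account embeddings) can be sketched in standard-library Python. This is a minimal sketch under stated assumptions, not the thesis's implementation. The function names are hypothetical, mean imputation stands in for the unnamed missing-value imputer, cosine similarity and absolute element-wise differences stand in for the two feature categories, and the input vectors stand in for the profile and wGCCA-based features, whose exact contents the abstract does not specify. wGCCA itself and the LightGBM classifier are not reproduced.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical per-account feature vector; None marks a missing value
# (e.g., an empty profile field). Stands in for profile/wGCCA-based features.
FeatureVector = List[Optional[float]]


def mean_impute(rows: Sequence[FeatureVector]) -> List[List[float]]:
    """Fill each None with the mean of the observed values in its column.

    Assumption: mean imputation is used only for illustration; the abstract
    does not name the imputer. A column with no observed values becomes 0.0.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if row[j] is None else float(row[j]) for j in range(n_cols)]
        for row in rows
    ]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pair_features(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Concatenate similarity-based and difference-based features for one pair.

    Assumption: cosine similarity as the similarity feature and absolute
    element-wise differences as the difference features.
    """
    similarity = [cosine_similarity(a, b)]
    differences = [abs(x - y) for x, y in zip(a, b)]
    return similarity + differences


# Usage: three hypothetical accounts, two of them with missing values.
accounts = mean_impute([[0.2, None, 3.0], [0.1, 0.5, None], [0.4, 0.7, 2.0]])
features = pair_features(accounts[0], accounts[1])
print(features)
```

In a full pipeline, one such vector per account pair would form a row of the training matrix for a gradient-boosting classifier such as LightGBM's `LGBMClassifier`; imputing before computing pair features keeps every row the same length.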
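The challenge in the authentication protocol can be illustrated with textbook RSA. The abstract does not say how the protocol's result is interpreted, so the sketch assumes one plausible reading. The newer account must decrypt the message encrypted with its own public key to authenticate at all. If it can also decrypt the message encrypted with the older account's public key, its owner holds both private keys, and the pair is treated as duplicate accounts of one user rather than a clone. The helper names, the outcome labels and the key setup are hypothetical. The tiny primes make this insecure, and a real deployment would use a vetted cryptography library with proper padding. `pow(e, -1, phi)` requires Python 3.8 or later.

```python
import secrets
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PublicKey:
    n: int  # modulus
    e: int  # public exponent


@dataclass(frozen=True)
class PrivateKey:
    n: int  # modulus
    d: int  # private exponent


def make_toy_keys(p: int, q: int, e: int = 65537) -> Tuple[PublicKey, PrivateKey]:
    """Textbook RSA key pair from two distinct primes (insecure; illustration only)."""
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse; raises ValueError if none exists
    return PublicKey(p * q, e), PrivateKey(p * q, d)


def respond(ciphertext: int, modulus: int, held_keys: List[PrivateKey]) -> Optional[int]:
    """The newer account's owner decrypts with whichever private key they hold
    for this modulus, or returns None if they hold none."""
    for key in held_keys:
        if key.n == modulus:
            return pow(ciphertext, key.d, key.n)
    return None


def check_pair(older: PublicKey, newer: PublicKey, newer_owner_keys: List[PrivateKey]) -> str:
    """Hypothetical outcome labels; the abstract does not define them."""
    # One random challenge per account's public key, each in [2, n - 1].
    m_newer = secrets.randbelow(newer.n - 2) + 2
    m_older = secrets.randbelow(older.n - 2) + 2
    c_newer = pow(m_newer, newer.e, newer.n)
    c_older = pow(m_older, older.e, older.n)

    # The newer account must first prove it controls its own key.
    if respond(c_newer, newer.n, newer_owner_keys) != m_newer:
        return "authentication failed"
    # Assumption: decrypting the older account's challenge implies holding
    # its private key, i.e. the same user owns both accounts.
    if respond(c_older, older.n, newer_owner_keys) == m_older:
        return "duplicate"
    return "cloned"


older_pub, older_priv = make_toy_keys(61, 53)
newer_pub, newer_priv = make_toy_keys(67, 71)
print(check_pair(older_pub, newer_pub, [newer_priv, older_priv]))  # owner holds both keys
print(check_pair(older_pub, newer_pub, [newer_priv]))              # owner holds only the newer key
```

Challenging the newer account, rather than the older one, matches the abstract's description: in a cloning attack, the newer profile is the suspect copy of an established account.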

Keywords

Social-sensor cloud service providers, Identity cloning detection, Non-privacy-sensitive user profile features, Deep learning, Incomplete user profile data

Copyright owned by the Saudi Digital Library (SDL) © 2025