Saudi Cultural Missions Theses & Dissertations
Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10
3 results
Search Results
Item Restricted
Unsupervised Semantic Change Detection in Arabic (Queen Mary University of London, 2023-10-23)
Sindi, Kenan; Dubossarsky, Haim
This study employs three pretrained BERT models, AraBERT, CAMeLBERT (CA), and CAMeLBERT (MSA), to investigate semantic change in Arabic across distinct time periods. Analyzing word embeddings and cosine distance scores reveals variations among the models in capturing semantic shifts. The research highlights the significance of training data quality and diversity, while acknowledging limitations in data scope. The project's outcome, a list of the most stable and most changed words, contributes to Arabic NLP by shedding light on semantic change detection, suggesting potential model selection strategies and areas for future exploration.

Item Restricted
Crisis Detection from Arabic Social Media (University of Birmingham, 2023-09-12)
Alharbi, Alaa; Lee, Mark
Social media (SM) streams such as Twitter provide large quantities of real-time information about emergency events from which valuable information can be extracted to enhance situational awareness and support humanitarian response efforts. The timely extraction of crisis-related SM messages is challenging as it involves processing large quantities of noisy data in real time. Supervised machine learning classifiers are challenged by out-of-distribution learning when classifying unseen (new) crises due to data variations across events. Besides that, it is impractical to label training data from each novel and emerging crisis since obtaining sufficient labelled data is time-consuming and labour-intensive. This thesis addresses the problem of Twitter crisis classification, using supervised learning methods to identify crisis-related data and categorise it into different information types in the multi-source setting (training data from multiple events). Due to Twitter’s ubiquity during emergency events in the Arab world, the current research focuses on Arabic Twitter content.
We have created and published a large-scale Arabic Twitter corpus of crisis events. The corpus has been analysed and manually labelled. Analysing the content includes investigating the main information categories of conversations posted during a range of crisis events using natural language processing techniques. Building these resources is considered one of this thesis’s contributions. The thesis also investigates the generalisation performance of different supervised classical machine learning and deep learning approaches trained on out-of-crisis data to classify unseen crises. We find that deep neural networks such as LSTM and CNN outperform the classical machine learning classifiers such as support vector machines and decision trees. We also evaluate different architectures of deep neural networks and several pre-trained text representations (embeddings) learnt from vast amounts of unlabelled text. Results show that BERT-based models are more robust to out-of-distribution target events and remarkably outperform other models on the information classification task. Experiments show that the performance of BERT-based classifiers can be enhanced when training on similar data. Thus, the last contribution of the present study is to propose an instance distance-based data selection approach for adaptation to improve classifiers’ performance under a domain shift. Using the BERT embeddings, the method selects a subset of multi-event training data that is most similar to the target event. Results show that fine-tuning a BERT model on a selected subset of data to classify crisis tweets outperforms a model that has been fine-tuned on all available source data.

Item Restricted
Religious Hatred in Arabic Social Media: Analysis, Detection, and Personalization (2023-05)
Albadi, Nuha; Mishra, Shivakant
Middle Eastern societies have long suffered from civil wars and domestic tensions that are partly caused by conflicting religious beliefs.
This thesis examines the extent of religious hate in Arabic social media, evaluates the impact of automated accounts (i.e., bots) and personalized recommendation algorithms on its spread, and investigates social computing methods for automatically recognizing Arabic-language content and bots promoting religious hatred. First, the thesis addresses the scarcity of Arabic resources in the field by creating two publicly available, annotated Arabic datasets for Twitter and YouTube through crowdsourcing. It then presents a comprehensive analysis highlighting the prevalence of religious hatred on Arabic social networks, the most targeted religious groups, the unique characteristics of perpetrators, and the distinctions between Twitter and YouTube in terms of hate speech volume and targeted groups. Based on gathered insights, it then develops and evaluates several supervised machine learning models to automatically and efficiently detect hateful content. This thesis also contributes new insights into the role of Arabic-language bots in spreading religious hatred on Twitter and introduces a novel regression model tailored to detect Arabic-tweeting bots. Finally, the thesis audits YouTube’s recommendation algorithm to assess the effect of personalization based on demographics and watch history on the extent of hateful content recommended to users. The research presented in this thesis offers practical implications for platform designers to facilitate enforcing their policy against hate and malicious automation and contributes to the broader effort to combat online radicalization.