Saudi Cultural Missions Theses & Dissertations
Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10
Item Restricted
Enhancing Cross-lingual Transfer Learning for Crisis Text Classification on Social Media (Saudi Digital Library, 2025)
AlAmer, Shareefa; Lee, Mark; Smith, Phillip
During crisis events such as natural disasters, conflicts, and pandemics, social media platforms serve as vital channels for real-time information sharing. These platforms enable users to post urgent updates, request assistance, and disseminate situational awareness at a scale and speed that traditional communication systems cannot match. Automatically classifying user-generated content in these contexts is essential for supporting timely emergency response. However, performing this task across multiple languages remains a major challenge, especially given that such content is often noisy, informal, and linguistically diverse. One of the core challenges lies in the scarcity of annotated data that is both domain- and task-specific. Even widely spoken languages may lack labelled resources tailored to specific applications, effectively rendering them low-resource for those tasks. Existing solutions for cross-lingual transfer remain suboptimal even when applied to more structured and formal data using complex architectures, which further highlights their limitations when handling the noisier and less predictable nature of social media content. In response to these challenges, this thesis investigates practical and scalable solutions to improve the cross-lingual classification of crisis-related social media content. The research explores four key directions: (1) evaluating Machine Translation as a strategy for augmenting training data in low-resource languages; (2) applying ensemble learning to enhance robustness across multilingual inputs; (3) examining data balancing methods to mitigate class imbalance; and (4) analysing interlingual transfer dynamics to identify how languages interact in multilingual learning setups.
The proposed approach is evaluated through extensive experimentation on a real-world dataset of crisis-related X posts (formerly known as tweets). The proposed methods achieve competitive results despite challenges posed by noisy social media text, class imbalance, and the lack of annotated data. This work presents a generalisable framework for multilingual crisis classification and offers insights that are valuable for real-world applications where language diversity and data scarcity are critical factors. The effectiveness of the proposed system can be further enriched by incorporating a wider range of languages and leveraging more advanced analytical models. Adopting more advanced translation techniques could also be explored for even greater impact in future crisis response systems.
