SACM - United Kingdom
Permanent URI for this collection: https://drepo.sdl.edu.sa/handle/20.500.14154/9667
Restricted item: Embracing Emojis in Sarcasm Detection to Enhance Sentiment Analysis
(University of Southampton, 2025) Alsabban, Malak Abdullah; Hall, Wendy; Weal, Mark
People frequently share their ideas, concerns, and emotions on social networks, making sentiment analysis on social media increasingly important for understanding public opinion and user sentiment. Sentiment analysis provides an effective means of interpreting people's attitudes towards various topics, individuals, or ideas. This thesis introduces the creation of an Emoji Dictionary (ED) to harness the rich contextual information conveyed by emojis. It acts as a valuable resource for deciphering the emotional nuances embedded in textual content, contributing to a deeper understanding of sentiment. In addition, the research explores the complex domain of sarcasm detection by proposing a novel Sarcasm Detection Approach (SDA). This approach identifies sarcasm by analysing conflicts between textual content and the accompanying emojis. The thesis addresses key challenges in sentiment analysis by evaluating and comparing emoji dictionaries and sarcasm detection approaches to enhance sentiment classification. Extensive experimentation on diverse datasets rigorously assesses the effectiveness of these methods in improving sentiment analysis accuracy and sarcasm detection performance, particularly in emoji-rich datasets. The findings highlight the crucial role of emojis as contextual cues, underscoring their value in sentiment analysis and sarcasm detection tasks. The outcomes of this thesis aim to advance sentiment analysis methodologies by offering insights into preprocessing strategies, leveraging the expressive potential of emojis through the Emoji Dictionary (ED), and introducing the Sarcasm Detection Approach (SDA). The research demonstrates that integrating emojis through these tools substantially enhances both sentiment analysis and sarcasm detection.
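The Sarcasm Detection Approach (SDA) described above flags sarcasm when the text and its emojis disagree. The following is a minimal sketch of that idea only. It assumes a tiny hypothetical emoji dictionary and word lexicon, and the scores and the 0.3 threshold are illustrative rather than values from the thesis. A real system would substitute the thesis's Emoji Dictionary and a trained text classifier.

```python
# Minimal sketch of text/emoji conflict detection.
# EMOJI_DICT and WORD_LEXICON are hypothetical stand-ins, NOT the thesis's
# Emoji Dictionary (ED); scores are illustrative polarities in [-1, 1].
EMOJI_DICT = {"😂": 0.6, "😍": 0.9, "🙄": -0.6, "😡": -0.9, "👏": 0.5}
WORD_LEXICON = {"love": 1.0, "great": 0.8, "awesome": 0.9,
                "hate": -1.0, "terrible": -0.8, "boring": -0.6}


def text_score(text: str) -> float:
    """Mean lexicon score of the words in the text (0.0 if none match)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [WORD_LEXICON[w] for w in words if w in WORD_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def emoji_score(text: str) -> float:
    """Mean dictionary score of the emojis in the text (0.0 if none)."""
    hits = [EMOJI_DICT[ch] for ch in text if ch in EMOJI_DICT]
    return sum(hits) / len(hits) if hits else 0.0


def is_sarcastic(text: str, threshold: float = 0.3) -> bool:
    """Flag sarcasm when text and emojis carry clearly opposite polarities."""
    t, e = text_score(text), emoji_score(text)
    return abs(t) >= threshold and abs(e) >= threshold and t * e < 0


print(is_sarcastic("I just love waiting in traffic 🙄"))  # True
print(is_sarcastic("I love this song 😍"))               # False
```

Requiring both scores to clear the threshold keeps weak or missing signals (for example, a post with no emoji) from being flagged.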
By utilizing these tools, the study not only improves model performance but also opens avenues for further exploration into the nuanced complexities of digital communication.

Restricted item: IS THE METAVERSE FAILING? ANALYSING SENTIMENTS TOWARDS THE METAVERSE
(The University of Manchester, 2024) Alharbi, Manal Dowaihi; Batista-Navarro, Riza
This dissertation investigates Aspect-Based Sentiment Analysis (ABSA) within the context of the Metaverse to better understand opinions on this emerging digital environment, particularly from a news perspective. The Metaverse, a virtual space where users can engage in various experiences, has attracted both positive and negative opinions, making it crucial to explore these sentiments to gain insight into public perspectives. A novel dataset of news articles related to the Metaverse was created, and Target Aspect-Sentiment Detection (TASD) models were applied to analyze sentiments expressed toward various aspects of the Metaverse, such as device performance and user privacy. A key contribution of this research is the evaluation of the TASD architecture TAS-BERT and its enhanced version, Advanced TAS-BERT (ATAS-BERT), which performs each task separately, on two datasets: the newly created Metaverse dataset and the SemEval15 Restaurant dataset. Both architectures were tested with different Transformer-based models, including BERT, DeBERTa, RoBERTa, and ALBERT, to assess performance, particularly in cases where the target is implicit. The findings demonstrate that advanced Transformer models can handle complex tasks, even when the target is implicit. ALBERT performed well on the simpler Metaverse dataset, while DeBERTa and RoBERTa showed superior performance on both datasets.
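The dissertation evaluates TASD models, which output (target, aspect, sentiment) triples and must handle implicit targets. A common way to score such output is exact-match triple precision, recall and F1. The sketch below assumes that convention, which is not confirmed as the dissertation's exact evaluation protocol. The aspect labels in the example are hypothetical.

```python
from typing import Optional, Set, Tuple

# A TASD output as (target, aspect category, sentiment). target=None stands
# for an implicit target (SemEval-2015 marks these as "NULL").
Triple = Tuple[Optional[str], str, str]


def triple_prf(gold: Set[Triple], pred: Set[Triple]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1: a predicted triple counts only
    if target, aspect and sentiment all match a gold triple."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical labels for a sentence about a Metaverse headset.
gold = {("headset", "DEVICE#PERFORMANCE", "negative"),
        (None, "USER#PRIVACY", "negative")}
pred = {("headset", "DEVICE#PERFORMANCE", "negative"),
        (None, "USER#PRIVACY", "positive")}
print(triple_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

For a whole corpus, prefix each triple with a sentence ID before building the sets, so that identical triples from different sentences are not merged.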
This dissertation also suggests several areas for future research: processing paragraphs instead of individual sentences, using Meta AI models for dataset annotation to improve accuracy, and designing architectures specifically for models like DeBERTa, RoBERTa, and ALBERT, rather than relying on architectures originally designed for BERT, to improve performance. Additionally, incorporating enriched context representations, such as Part-of-Speech tags, could further enhance model performance.

Restricted item: Evaluating CAMeL-BERT for Sentiment Analysis of Customer Satisfaction with STC (Saudi Telecom Company) Services
(The University of Sussex, 2024-08-15) Alotaibi, Fahad; Pay, Jack
In the information age, platforms such as Twitter (X) play a crucial role in measuring public sentiment in both the private and public sectors. This study explores the application of machine learning, particularly deep learning, to sentiment analysis of tweets about Saudi Telecom Company (STC) services in Saudi Arabia. A comparative analysis was conducted between pre-trained English and Arabic sentiment analysis models to assess their effectiveness in classifying sentiment. The study also highlights a limitation of existing Arabic models, which are based on English model architectures but trained on varied datasets such as Modern Standard Arabic and Classical Arabic (Al-Fus’ha): they often cannot handle the diverse Arabic dialects commonly used on social media. To address this, a pre-trained Arabic model was fine-tuned on a dataset of tweets about STC services, focusing specifically on the Saudi dialect. Data was collected from Twitter (X) by gathering mentions of the Saudi Telecom Company (STC). Both English and Arabic models were applied to this data, and their sentiment analysis performance was evaluated.
The fine-tuned Arabic model (CAMeL-BERT) demonstrated improved accuracy and a better understanding of local dialects compared with its initial version. The results highlight the importance of adapting models to specific languages and contexts and underline the potential of CAMeL-BERT for sentiment analysis of Arabic-language content. The findings offer practical implications for improving customer service and engagement through more accurate sentiment analysis of social media content in the service-provider sector.

Restricted item: Semantic Analysis of Amazon Reviews of Sustainable Products
(University of Leeds, 2024-02-18) Alotaibi, Amal; Dimitrova, Vania
Online shopping has become an essential part of modern life, generating a wealth of customer feedback. This project advances the field of consumer feedback mining and the semantic and sentiment analysis of customer reviews, which, when applied effectively, can improve goods, services, or marketing initiatives. It proposes a framework that uses Natural Language Processing (NLP) techniques to identify customer preferences related to sustainability by mining the text of customer reviews (CRs). First, LDA and sLDA models were implemented using the Gensim package in Python to extract sustainability topics from the CRs. Next, a BERTopic model was implemented to identify sustainability aspects in the CRs. The overall sentiment of every review in each topic was then calculated using the Vader sentiment library in Python. Finally, the results were interpreted to generate helpful insights for brand managers. The study uses Amazon product review data, specifically Food and Grocery Sustainable Products. The findings of the proposed framework are promising: we were able to identify the most discussed sustainability aspects of products and produce an assessment showing which aspects customers are most satisfied with and which can be improved.
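The framework's last analytic step scores each review with Vader and summarises sentiment per topic. The sketch below shows only that aggregation. It assumes the topic model (LDA, sLDA or BERTopic) and Vader have already produced one (topic, compound score) pair per review. In practice, the score could come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in the `vaderSentiment` package. Averaging compound scores per topic is an assumption, because the abstract does not say how per-topic sentiment was summarised.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def label(compound: float) -> str:
    """Map a VADER compound score to a label with the conventional
    +/-0.05 cut-offs from the VADER documentation."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def topic_sentiment(rows: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, str]]:
    """Average the per-review compound scores within each topic."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for topic, compound in rows:
        totals[topic] += compound
        counts[topic] += 1
    result = {}
    for topic in totals:
        mean = totals[topic] / counts[topic]
        result[topic] = (mean, label(mean))
    return result


# Hypothetical (topic, compound) pairs produced by earlier pipeline steps.
rows = [("packaging", 0.75), ("packaging", 0.25),
        ("price", -0.5), ("price", 0.0)]
print(topic_sentiment(rows))
# {'packaging': (0.5, 'positive'), 'price': (-0.25, 'negative')}
```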
However, the sLDA and BERTopic models achieved the goal but fell short of expectations; BERTopic in particular was not accurate enough for weakly supervised text classification. The Vader sentiment tool also fell short of expectations because of the complexity of the CRs. Nevertheless, the text analysis specialist found the framework's structure flexible enough to allow for future development and wider use. Ultimately, we think these data will help brand managers create and improve future products, which will raise consumer satisfaction and boost revenue and profitability.

Restricted item: Towards Numerical Reasoning in Machine Reading Comprehension
(Imperial College London, 2024-02-01) Al-Negheimish, Hadeel; Russo, Alessandra; Madhyastha, Pranava
Answering questions about a specific context often requires integrating multiple pieces of information and reasoning about them to arrive at the intended answer. Reasoning in natural language for machine reading comprehension (MRC) remains a significant challenge. In this thesis, we focus on numerical reasoning tasks. In contrast to current black-box approaches that provide little evidence of their reasoning process, we propose a novel approach that enables interpretable and verifiable reasoning by using Reasoning Templates for question decomposition. Our evaluations hinted at problematic behaviour in numerical reasoning models, underscoring the need for a better understanding of their capabilities. As a second contribution, we conduct a controlled study to assess how well current models understand questions and to what extent they base their answers on textual evidence. Our findings indicate that applying transformations that obscure or destroy the syntactic and semantic properties of the questions does not change the output of the top-performing models. This behaviour reveals serious gaps in how the models work.
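The controlled study described above checks whether a model's answer changes when a question's syntax and semantics are destroyed. The sketch below illustrates such a probe under two assumptions. First, it uses word shuffling as the transformation, because the abstract does not list the exact transformations. Second, it uses a hypothetical `toy_model` in place of a real MRC model. A rate near 1.0 means the model's answers ignore word order.

```python
import random
from typing import Callable, List, Tuple


def shuffle_words(question: str, rng: random.Random) -> str:
    """Destroy word order while keeping the same bag of words."""
    words = question.split()
    rng.shuffle(words)
    return " ".join(words)


def insensitivity_rate(model: Callable[[str, str], str],
                       samples: List[Tuple[str, str]], seed: int = 0) -> float:
    """Fraction of (context, question) pairs whose predicted answer is
    unchanged after the question's words are shuffled."""
    rng = random.Random(seed)
    unchanged = 0
    for context, question in samples:
        if model(context, question) == model(context, shuffle_words(question, rng)):
            unchanged += 1
    return unchanged / len(samples) if samples else 0.0


def toy_model(context: str, question: str) -> str:
    """Hypothetical stand-in model: answers 'how many more' questions by
    subtracting the two numbers in the context, ignoring word order."""
    nums = [int(tok) for tok in context.split() if tok.isdigit()]
    if "more" in question.split() and len(nums) == 2:
        return str(abs(nums[0] - nums[1]))
    return "unknown"


samples = [
    ("Team A scored 21 points and Team B scored 14 points",
     "How many more points did Team A score"),
    ("There were 30 cats and 12 dogs",
     "How many more cats than dogs were there"),
]
print(insensitivity_rate(toy_model, samples))  # 1.0
```

Because `toy_model` only checks for the word "more", it is order-insensitive by construction. The rate becomes informative when a real model's prediction function is passed in instead.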
It calls into question evaluation paradigms that rely only on standard quantitative measures such as accuracy and F1 score, as these create an illusion of progress. To improve the reliability of numerical reasoning models in MRC, our third contribution proposes, and demonstrates the effectiveness of, a solution to one of these fundamental problems: catastrophic insensitivity to word order. We do this through FORCED INVALIDATION: training the model to flag samples that cannot be reliably answered. We show that it is highly effective at preserving the importance of word order in machine reading comprehension tasks and generalises well to other natural language understanding tasks. While our Reasoning Templates are competitive with the state of the art on a single reasoning type, engineering them incurs considerable overhead. Leveraging the insights we gained into natural language understanding and concurrent advances in few-shot learning, we conduct a first investigation into overcoming these scalability limitations. Our fourth contribution combines large language models for question decomposition with symbolic rule learning for answer recomposition; with this approach, we surpass our previous results on Subtraction questions and generalise to more reasoning types.

Restricted item: Exploring Emoji Sentiment Roles in Arabic Textual Content on Digital Social Networks
(Saudi Digital Library, 2024-07-09) Hakami, Shatha Ali A; Hendley, Robert; Smith, Phillip
In today’s digital landscape, emoji have emerged as pivotal elements in articulating sentiment, especially within the intricacies of the Arabic language. This thesis examines the various roles that emoji can play in expressing sentiment in Arabic texts, highlighting their relevance in both academic and real-world contexts. Beginning with foundational insights, our investigation retraces the history of emoji as important non-verbal communicative tools in human interaction.
We then explore the distinct challenges of sentiment analysis in Arabic and draw on a thorough review of previous studies to frame our method, identifying both established techniques and unexplored opportunities. At the heart of our research is the understanding that, depending on the context, an emoji can adopt a wide variety of sentiment roles: it can act as an indicator, mitigator, emphasizer, reverser, releaser, or trigger of either negative or positive sentiment. In other instances, an emoji simply has a neutral effect on the sentiment of the accompanying text. To study these roles, we gathered a large dataset, mainly from Twitter, and developed lexicons of words and emoji tailored for sentiment analysis in Arabic. These lexicons formed the basis of our analysis model. By combining the insights from the emoji-roles sentiment lexicon with our knowledge of the sentiment roles associated with specific emoji patterns, we significantly improve on the conventional sentiment classifier based on the emoji lexicon. Traditional methods often assign a static sentiment score to an emoji, failing to consider its varying roles in different textual contexts. Our refined approach corrects this oversight: instead of using a single, unchanging sentiment score for each emoji, the classifier dynamically retrieves sentiment scores based on the specific role the emoji plays within a given sentence. Finally, we compare our method with other Arabic sentiment analysis tools, demonstrating the value of our approach, especially for nuanced linguistic phenomena such as sarcasm and humour.
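The thesis's key idea is that the classifier retrieves an emoji's contribution according to the role it plays in the sentence, rather than from one static score. The sketch below illustrates that contrast. It covers only five roles (indicator, emphasizer, mitigator, reverser, neutral). The lexicon entries, the rule that looks up a role by the text's polarity, and the 1.5/0.5 multipliers are all hypothetical assumptions, not the thesis's emoji-roles lexicon.

```python
from typing import Dict, Tuple

# Hypothetical static lexicon: one fixed score per emoji (the baseline).
STATIC_EMOJI: Dict[str, float] = {"😂": 0.6, "🙂": 0.4, "😢": -0.6}

# Hypothetical emoji-roles lexicon keyed by (emoji, polarity of the text).
ROLE_LEXICON: Dict[Tuple[str, str], str] = {
    ("😂", "neutral"): "indicator",
    ("😂", "positive"): "emphasizer",
    ("😂", "negative"): "reverser",
    ("🙂", "negative"): "mitigator",
    ("😢", "negative"): "emphasizer",
}


def polarity(score: float) -> str:
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def static_score(text_score: float, emoji: str) -> float:
    """Baseline: add the emoji's fixed score regardless of context."""
    return text_score + STATIC_EMOJI.get(emoji, 0.0)


def role_aware_score(text_score: float, emoji: str) -> float:
    """Look up the emoji's role for this context, then apply that role."""
    role = ROLE_LEXICON.get((emoji, polarity(text_score)), "neutral")
    if role == "indicator":    # the emoji alone carries the sentiment
        return STATIC_EMOJI.get(emoji, 0.0)
    if role == "emphasizer":   # strengthen the text's polarity
        return text_score * 1.5
    if role == "mitigator":    # soften the text's polarity
        return text_score * 0.5
    if role == "reverser":     # flip the text's polarity
        return -text_score
    return text_score          # neutral role: no effect on the text


# A negative text softened by 🙂: the static baseline cancels the negativity
# out, while the role-aware score keeps it negative but weaker.
print(static_score(-0.4, "🙂"), role_aware_score(-0.4, "🙂"))  # 0.0 -0.2
```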
This thesis sets the foundation for future Arabic research in this expanding domain.

Restricted item: The Dance Of Order And Chaos: Tracking Keywords Evolution in a Community Over Time
(Saudi Digital Library, 2023) Alhazmi, Arwa; Gutmann, Andreas; Psychoula, Ismini; Murdoch, Steven
Online platforms face a persistent challenge in managing prohibited content. When they curtail undesired content by blocking search results for specific search terms (keywords) used by malicious actors, they inadvertently affect benign content associated with those terms. At the same time, malicious actors cleverly adapt by introducing intentional language variations to those terms. This risks blocking further innocent content and creates openings for undesired content to remain undetected. Therefore, while bad actors’ use of specialized language offers opportunities for content management, it also creates a need for systems adept at detecting the specific terms used across different timeframes, in order to thwart their efforts efficiently. In this research, we use a publicly available time-series dataset of online posts (news articles) to track keyword evolution over time. We posit that methods adept at capturing these shifts can improve analysis and, consequently, the precision of search-term blocking. Our methods leveraged diverse NLP techniques. First, to track change in keywords, news articles were categorized using BERTopic, keywords were extracted for each article using KeyBERT, and the keywords were then sampled and represented using tf-idf for different periods. The periods were subsequently clustered using hierarchical agglomerative clustering to identify patterns and trends. Second, our method for tracking contextual change in keywords consisted of identifying keywords, identifying the topics to which each keyword’s representative articles belong, and setting criteria for defining prominent and shifting topics.
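The first method above represents each period by tf-idf-weighted keywords and then groups periods with hierarchical agglomerative clustering. The sketch below implements those two steps without external dependencies. It assumes cosine distance, average linkage, smoothed idf (as in scikit-learn's default) and a distance threshold as the stopping rule. The abstract does not state these choices, and the keyword data is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List

Vector = Dict[str, float]


def tfidf_by_period(periods: Dict[str, List[str]]) -> Dict[str, Vector]:
    """Treat each period's sampled keywords as one document; weight them with
    tf-idf using smoothed idf = ln((1 + n) / (1 + df)) + 1."""
    n = len(periods)
    df = Counter(k for kws in periods.values() for k in set(kws))
    vectors = {}
    for period, kws in periods.items():
        tf = Counter(kws)
        vectors[period] = {k: (c / len(kws)) * (math.log((1 + n) / (1 + df[k])) + 1)
                           for k, c in tf.items()}
    return vectors


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def agglomerative(vectors: Dict[str, Vector], threshold: float) -> List[List[str]]:
    """Average-linkage agglomerative clustering: repeatedly merge the closest
    pair of clusters until the closest pair is farther apart than threshold."""
    clusters = [[p] for p in vectors]

    def linkage(c1: List[str], c2: List[str]) -> float:
        total = sum(cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), dist = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1])
        if dist > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical monthly keyword samples.
periods = {
    "2021-01": ["vaccine", "lockdown", "vaccine"],
    "2021-02": ["vaccine", "lockdown"],
    "2021-07": ["olympics", "tokyo"],
    "2021-08": ["olympics", "tokyo", "medal"],
}
print(agglomerative(tfidf_by_period(periods), threshold=0.5))
# [['2021-01', '2021-02'], ['2021-07', '2021-08']]
```

For larger data, scikit-learn's `AgglomerativeClustering(n_clusters=None, distance_threshold=..., metric="cosine", linkage="average")` runs the same procedure on dense vectors.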
Our analysis has yielded promising results, demonstrating that the clustering approach we adopted for tracking change is well suited to time-series keywords. Its strength lies in discerning evolving patterns and temporal shifts in keywords and in providing insights into ideal time frames for such monitoring. Notably, we were able to identify recurring or seasonal trends, short-term trends, extended trends, and distinctive keywords isolated within a single month. Moreover, our method and criteria for tracking and analyzing the evolution of keyword usage across different contexts have proven effective, as evidenced by identifying a contextual shift in 16% of the top 1,000 keywords in our dataset.

Restricted item: Keyword Kaleidoscope: Identifying the difference in keywords predominantly used within one community via contrasting with another community
(Saudi Digital Library, 2023) Alhazmi, Alaa; Gutmann, Andreas; Murdoch, Steven; Psychoula, Ismini
Online platforms seek to combat unwanted activities and content by blocking search terms associated with specific keywords frequently used by malicious actors. However, a persistent challenge is that this approach may inadvertently affect legitimate content that shares these keywords. This study uses publicly available datasets of online posts to identify differences in the most prominent keywords in these datasets. The goal is to obtain such distinctions by applying similar methods to harmful and benign communities that share similar language and, consequently, to use them toward more effective search-term blocking. To this end, we employed several analysis methods. Keyword frequencies were computed and compared in tables, visually, and through hypothesis tests. Topic modeling was applied to the reviews in the datasets to examine the keywords within similar topics and their frequencies.
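The keyword comparison described above includes hypothesis tests, and the abstract later names Pearson’s chi-squared contingency test as the source of the frequency differences. The sketch below assumes each keyword is tested with a 2x2 table of its count against all other keyword tokens in each community. The abstract does not give the table layout, and the counts are hypothetical.

```python
import math


def chi_squared_2x2(k_a: int, total_a: int, k_b: int, total_b: int):
    """Pearson's chi-squared test (no continuity correction) on the 2x2 table
    [keyword, other tokens] x [community A, community B].
    Returns (statistic, p-value) for 1 degree of freedom."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both communities need at least one token")
    table = [[k_a, total_a - k_a], [k_b, total_b - k_b]]
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    n = rows[0] + rows[1]
    if 0 in cols:  # keyword absent everywhere, or no other tokens at all
        return 0.0, 1.0
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: keyword appears 30 times in 1,000 tokens vs 10 in 1,000.
stat, p = chi_squared_2x2(30, 1000, 10, 1000)
print(round(stat, 3), p < 0.01)  # 10.204 True
```

`scipy.stats.chi2_contingency` returns the same statistic when called with `correction=False`. By default, it applies Yates' continuity correction to 2x2 tables.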
Keyword co-occurrences, defined by how frequently two keywords appeared in the same review, were also tallied, and the keywords with the largest co-occurrence differences were explored further through plots and representative reviews. Although this study centered on two reviewer communities, it yielded several overarching insights; in particular, a similar process could be used to guide and aid effective banning in search functionalities. The two datasets examined were found to discuss similar concepts. While the ordering of the top keywords shifts between the two, most of the most frequent keywords appear near the top of both lists. Despite these similarities, however, the overall frequencies of overlapping keywords differed. Notable dissimilarities between the two communities were found either as keywords missing from one community’s top list, or as frequency differences detected through Pearson’s chi-squared contingency test. The topic model results showed that some topics were present in both communities but were linked to different keywords in each. Finally, the keyword-keyword co-occurrence analysis in this work indicates that even keywords used commonly by both communities can have different associations.

Restricted item: Arabic Short Texts Authorship Verification
(Saudi Digital Library, 2023-11-07) Alqahtani, Fatimah; Yannakoudakis, Helen
Authorship verification is the process of determining whether two pieces of writing were written by the same author by comparing their writing styles. Technically, it is a branch of the authorship analysis problem and is considered a text classification task with a binary (yes or no) output. Despite the widespread use of Twitter in the Arab world, short-text authorship verification research has so far focused on languages other than Arabic, such as English, Spanish, and Greek.
Arabic, with its complex morphology, lack of capitalisation, and short vowels, presents unique linguistic challenges to verifying authorship. This thesis addresses that gap by applying different machine learning and deep learning techniques, with a focus on extracting the most effective features, to the problem of authorship verification for short Arabic texts. Due to the lack of publicly available data for this task, an Arabic Twitter corpus was compiled for 100 users, with a minimum of 1,000 and a maximum of 3,000 tweets per user. Different features were used to investigate which are most predictive for authorship verification of short Arabic texts (specifically tweets). The study explores the impact of different textual features, such as stylometric features, Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words (BOW), and n-grams. A novel Arabic knowledge-base model (AraKB) was created to improve authorship verification of these challenging short Arabic texts, and it yielded promising results. In addition, different deep learning techniques were tested to identify their impact on authorship verification. Long Short-Term Memory (LSTM) and Arabic Bidirectional Encoder Representations from Transformers (AraBERT) were applied separately and produced different performance outcomes. An analysis was also carried out to see how metadata from Twitter postings, such as time and device source, can help verify users. The experiments were conducted using four machine learning algorithms: Gradient Boosting, Random Forest, Support Vector Machine, and k-Nearest Neighbour. Performance was measured using the metrics most commonly used for authorship analysis tasks: accuracy, precision, recall, and F1 score.
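The thesis frames verification as a yes/no decision over a pair of texts and reports accuracy, precision, recall and F1. For illustration only, the sketch below uses a simple character-trigram cosine-similarity baseline in place of the thesis's feature sets and classifiers (Gradient Boosting, Random Forest, SVM, k-NN, LSTM, AraBERT). The 0.3 threshold is an arbitrary assumption that would normally be tuned on held-out pairs.

```python
import math
from collections import Counter
from typing import List, Tuple


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Character n-gram counts after collapsing whitespace; needs no
    word segmentation, which suits Arabic's rich morphology."""
    text = " ".join(text.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(c * b[g] for g, c in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def same_author(text_1: str, text_2: str, threshold: float = 0.3) -> bool:
    """Binary verification: 'yes' when the n-gram profiles are similar enough."""
    return cosine(char_ngrams(text_1), char_ngrams(text_2)) >= threshold


def scores(gold: List[bool], pred: List[bool]) -> Tuple[float, float, float, float]:
    """Accuracy, precision, recall and F1 for the 'same author' class."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum(p and not g for g, p in zip(gold, pred))
    fn = sum(g and not p for g, p in zip(gold, pred))
    tn = len(gold) - tp - fp - fn
    accuracy = (tp + tn) / len(gold) if gold else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


print(same_author("صباح الخير يا أصدقاء", "صباح الخير يا أصدقاء"))  # True
print(scores([True, True, False, False], [True, False, False, True]))
# (0.5, 0.5, 0.5, 0.5)
```

scikit-learn's `accuracy_score` and `precision_recall_fscore_support` compute the same metrics for real experiments.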
The results provide evidence of the importance of choosing the right features for the given texts, and indicate that no single feature generalises to all types of data. To the best of the researcher’s knowledge, no previous study has been conducted on verifying Arabic social media texts. This study suggests that the ability to verify users on social media platforms can help address various forensic and safety issues and can help prevent the use of fake identities for fraud, bullying, terrorism, and violence. This research is relevant to digital forensic investigation and cyber safety.