SACM - United Kingdom

Permanent URI for this collection: https://drepo.sdl.edu.sa/handle/20.500.14154/9667


Search Results

Now showing 1 - 10 of 12
  • ItemRestricted
    Analysing and Visualising (Cyber)crime data using Structured Occurrence Nets and Natural Language Processing
    (Newcastle University, 2025-03-01) Alshammari, Tuwailaa; Koutny, Maciej
    Structured Occurrence Nets (SONs) are a Petri net-based formalism designed to represent the behaviour of complex evolving systems, capturing concurrent events and interactions between subsystems. Recently, the modelling and visualisation of crime and cybercrime investigations have attracted increasing interest. In particular, SONs have proven to be versatile tools for modelling and visualising various applications, including crime and cybercrime. This thesis presents two contributions aimed at making SON-based techniques suitable for real-life applications. The main contribution is motivated by the fact that manually developing SON models from unstructured text can be time-consuming, as it requires extensive reading, comprehension, and model construction. The thesis develops a methodology for the formal representation of unstructured textual resources in English, experimenting with, mapping, and deriving relationships between natural and formal languages, using SON-based crime modelling and visualisation as the application. The second contribution addresses the scalability of SON-based representations for cybercrime analysis. It provides a novel approach in which acyclic nets are extended with coloured features, reducing net size and aiding visualisation. While the two contributions address distinct challenges, they are unified by their use of SONs as a formalism for modelling complex systems, demonstrating the adaptability of structured occurrence nets in representing both crime scenarios and cybercrime activities.
  • ItemRestricted
    Enhancing Biomedical Named Entity Recognition through Multi-Task Learning and Syntactic Feature Integration with BioBERT
    (De Montfort University, 2024-08) Alqulayti, Abdulaziz; Taherkhani, Aboozar
    Biomedical Named Entity Recognition (BioNER) is a critical task in natural language processing (NLP) for extracting noteworthy knowledge from the ever-growing body of biomedical literature. The focus of this study is the creation of refined BioNER models that identify entities such as proteins, diseases, and genes with strong generalizability and accuracy. The study addresses important challenges in BioNER, such as morphological variations, the complex nature of biomedical terminology, and the ambiguity often seen in context-dependent language. It establishes a distinctive BioNER methodology that incorporates cutting-edge machine learning techniques: character-level embeddings through Bidirectional Long Short-Term Memory (BiLSTM) networks, pre-trained models such as BioBERT, a multi-task learning solution, and syntactic feature extraction. The methodology was applied to the NCBI Disease Corpus, a standard dataset for disease name recognition. Two main models were created: BioBERTForNER and BioBERTBiLSTMForNER. The BioBERTBiLSTM model contains an additional BiLSTM layer and showed exceptional performance by capturing long-term dependencies and complicated morphological patterns in biomedical text. This model reached an F1-score of 0.938, beating existing advanced systems and the baseline BioBERT model. The study also investigates the effect of syntactic features and character-level embeddings, demonstrating their vital role in improving recall and precision. The multi-task learning solution proved effective at preserving the model's ability to generalize across different contexts while mitigating overfitting.
The final models not only set new benchmarks on the NCBI Disease Corpus but also presented a multi-faceted, extensible strategy for BioNER, showing how architectural innovations and refined embedding methods can greatly enhance biomedical text mining. The results underscore the key role of advanced embedding techniques and multi-task learning in NLP, demonstrating their flexibility across various biomedical domains. Additionally, the study shows how these improvements could be applied to real-world clinical data extraction and analysis, preparing the path for future work. These methodologies could be extended with additional, more diverse biomedical datasets, ultimately improving the efficiency and precision of automated biomedical information retrieval in clinical settings.
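The token classifiers described above emit one BIO tag per token; turning those tags into entity mentions is a standard decoding step in any NER pipeline. A minimal sketch (not taken from the thesis; the entity labels and example sentence are illustrative):

```python
def bio_to_entities(tokens, tags):
    """Collect (entity_text, label) spans from BIO-tagged tokens."""
    entities, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):                      # a new entity begins
            if current:
                entities.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)                     # entity continues
        else:                                         # O tag or broken sequence
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:
        entities.append((" ".join(current), label))
    return entities

tokens = ["Mutations", "in", "BRCA1", "cause", "breast", "cancer", "."]
tags   = ["O", "O", "B-Gene", "O", "B-Disease", "I-Disease", "O"]
print(bio_to_entities(tokens, tags))
# → [('BRCA1', 'Gene'), ('breast cancer', 'Disease')]
```

Evaluation scores such as the F1 reported above are typically computed over these decoded spans rather than over individual tags.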
  • ItemRestricted
    A Quality Model to Assess Airport Services Using Machine Learning and Natural Language Processing
    (Cranfield University, 2024-04) Homaid, Mohammed; Moulitsas, Irene
    In the dynamic environment of passenger experiences, precisely evaluating passenger satisfaction remains crucial. This thesis is dedicated to the analysis of Airport Service Quality (ASQ) through sentiment analysis of passenger reviews. The research aims to investigate and propose a novel model for assessing ASQ through the application of Machine Learning (ML) and Natural Language Processing (NLP) techniques. It utilises a comprehensive dataset sourced from Skytrax, incorporating both text reviews and numerical ratings. The initial analysis reveals the challenges that traditional, general-purpose NLP techniques face when applied to a specific domain such as ASQ, owing to limitations such as general lexicon dictionaries and pre-compiled stopword lists. To overcome these challenges, a domain-specific sentiment lexicon for airport service reviews is created using the Pointwise Mutual Information (PMI) scoring method. This approach involves replacing the default VADER sentiment scores with those derived from the newly developed lexicon. The outcomes demonstrate that this specialised lexicon for the airport review domain substantially exceeds the benchmarks, delivering consistent and significant enhancements. Moreover, six distinct methods for identifying stopwords within the Skytrax review dataset are developed. The research reveals that employing dynamic methods for stopword removal markedly improves the performance of sentiment classification. Deep learning (DL), especially with transformer models, has revolutionised the processing of textual data, achieving unprecedented success. Therefore, novel models are built through the meticulous fine-tuning of advanced deep learning architectures, specifically Bidirectional Long Short-Term Memory (BiLSTM) and Bidirectional Encoder Representations from Transformers (BERT), tailored for the airport services domain.
The results demonstrate superior performance, highlighting the BERT model's exceptional ability to seamlessly blend textual and numerical data. This progress marks a significant improvement upon the current state-of-the-art achievements documented in the existing literature. To encapsulate, this thesis presents a thorough exploration of sentiment analysis, ML and DL methodologies, establishing a framework for the enhancement of ASQ evaluation through detailed analysis of passenger feedback.
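The lexicon-construction step can be illustrated with the usual PMI formulation, score(w) = PMI(w, pos) − PMI(w, neg); since p(w) cancels in the difference, this reduces to a log-ratio of the word's relative frequency in positive versus negative reviews. A toy sketch (the example reviews and the add-one smoothing choice are illustrative, not taken from the thesis):

```python
import math
from collections import Counter

def pmi_sentiment_scores(reviews):
    """reviews: list of (tokens, polarity) pairs, polarity in {'pos', 'neg'}.
    Returns word -> PMI(w, pos) - PMI(w, neg). Because p(w) cancels in the
    difference, this is a smoothed log-ratio of per-class relative frequencies."""
    word_class, class_total, vocab = Counter(), Counter(), set()
    for tokens, polarity in reviews:
        for t in tokens:
            word_class[(t, polarity)] += 1
            class_total[polarity] += 1
            vocab.add(t)
    return {
        w: math.log2(
            ((word_class[(w, "pos")] + 1) / class_total["pos"])   # add-one smoothing
            / ((word_class[(w, "neg")] + 1) / class_total["neg"])
        )
        for w in vocab
    }

reviews = [
    (["friendly", "staff", "fast", "checkin"], "pos"),
    (["friendly", "helpful"], "pos"),
    (["long", "queue", "rude", "staff"], "neg"),
]
scores = pmi_sentiment_scores(reviews)   # "friendly" > 0, "rude" < 0
```

Scores like these can then be substituted for VADER's default lexicon values, which is the replacement strategy the abstract describes.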
  • ItemRestricted
    Adapting to Change: The Temporal Performance of Text Classifiers in the Context of Temporally Evolving Data
    (Queen Mary University of London, 2024-07-08) Alkhalifa, Rabab; Zubiaga, Arkaitz
    This thesis delves into the evolving landscape of NLP, particularly focusing on the temporal persistence of text classifiers amid the dynamic nature of language use. The primary objective is to understand how changes in language patterns over time impact the performance of text classification models and to develop methodologies for maintaining their effectiveness. The research begins by establishing a theoretical foundation for text classification and temporal data analysis, highlighting the challenges posed by the evolving use of language and its implications for NLP models. A detailed exploration of various datasets, including the stance detection and sentiment analysis datasets, sets the stage for examining these dynamics. The characteristics of the datasets, such as linguistic variations and temporal vocabulary growth, are carefully examined to understand their influence on the performance of the text classifier. A series of experiments are conducted to evaluate the performance of text classifiers across different temporal scenarios. The findings reveal a general trend of performance degradation over time, emphasizing the need for classifiers that can adapt to linguistic changes. The experiments assess models' ability to estimate past and future performance based on their current efficacy and linguistic dataset characteristics, leading to valuable insights into the factors influencing model longevity. Innovative solutions are proposed to address the observed performance decline and adapt to temporal changes in language use over time. These include incorporating temporal information into word embeddings and comparing various methods across temporal gaps. The Incremental Temporal Alignment (ITA) method emerges as a significant contributor to enhancing classifier performance in same-period experiments, although it faces challenges in maintaining effectiveness over longer temporal gaps. 
Furthermore, the exploration of machine learning and statistical methods highlights their potential to maintain classifier accuracy in the face of longitudinally evolving data. The thesis culminates in a shared task evaluation, where participant-submitted models are compared against baseline models to assess their classifiers' temporal persistence. This comparison provides a comprehensive understanding of the short-term, long-term, and overall persistence of the submitted models, offering valuable insights to the field. The research identifies several future directions, including interdisciplinary approaches that integrate linguistics and sociology, tracking textual shifts on online platforms, extending the analysis to other classification tasks, and investigating the ethical implications of evolving language in NLP applications. This thesis contributes to the NLP field by highlighting the importance of evaluating text classifiers' temporal persistence and offering methodologies to enhance their sustainability in dynamically evolving language environments. The findings and proposed approaches pave the way for future research aimed at the development of more robust, reliable, and temporally persistent text classification models.
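One simple diagnostic behind the temporal vocabulary growth examined above is the out-of-vocabulary (OOV) rate of later test periods relative to the training period; a rising OOV rate is one proxy for the drift that degrades classifiers over time. A minimal sketch (the example documents are invented; the thesis's actual dataset characteristics are richer than this):

```python
def oov_rate(train_docs, test_docs):
    """Fraction of test-period tokens never seen in the training-period documents."""
    vocab = {tok for doc in train_docs for tok in doc}
    test_tokens = [tok for doc in test_docs for tok in doc]
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)

train = [["great", "phone", "battery"], ["nice", "screen"]]   # earliest period
later = [["great", "battery", "5g"]]                          # "5g" is unseen
print(oov_rate(train, later))
# → 0.3333333333333333
```

Computing this rate for each successive period gives a cheap first look at whether performance decay is likely before running any classifier.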
  • ItemRestricted
    Extraction of Temporal Relationships between Events from News Articles for Timeline Generation
    (University of Manchester, 2024-06-27) Alsayyahi, Sarah; Batista-Navarro, Riza
    Extracting temporal information from natural language texts is crucial for understanding the sequence and context of events, enhancing the accuracy of timeline generation and event analysis in various applications. However, within the NLP community, determining the temporal ordering of events has been recognised as a challenging task. This difficulty arises from the inherent vagueness of temporal information found in natural language texts like news articles. In Temporal Information Extraction (TIE), different datasets and methods have been proposed to extract various types of temporal entities, including events, temporal expressions, temporal relations, and the relative order of events. Some of these tasks have been considered easier than others in the field. For instance, extracting the temporal expressions or events is easier than determining the optimal order of a set of events. The complexity of determining the event order arises due to the requirement of commonsense and external knowledge, which is not readily accessible to computers. In contrast, humans can effortlessly identify this chronological order by relying on their external knowledge and understanding to establish the most appropriate sequence. In this thesis, our goal was to improve the performance of state-of-the-art methods for determining the temporal order of events in news articles. Accordingly, we present the following contributions: 1. We reviewed the literature by conducting a systematic survey, categorising tasks and datasets relevant to extracting the order of events mentioned in the news articles. We also identified existing findings and highlighted some research directions worth further investigation. 2. We proposed a novel annotation scheme with an unambiguous definition of the types of events and temporal relations of interest. 
Adopting this scheme, we developed a TIMELINE dataset, which annotates both verb and nominal events and considers the long-distance temporal relations between events separated by more than one sentence. 3. We integrated problem-related features with a neural-based method to improve the model's ability to extract temporal relations that involved nominal events and the temporal relations with small classes (e.g., EQUAL class). We found that integrating these features has significantly improved the performance of the neural baseline model and could achieve state-of-the-art results in two datasets in the literature. 4. We proposed a framework that uses local search algorithms (e.g., Hill Climbing and Simulated Annealing) to generate document-level timelines from a set of temporal relations. These algorithms have improved the performance of the current models and resolved the problem in less time than the state-of-the-art models.
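The local-search idea in contribution 4 can be sketched as hill climbing over event orderings: swap two events whenever the swap reduces the number of violated BEFORE relations. The events, relations, and objective below are invented for illustration; the thesis's actual cost function and move set may differ:

```python
import random

def violations(order, before_relations):
    """Count BEFORE relations (a, b) that the ordering places b at or before a."""
    pos = {e: i for i, e in enumerate(order)}
    return sum(1 for a, b in before_relations if pos[a] >= pos[b])

def hill_climb_timeline(events, before_relations, seed=0):
    """Greedy local search: keep any pairwise swap that lowers the violation count."""
    rng = random.Random(seed)
    order = list(events)
    rng.shuffle(order)                   # random starting timeline
    best = violations(order, before_relations)
    improved = True
    while improved and best > 0:
        improved = False
        for i in range(len(order)):
            for j in range(i + 1, len(order)):
                order[i], order[j] = order[j], order[i]       # try a swap
                cost = violations(order, before_relations)
                if cost < best:
                    best, improved = cost, True               # keep improvement
                else:
                    order[i], order[j] = order[j], order[i]   # undo
    return order, best

relations = [("crime", "arrest"), ("arrest", "trial"), ("trial", "verdict")]
timeline, remaining = hill_climb_timeline(
    ["arrest", "trial", "crime", "verdict"], relations)
```

Plain hill climbing can stall in a local optimum, which is one reason to also consider Simulated Annealing, whose temperature schedule occasionally accepts worse orderings to escape such traps.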
  • ItemRestricted
    Predicting Actions in Images using Distributed Lexical Representations
    (University of Sheffield, 2023-08-05) Alsunaidi, Abdulsalam; Gaizauskas, Rob
    Artificial intelligence has long sought to develop agents capable of perceiving the complex visual environment around us and communicating about it using natural language. In recent years, significant strides have been made towards this objective, particularly in the field of image content description. For instance, current artificial systems are able to classify images of a single object with a high level of accuracy that is sometimes comparable to that of humans. Although there has been remarkable progress in recognising objects, there has been less headway in action recognition due to a significant limitation in the current approach. Most of the advances in visual recognition rely on classifying images into distinct and non-overlapping categories. While this approach may work well in many contexts, it is inadequate for understanding actions. It constrains the categorisation of an action to a single interpretation, thereby preventing an agent from proposing multiple possible interpretations. To tackle this fundamental limitation, this thesis proposes a framework that seeks to describe action-depicting images using multiple verbs, and expands the vocabulary used to describe such images beyond the limitations of the training dataset. In particular, the framework leverages lexical embeddings as a supplementary tool to go beyond the verbs that are supplied as explicit labels for images in datasets used for supervised training of action classifiers. More specifically, these embeddings are used for representing the target labels (i.e., verbs). By exploiting richer representations of human actions, this framework has the potential to improve the capability of artificial agents to accurately recognise and describe human actions in images. In this thesis, we focus on the representation of input images and target labels.
We examine various components for both elements, ranging from commonly used off-the-shelf options to custom-designed ones tailored to the task at hand. By carefully selecting and evaluating these components, we aim not only to improve the accuracy and effectiveness of the proposed framework but also to gain deeper insight into the potential of distributed lexical representations for action prediction in images.
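The role of lexical embeddings as target representations can be sketched as follows: take the classifier's top verb, then surface its nearest neighbours in embedding space as additional plausible labels, including verbs that never appeared as training labels. The 2-d vectors below are toy stand-ins, not real embeddings, and the function names are hypothetical:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def expand_labels(predicted_verb, verb_vectors, k=1):
    """Return the predicted verb plus its k nearest neighbours by cosine similarity."""
    anchor = verb_vectors[predicted_verb]
    ranked = sorted(verb_vectors,
                    key=lambda v: cosine(anchor, verb_vectors[v]),
                    reverse=True)
    return ranked[:k + 1]   # the predicted verb itself ranks first

toy_vectors = {"run": [1.0, 0.1], "sprint": [0.9, 0.2], "eat": [0.0, 1.0]}
print(expand_labels("run", toy_vectors, k=1))
# → ['run', 'sprint']
```

The payoff is exactly the multi-verb description the abstract argues for: an image labelled "run" can also be offered "sprint" as an interpretation even if "sprint" was never an explicit training label.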
  • ItemRestricted
    Unsupervised Semantic Change Detection in Arabic
    (Queen Mary University of London, 2023-10-23) Sindi, Kenan; Dubossarsky, Haim
    This study employs pretrained BERT models—AraBERT, CAMeLBERT (CA), and CAMeLBERT (MSA)—to investigate semantic change in Arabic across distinct time periods. Analyzing word embeddings and cosine distance scores reveals variations in how the models capture semantic shifts. The research highlights the significance of training data quality and diversity, while acknowledging limitations in data scope. The project's outcome—a list of the most stable and most changed words—contributes to Arabic NLP by shedding light on semantic change detection, suggesting potential model selection strategies and areas for future exploration.
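A common form of the cosine-distance measure used in such studies averages a word's contextual vectors within each period and takes the cosine distance between the two period centroids; a larger distance suggests a larger semantic shift. A minimal sketch (the 2-d vectors are toy stand-ins for BERT embeddings, and this is one of several possible formulations, not necessarily the study's exact one):

```python
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def change_score(period1_vectors, period2_vectors):
    """Higher score = larger shift of the word's average representation."""
    return cosine_distance(centroid(period1_vectors), centroid(period2_vectors))

# a word whose contexts stay similar vs. one whose contexts flip entirely
stable  = change_score([[1.0, 0.0], [0.9, 0.1]], [[1.0, 0.05], [0.95, 0.0]])
changed = change_score([[1.0, 0.0]], [[0.0, 1.0]])
```

Ranking the vocabulary by this score is one way to produce the "most stable and most changed words" list the abstract mentions.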
  • ItemRestricted
    Medical Screening Assistant: A Chatbot to Help Nurses
    (Saudi Digital Library, 2023-11-08) Al Rabeyah, Abdullah Saleh; Da Silva, Rogerio E; Goes, Fabricio
    Over the last several years, Machine Learning has emerged as a key player in the healthcare industry. The use of chatbots is a notable application of artificial intelligence within the field of healthcare. The advent of the ChatGPT revolution represents a significant breakthrough in the realm of natural language processing, a fundamental aspect of chatbot programming. This development has simplified the use of GPT to engage in user communication and fulfill the objectives of the application. The objective of this project is to reduce the excessive workloads faced by healthcare professionals and enhance the efficiency of decision-making processes. This will be achieved via the development of an intelligent medical chatbot as a mobile application, specifically designed to support nurses in conducting early patient diagnoses by analyzing symptoms. The chatbot uses the Swift programming language for the iOS front end and Python with Flask for the back end. It incorporates the ChatGPT API and machine learning models to effectively comprehend and interpret user inquiries. This project uses a Kaggle dataset of 41 distinct diseases along with their corresponding symptoms. The model is trained using Logistic Regression to predict the prognosis. The responsibility of managing the dialogue between the user and the chatbot, leading up to the compilation of the definitive list of symptoms shown by the patient, lies with ChatGPT. The use of a Flask RESTful API facilitates direct interaction between the iOS application and the server-side infrastructure. Finally, the application will provide the nurse with the five most probable prognoses, along with prediction confidence scores, based on the symptoms supplied. Additionally, the application will offer a description of the disease and provide precautionary measures for the patient.
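The final ranking step can be sketched as a linear symptom score per disease followed by a softmax, returning the top-k prognoses with confidences. The weight table below is invented for illustration; a real deployment would use the coefficients learned by logistic regression on the 41-disease Kaggle dataset:

```python
import math

# Hypothetical symptom weights; stand-ins for learned logistic-regression coefficients.
WEIGHTS = {
    "influenza":   {"fever": 2.0, "cough": 1.5, "fatigue": 1.0},
    "common_cold": {"cough": 1.5, "sneezing": 2.0, "fatigue": 0.5},
    "migraine":    {"headache": 2.5, "nausea": 1.5},
}

def rank_prognoses(symptoms, weights=WEIGHTS, k=5):
    """Score each disease by summing its symptom weights, normalise with a
    softmax, and return the k most probable prognoses with confidences."""
    scores = {d: sum(w.get(s, 0.0) for s in symptoms) for d, w in weights.items()}
    z = sum(math.exp(s) for s in scores.values())          # softmax normaliser
    probs = {d: math.exp(s) / z for d, s in scores.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]

print(rank_prognoses(["fever", "cough"]))   # influenza ranks first
```

In the described architecture, ChatGPT's role ends at compiling the symptom list; a ranking function of this shape then turns the trained model's scores into the five prognoses shown to the nurse.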

Copyright owned by the Saudi Digital Library (SDL) © 2025