Saudi Cultural Missions Theses & Dissertations
Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10
Search Results (65 results)
Item Restricted
Online conversations: A study of their toxicity (University of Illinois Urbana-Champaign, 2024)
Alkhabaz, Ridha; Sundaram, Hari
Social media platforms are essential spaces for modern human communication, and there is a pressing need to make these spaces as welcoming and engaging as possible for their participants. A major threat to this goal is the spread of toxic content. Hence, it is crucial for social media platforms to detect early signs of a toxic conversation. In this work, we tackle the problem of toxicity prediction by proposing a definition of conversational structures, which enables a new framework for toxicity prediction. Using it, we examine more than 1.18 million threads on X (formerly Twitter), created by 4.4 million users, to provide key insights into the current state of online conversations. Our results indicate that most X threads do not exhibit a conversational structure. In addition, our newly defined structures are distributed differently from what was previously assumed for online conversations, and they give models a meaningful signal for starting to predict the future toxicity of online conversations. We also show that message-passing graph neural networks outperform state-of-the-art gradient-boosting trees for toxicity prediction. Most importantly, we find that once the first two terminating conversational structures are observed, the future toxicity of online threads can be predicted with approximately 88% accuracy. We hope our findings will help social media platforms better curate content and encourage more conversation in online spaces.

Item Restricted
A Quality Model to Assess Airport Services Using Machine Learning and Natural Language Processing (Cranfield University, 2024-04)
Homaid, Mohammed; Moulitsas, Irene
In the dynamic landscape of passenger experience, accurately evaluating passenger satisfaction remains crucial.
This thesis analyses Airport Service Quality (ASQ) by applying sentiment analysis to passenger reviews. The research aims to investigate and propose a novel model for assessing ASQ using Machine Learning (ML) and Natural Language Processing (NLP) techniques. It uses a comprehensive dataset sourced from Skytrax that combines text reviews with numerical ratings. The initial analysis shows that traditional, general-purpose NLP techniques struggle in specific domains such as ASQ because of limitations such as general lexicon dictionaries and pre-compiled stopword lists. To overcome these challenges, a domain-specific sentiment lexicon for airport service reviews is created using the Pointwise Mutual Information (PMI) scoring method, and the default VADER sentiment scores are replaced with those derived from the new lexicon. The results show that this specialised lexicon substantially outperforms the benchmarks, delivering consistent and significant improvements. Moreover, six distinct methods for identifying stopwords in the Skytrax review dataset are developed, and the research shows that dynamic stopword removal markedly improves sentiment classification performance. Deep learning (DL), especially transformer models, has revolutionised the processing of textual data. Accordingly, novel models are developed by building and fine-tuning advanced DL models, specifically Bidirectional Long Short-Term Memory (BiLSTM) and Bidirectional Encoder Representations from Transformers (BERT), tailored to the airport services domain. The results demonstrate superior performance, highlighting the BERT model's ability to combine textual and numerical data effectively.
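The PMI lexicon step can be illustrated with a minimal sketch. The sketch assumes that each review has already been labelled positive or negative, for example from its numerical rating. It scores a word w by its semantic orientation, PMI(w, pos) − PMI(w, neg), using document counts with add-one smoothing on the joint counts. The function name `pmi_lexicon`, the smoothing choice, and the toy reviews are hypothetical and do not come from the thesis.

```python
import math
from collections import Counter

def pmi_lexicon(reviews, labels):
    """Return {word: PMI(word, pos) - PMI(word, neg)} from labelled, tokenised reviews.

    reviews: list of token lists; labels: parallel list of "pos" / "neg"
    (both classes must be present). Probabilities are document frequencies;
    add-one smoothing on the joint counts (an assumption) avoids log(0)
    for words that occur in only one class.
    """
    n_docs = len(reviews)
    class_docs = Counter(labels)                   # reviews per class
    word_docs = Counter()                          # reviews containing each word
    joint = {"pos": Counter(), "neg": Counter()}   # reviews per (word, class)
    for tokens, label in zip(reviews, labels):
        for word in set(tokens):                   # count a word once per review
            word_docs[word] += 1
            joint[label][word] += 1

    def pmi(word, label):
        p_joint = (joint[label][word] + 1) / n_docs
        p_word = word_docs[word] / n_docs
        p_label = class_docs[label] / n_docs
        return math.log2(p_joint / (p_word * p_label))

    return {w: pmi(w, "pos") - pmi(w, "neg") for w in word_docs}

# Hypothetical toy reviews, already tokenised and labelled.
reviews = [["friendly", "staff"], ["friendly", "clean"],
           ["rude", "staff"], ["long", "queue", "rude"]]
labels = ["pos", "pos", "neg", "neg"]
lexicon = pmi_lexicon(reviews, labels)
```

On the toy reviews, "friendly" scores log2 3 and "rude" scores −log2 3. "Staff" appears once in each class, so it scores 0. In the vaderSentiment package, `SentimentIntensityAnalyzer().lexicon` is a plain dictionary, so domain scores could be merged into it. How the thesis maps PMI values onto VADER's rating scale is not described here.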
The BERT results represent a significant improvement over the state of the art documented in the existing literature. In summary, this thesis presents a thorough exploration of sentiment analysis, ML, and DL methodologies, establishing a framework for enhancing ASQ evaluation through detailed analysis of passenger feedback.

Item Restricted
Feature Selection for High Dimensional Healthcare Data (University of Surrey, 2024-01)
Alayed, Abdulrahman; Kouchaki, Samaneh
In today's digital landscape, researchers frequently face the complexity of handling high-dimensional datasets. Data mining and machine learning methods can struggle with very large datasets, leading to inefficiencies. Raw data with numerous features can degrade machine learning algorithms by reducing accuracy, increasing overfitting, and adding complexity, primarily because redundant and irrelevant features hamper the learning process. Feature selection techniques can address these challenges. By retaining only relevant features, they let machine learning algorithms train faster, reduce model complexity, improve accuracy, and mitigate overfitting. The primary objective of this project is to create an automatic variable selection pipeline that chooses the best features using a range of feature selection techniques. The pipeline incorporates four categories of variable selection methods: filter, wrapper, embedded, and hybrid methods. These techniques are applied to the MIMIC-III (Medical Information Mart for Intensive Care) dataset, which is available at no cost.
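Of the four method families, filter methods are the simplest to illustrate, because they score each feature without training a model. The sketch below ranks features by their Fisher score against a binary outcome, such as in-hospital death, and keeps the top k. The Fisher criterion, the function names, and the toy cohort are assumptions for illustration. They are not the pipeline's actual configuration.

```python
from statistics import mean, pvariance

def fisher_score(values, labels):
    """Fisher score of one feature for a 0/1 outcome: (mu1 - mu0)^2 / (var1 + var0)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    between = (mean(pos) - mean(neg)) ** 2          # separation of class means
    within = pvariance(pos) + pvariance(neg)        # spread inside each class
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within

def select_top_k(rows, labels, names, k):
    """Filter method: score each column independently of any model, keep the k best."""
    columns = zip(*rows)                            # transpose rows into columns
    scores = {name: fisher_score(col, labels) for name, col in zip(names, columns)}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy cohort: outcome 1 = in-hospital death.
names = ["age", "lactate", "ward_id"]
rows = [[70, 4.1, 3], [45, 1.2, 3], [82, 5.0, 3], [50, 1.0, 3]]
outcome = [1, 0, 1, 0]
print(select_top_k(rows, outcome, names, k=2))     # prints ['lactate', 'age']
```

Wrapper and embedded methods differ in that they judge features through a model, either by repeatedly refitting it or by reading its learned coefficients or importances. This is costlier, but it accounts for feature interactions that a per-feature filter score ignores.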
MIMIC-III is well suited to the project's goals because it is a centralized database containing details of patients admitted to the critical care unit of a large regional hospital. The dataset is particularly useful for predicting the likelihood of death during the hospital stay after ICU admission. To this end, the project employs six classification techniques: Logistic Regression (LR), K-Nearest Neighbours (KNN), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN). It then systematically evaluates and compares the models' performance using various assessment metrics.

Item Restricted
Intelligent Maintenance for Chilled Water System at Commercial Buildings: A Holistic Approach in Line with Industry 4.0 (University of Strathclyde, 2024-09)
Almobarek, Malek Yousef; Mendibil, Kepa
Commercial buildings contain critical systems that require close attention through efficient maintenance practices. One of these is the chilled water system (CWS), which contains sophisticated components and consumes significantly more energy and financial resources than other systems. Given the relevance of the issue, this study began with the following guiding research question: “What are the approaches or methods to implement predictive maintenance (PdM) or fault detection for a chilled water system at commercial buildings?” A review of the literature (more than 180 studies analysed) identified four research gaps:
(1) the impact of the technical correlation between CWS components on fault detection remains unknown;
(2) CWS faults and their importance are defined in significantly different ways;
(3) the measurement of these faults is not standardised, leading to unclear data collection practices; and
(4) the resolution of these faults remains inconclusive.
Accordingly, four research questions were formulated.
Two research methods were used to answer these four research questions: an industry survey and a case study. The industry survey followed established survey-construction guidelines and was preceded by a pilot study. It was then sent to 761 commercial-building professionals in Riyadh, Kingdom of Saudi Arabia, and 304 responses were retained and analysed. For the case study, a novel methodological framework was developed and implemented. The framework comprises three phases: set-up, machine learning, and quality control. The first phase sets out the arrangements needed to prepare the framework, whereas studies in the literature typically begin directly with building the detection model. The second phase uses a decision tree model to detect faults. The final phase sets out managerial steps for monitoring, controlling, and evaluating the maintenance framework, including the detection model, whereas studies in the literature typically end by reporting model accuracy. In addition, a second case study was conducted to establish external validity. This research proposes an intelligent maintenance framework covering all CWS components, in line with Industry 4.0, that includes a machine-learning fault detection model. Over three empirical periods, the research questions were answered and verified. The proposed detection model achieved an improvement of at least 20 per cent in fault detection at the two case study sites compared with the existing building management system. The thesis makes several theoretical contributions:
(1) it records faults beyond those reported in the literature;
(2) it provides a corrective action for each fault;
(3) it provides fault frequencies that can be used in data collection and machine learning; and
(4) it confirms the technical relationships between CWS components.
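The second phase's decision-tree detector can be illustrated in heavily simplified form by a depth-one decision tree, or stump. The stump picks the single reading and threshold with the lowest weighted Gini impurity, as CART-style trees do at each split. The sensor features, values, and labels below are hypothetical. A practical model would grow a deeper tree, for example with scikit-learn's `DecisionTreeClassifier`.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent label (first seen wins ties)."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels):
    """Depth-one decision tree: best (feature, threshold) by weighted Gini impurity.

    Rows with rows[i][f] <= threshold go left, the rest go right.
    Returns (feature_index, threshold, left_label, right_label).
    """
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not right:                            # threshold at the maximum splits nothing
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:
        raise ValueError("no feature has two distinct values")
    return best[1:]

def predict(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label

# Hypothetical readings: [chilled-water supply temperature (C), delta-T across coil (C)].
rows = [[6.5, 5.0], [6.8, 5.2], [7.0, 4.8], [9.5, 2.0], [10.1, 1.5], [9.8, 2.2]]
labels = ["normal", "normal", "normal", "fault", "fault", "fault"]
stump = fit_stump(rows, labels)
print(stump, predict(stump, [10.5, 1.8]))
```

A full tree applies the same split search recursively to each side until a depth or purity limit is reached.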
In practical terms, the thesis contributes the methodological framework described above, which contains an intelligent detection model. The framework led to three further contributions:
(1) a simplified schematic of the CWS;
(2) an appropriate location for each measurement instrument used for data collection; and
(3) a control plan for continuous CWS monitoring.
Together, these theoretical and practical contributions provide a holistic maintenance guide for CWS in commercial buildings. The thesis concludes by suggesting several areas for future research and sharing the author's own reflections.

Item Restricted
ECG CLASSIFICATION USING NEURAL NETWORK (University of Bridgeport, 2018)
Alhassani, Ahmad; Faezipour, Miad
The electrocardiogram (ECG) is a biomedical signal that provides valuable information about heart problems. This thesis aims to help cardiac monitoring machines make faster and more accurate diagnoses so that more patients' lives can be saved. The dataset was taken from PhysioBank, which contains many real recordings of heart conditions that can be selected for study. Five beat classes were used in this research: normal (N), atrial premature beat (PAC), premature ventricular contraction (PVC), left bundle branch block beat (LBBB), and right bundle branch block beat (RBBB). The final goal is to classify these five classes efficiently and accurately using a neural network. To achieve this, the ECG signals pass through a series of processing steps. The first step is ECG signal preprocessing, which has three sub-steps: signal filtering, signal detrending, and signal smoothing. The second step extracts features from the ECG signals. The fourth step classifies the ECG signals using a neural network.
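Two of the three preprocessing sub-steps can be sketched briefly. The first is linear detrending, which subtracts a least-squares straight line to remove slow baseline drift. The second is moving-average smoothing. This is a Python illustration rather than the author's MATLAB implementation. The window length is an arbitrary assumption, and the filtering sub-step (in ECG work, often a band-pass filter) is not shown.

```python
def detrend_linear(signal):
    """Remove baseline drift by subtracting the least-squares line a + b*t (needs >= 2 samples)."""
    n = len(signal)
    t_mean = (n - 1) / 2
    s_mean = sum(signal) / n
    slope = (sum((t - t_mean) * (s - s_mean) for t, s in enumerate(signal))
             / sum((t - t_mean) ** 2 for t in range(n)))
    intercept = s_mean - slope * t_mean
    return [s - (intercept + slope * t) for t, s in enumerate(signal)]

def moving_average(signal, window=5):
    """Centred moving average; the window is truncated at the signal edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        segment = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed

# Toy signal: a single spike riding on a linear drift (values are invented).
raw = [0.1 * t + (1.0 if t == 5 else 0.0) for t in range(11)]
clean = moving_average(detrend_linear(raw), window=3)
```

In MATLAB, the built-in `detrend` performs this kind of linear detrending by default, and `movmean` computes a similar moving average that also shrinks the window at the endpoints.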
Finally, the neural network's results are saved for future use. The system was implemented in MATLAB because of its strong signal processing and analysis capabilities. The research yielded several achievements and optimizations:
(1) identifying effective filtering techniques;
(2) developing a new approach to feature extraction;
(3) building a single neural network that classifies multiple heart conditions; and
(4) achieving a high accuracy of 96.88%.

Item Restricted
Home Monitoring in Interstitial Lung Disease (University College London, 2024)
Althobiani, Malik Abdulmalik; Hurst, John R; Porter, Joanna; Russell, Anne-Marie; Folarin, Amos
Introduction: Interstitial lung disease (ILD) comprises a variety of conditions affecting the lung parenchyma, with varying incidence. Some patients are prone to rapid progression, while others are susceptible to exacerbations. Forced vital capacity (FVC) is used as an endpoint in clinical trials of novel idiopathic pulmonary fibrosis (IPF) therapies. However, it is often measured only every three months, resulting in lengthy monitoring periods before meaningful treatment responses or disease trajectories can be identified. Home spirometry may enable more frequent monitoring, potentially allowing faster detection of ineffective treatment and reductions in clinical trial size, duration, and cost. Individuals with ILD often experience cough, shortness of breath, anxiety, exercise limitation, and fatigue, which affect their quality of life (QoL). Conventional indicators of disease progression, such as pulmonary function tests (PFT), may not fully capture the severity of patients' symptoms. Continuous remote patient monitoring that goes beyond FVC may provide a more complete, real-time assessment of physiological parameters and symptoms. However, the views of clinicians and patients are poorly understood, as are the feasibility and utility of delivering such an approach.
Aim: The thesis has three aims:
(1) to systematically gather, summarise, and evaluate evidence from clinical trials on the feasibility, reliability, and detection of exacerbations and/or disease progression in patients with ILD;
(2) to understand the views of clinicians and patients about home monitoring in ILD; and
(3) to investigate the feasibility and utility of a contemporary approach to patient care that uses commercially available technology to detect disease progression in patients with ILD through continuous monitoring of physiological parameters and symptoms.
Methods: A systematic review was conducted of studies on home monitoring of physiological parameters and symptoms to detect ILD exacerbations and progression. This was followed by an international survey of clinicians exploring their perspectives on using telehealth for remote ILD care. A patient survey was then conducted to quantify patients' use of, and experiences with, digital devices. These preliminary studies informed the research question and the main PhD hypotheses. To test these hypotheses, two further studies were conducted. The first was a feasibility study that assessed the feasibility, acceptability, and value of remote monitoring using commercially available technologies over a 6-month period. The second was a prospective observational cohort study that evaluated a real-time multimodal program using commercially available technology to detect disease progression in patients with ILD through continuous monitoring of physiological parameters and symptoms.
Results: The systematic review provided supportive evidence for the feasibility and acceptability of home monitoring in patients with ILD and identified priorities for future research.
The follow-up studies indicated that health care professionals recognised the potential benefits of home monitoring. However, adoption was low because of barriers such as lack of organisational support, technical issues, and workload constraints. The mixed-methods study showed that digital devices are widely used among patients with ILD, although patients' views on using these devices vary. The prospective multi-centre observational cohort study provided evidence supporting the feasibility and acceptability of remote monitoring to capture both subjective and objective data from varied sources in patients with respiratory diseases. The high level of engagement seen in the passively collected data suggests that wearables may be valuable for long-term, user-friendly remote monitoring in chronic respiratory disease management. The main study is one of the first to employ a comprehensive multimodal remote monitoring system to investigate whether home monitoring can detect progression in patients with ILD. The results demonstrate the potential of multimodal home monitoring to assess associations of physiological parameters and symptoms with disease progression. They also demonstrate its potential to detect disease progression in patients with ILD. Moreover, the results suggest a strong correlation between hospital and home measurements of forced vital capacity in patients with ILD.
Conclusion: Taken together, the findings presented in this thesis support the use of a multimodal home-monitoring system. They also support a potential role for physiological parameters and symptoms in detecting ILD progression, offering a contemporary, personalised approach to patient management. These results are a critical first step towards further evaluating the value of home monitoring for ILD management. However, larger longitudinal validation studies are required.
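The hospital-home FVC correlation reported above can be computed with Pearson's coefficient, r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²). A high r does not by itself show that home and hospital readings agree in level. For that reason, the sketch also returns the mean difference, which is the bias term of a Bland-Altman analysis. The readings below are invented. Whether the thesis used Pearson's r or a different coefficient is an assumption.

```python
import math

def fvc_agreement(hospital, home):
    """Pearson's r and mean difference (home - hospital) for paired FVC readings."""
    n = len(hospital)
    if n != len(home) or n < 2:
        raise ValueError("need at least two paired measurements")
    mx, my = sum(hospital) / n, sum(home) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(hospital, home))
    sx = math.sqrt(sum((x - mx) ** 2 for x in hospital))
    sy = math.sqrt(sum((y - my) ** 2 for y in home))
    r = cov / (sx * sy)                                  # strength of linear association
    bias = sum(y - x for x, y in zip(hospital, home)) / n  # systematic offset
    return r, bias

# Invented paired readings in litres (not study data).
hospital_fvc = [2.10, 2.45, 3.00, 3.40, 2.80]
home_fvc = [2.00, 2.40, 2.85, 3.35, 2.70]
r, bias = fvc_agreement(hospital_fvc, home_fvc)
```

With these invented readings, r is close to 1 while home values run about 0.09 L lower on average. The two numbers answer different questions about the same measurements.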
Future research could apply machine learning algorithms to these data for real-time detection of ILD progression. Such models could detect changes in lung function early and alert patients and healthcare providers to acute and chronic changes. They could also empower patients to better self-manage their disease, allowing timely interventions and more personalised management of ILD.

Item Restricted
Network Alignment Using Topological And Node Embedding Features (Purdue University, 2024-08)
Almulhim, Aljohara; AlHasan, Mohammad
In today's big data environment, the development of robust knowledge discovery solutions depends on integrating data from various sources. For example, intelligence agencies fuse data from multiple sources to identify criminal activities, and e-commerce platforms consolidate user activities across platforms and devices to build better user profiles. Likewise, scientists connect data from various modalities to develop new drugs and treatments. In all such activities, entities from different data sources must be aligned, first to ensure accurate analysis and, more importantly, to discover new knowledge about these entities. When the data sources are networks, aligning their entities is the task of network alignment, which is the focus of this thesis. Its main objective is to find an optimal one-to-one correspondence between the nodes of two or more networks using graph topology and node/edge attributes. Existing work has adopted diverse computational schemes for this task, including eigendecomposition of similarity matrices, solving quadratic assignment problems via sub-gradient optimization, and iterative greedy matching. More recent work uses deep learning frameworks that learn node representations from which matches are identified.
The key challenges of node matching include computational complexity and scalability. Moreover, in real-world scenarios, privacy concerns or unavailability often prevent the use of node attributes. In light of this, we aim to solve the problem using only the graph structure, without prior knowledge, external attributes, or guidance from landmark nodes. This restriction makes topology-based matching harder than other network matching tasks. In this thesis, I propose two original methods for topology-based network alignment. The first, Graphlet-based Alignment (Graphlet-Align), takes a topological approach. It represents each node by a signature based on local graphlet counts and uses this signature as a feature to derive node-to-node similarity across a pair of networks. Feeding these similarity values into a bipartite matching algorithm gives Graphlet-Align a preliminary alignment. It then refines this alignment using higher-order information from each node's k-hop neighborhood to achieve better accuracy. We validated Graphlet-Align on various large real-world networks, achieving accuracy improvements of 20% to 72% over state-of-the-art methods on both duplicated and noisy graphs. Extending this topology-only paradigm, my second work develops a self-supervised learning framework called Self-Supervised Topological Alignment (SST-Align). SST-Align uses graphlet-based signatures to create self-supervised node alignment labels. It then uses those labels to generate embedding vectors for the nodes of both networks in a joint space, from which the alignment can be solved effectively and accurately. It starts with an optimization process that applies average pooling to the extracted graphlet signatures to construct an initial node assignment.
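The first stage of Graphlet-Align (node signatures, cross-network similarity, and bipartite matching) can be sketched on small graphs. Several simplifying assumptions apply:
(1) each signature uses only the four orbits of graphlets with up to three nodes (degree, open-path end, open-path centre, and triangle);
(2) similarity is the negative Euclidean distance between signatures; and
(3) a greedy matching stands in for whichever bipartite matching algorithm the thesis uses.
The k-hop refinement step is not shown.

```python
import math
from itertools import combinations

def signatures(adj):
    """Orbit counts of 2- and 3-node graphlets for every node.

    adj maps node -> set of neighbours (undirected, no self-loops).
    Returns node -> (degree, open-path end, open-path centre, triangle).
    """
    sig = {}
    for v, nbrs in adj.items():
        d = len(nbrs)
        tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        centre = d * (d - 1) // 2 - tri                     # open wedges centred on v
        end = sum(len(adj[u]) - 1 for u in nbrs) - 2 * tri   # open 2-paths ending at v
        sig[v] = (d, end, centre, tri)
    return sig

def greedy_align(adj1, adj2):
    """One-to-one node mapping, most similar signature pairs first (greedy matching)."""
    s1, s2 = signatures(adj1), signatures(adj2)
    pairs = sorted(((math.dist(s1[u], s2[v]), u, v) for u in s1 for v in s2),
                   key=lambda p: p[0])                      # smallest distance first
    mapping, used = {}, set()
    for _, u, v in pairs:
        if u not in mapping and v not in used:
            mapping[u] = v
            used.add(v)
    return mapping

# Toy graph: triangle 0-1-2 with pendant paths, and a relabelled copy of it.
g1 = {0: {1, 2}, 1: {0, 2, 5}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}, 5: {1}}
relabel = {0: "c", 1: "a", 2: "f", 3: "b", 4: "e", 5: "d"}
g2 = {relabel[v]: {relabel[u] for u in nbrs} for v, nbrs in g1.items()}
print(greedy_align(g1, g2))
```

Every node of this toy graph has a distinct signature, so the greedy matching recovers `relabel` exactly. On noisy graphs, near-ties between signatures are presumably where higher-order refinement helps. Greedy matching is also not guaranteed to be optimal. An exact one-to-one assignment can be obtained with the Hungarian method, for example SciPy's `scipy.optimize.linear_sum_assignment`.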
In its next stage, SST-Align uses a self-supervised Siamese network architecture that combines the initial node assignment with graph convolutional networks to generate node embeddings through a contrastive loss. Applying kd-tree similarity search to the two networks' embeddings then yields the final node mapping. Extensive testing on real-world graph alignment datasets shows that the method achieves node-mapping accuracy competitive with seven existing models. An ablation study additionally evaluates the accuracy of the two stages by excluding the representation-learning component and comparing the resulting mapping accuracy. This thesis advances the theoretical understanding of topological features in graph data analysis for network alignment, facilitating future advances in the field.

Item Restricted
EAVESDROPPING-DRIVEN PROFILING ATTACKS ON ENCRYPTED WIFI NETWORKS: UNVEILING VULNERABILITIES IN IOT DEVICE SECURITY (University of Central Florida, 2024-08-02)
Alwhbi, Ibrahim; Zou, Changchun
This dissertation investigates the privacy implications of WiFi communication in Internet-of-Things (IoT) environments, focusing on the threat posed by out-of-network observers. Recent research has shown that in-network observers can glean information about IoT devices, user identities, and activities. However, the potential for inference by out-of-network observers, who do not have access to the WiFi network, has not been thoroughly examined. The first study presents a detailed summary dataset and uses Random Forest to classify the summarized data. It highlights the significant privacy threat that out-of-network observers pose to WiFi networks and IoT applications. Building on this investigation, the second study uses a new set of monitored WiFi data frames in time-series form. It applies advanced machine learning algorithms, specifically XGBoost, for time-series classification.
This extension achieved accuracy of up to 94% in identifying IoT devices and their operating status, profiling devices faster while maintaining classification accuracy. Furthermore, the study underscores how easily outside intruders can harm IoT devices without joining a WiFi network, launching attacks quickly and leaving no detectable footprints. The dissertation also presents a comprehensive survey of recent advances in machine-learning-driven encrypted traffic analysis and classification. Because encryption hampers traditional packet and traffic inspection, understanding and classifying encrypted traffic is crucial. The survey reviews state-of-the-art techniques and methodologies for applying machine learning to encrypted network traffic analysis and classification. It serves as a resource for network administrators, cybersecurity professionals, and policy enforcement entities, offering insights into current practices and future directions in the field.

Item Restricted
Exploring the Impact of Sentiment Analysis on Price Prediction (Lehigh University, 2024-07)
Zahhar, Abdulkarim Ali Y.; Robinson, Daniel P.
The integration of sentiment analysis into predictive models for financial markets, particularly Bitcoin, combines behavioral finance with quantitative analysis. This thesis investigates the extent to which sentiment data derived from social media platforms such as X (formerly Twitter) can improve the accuracy of Bitcoin price predictions. A key premise of the study is that public sentiment, as expressed on social media, affects Bitcoin's market price. The research uses linear regression models that combine Bitcoin's opening prices with sentiment scores from social media to forecast closing prices. The analysis covers the period from January 2012 to December 2019.
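The regression set-up described above forecasts the closing price from the opening price plus a sentiment score. It can be sketched as ordinary least squares solved through the normal equations (XᵀX)β = Xᵀy. The data are synthetic, the variable names are hypothetical, and the thesis's lagged features are not reproduced. The helper returns MSE, RMSE, and MAE, so that models with and without sentiment can be compared on the same footing.

```python
import math

def solve(a, b):
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]     # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                       # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(features, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(row) for row in features]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def errors(y_true, y_pred):
    """Return (MSE, RMSE, MAE)."""
    res = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in res) / len(res)
    return mse, math.sqrt(mse), sum(abs(e) for e in res) / len(res)

# Hypothetical daily data: opening price and average sentiment score.
opens = [100.0, 102.0, 105.0, 103.0, 108.0, 110.0]
sentiment = [0.1, -0.2, 0.3, 0.0, 0.5, -0.1]
closes = [10 + 0.95 * o + 40 * s for o, s in zip(opens, sentiment)]  # synthetic target

beta = fit_ols(list(zip(opens, sentiment)), closes)
preds = [beta[0] + beta[1] * o + beta[2] * s for o, s in zip(opens, sentiment)]
mse, rmse, mae = errors(closes, preds)

base = fit_ols([(o,) for o in opens], closes)            # opening price only
base_preds = [base[0] + base[1] * o for o in opens]
print(errors(closes, base_preds)[0], mse)                # MSE without vs with sentiment
```

For real price series, forming XᵀX squares the condition number of the problem. A library least-squares solver such as NumPy's `numpy.linalg.lstsq` is therefore numerically safer.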
Sentiment scores were computed using the VADER and TextBlob lexicons. The empirical findings show that models incorporating sentiment scores improve predictive accuracy. For example, adding daily average sentiment scores (v_avg and B_avg) to the models reduced the Mean Squared Error (MSE) slightly, from 81184 to 81129. It also improved other metrics such as Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE), particularly at specific lags such as 8 and 76 days. These results point to the potential of sentiment analysis for improving financial forecasting models. However, the thesis also acknowledges limitations related to the scope of the data and the difficulty of measuring sentiment accurately. Future research is encouraged to explore more sophisticated models and more diverse data sources to further enhance and validate the integration of sentiment analysis into financial forecasting.

Item Restricted
Towards Cost-Effective Noise-Resilient Machine Learning Solutions (University of Georgia, 2026-06-04)
Gharawi, Abdulrahman Ahmed; Ramaswamy, Lakshmish
Machine learning models have demonstrated exceptional performance in various applications thanks to the emergence of large labeled datasets. Although many datasets are available, acquiring high-quality labeled data is challenging because it requires extensive human supervision or expert annotation, both of which are labor-intensive and time-consuming. The problem is magnified by the considerable label noise in real-world datasets, which significantly degrades the accuracy of machine learning models. Acquiring high-quality datasets free of label noise is therefore a critical problem. Yet it is difficult to substantially reduce label noise in real-world datasets without hiring expensive expert annotators.
Through extensive experiments, this dissertation examines the impact of different levels of label noise on the accuracy of machine learning models. It also investigates ways to reduce labeling costs without sacrificing the required accuracy. Finally, to enhance the robustness of machine learning models and mitigate the pervasive issue of label noise, we present a novel, cost-effective approach called Self Enhanced Supervised Training (SEST).
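Studying how accuracy changes with the level of label noise usually starts by corrupting a clean dataset at controlled rates. The sketch below assumes symmetric (uniform) noise, a common protocol in the label-noise literature that the dissertation does not necessarily use. Under this protocol, each label is replaced, with probability `rate`, by a different class chosen uniformly at random. The function names are hypothetical, and SEST itself is not reproduced.

```python
import random

def add_symmetric_noise(labels, classes, rate, seed=0):
    """With probability `rate`, replace each label by a different class chosen uniformly."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)                 # seeded so noisy copies are reproducible
    noisy = []
    for y in labels:
        if rng.random() < rate:
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy

def noise_level(clean, noisy):
    """Observed fraction of labels that changed."""
    return sum(a != b for a, b in zip(clean, noisy)) / len(clean)

clean = [i % 3 for i in range(3000)]
for rate in (0.0, 0.2, 0.4):
    noisy = add_symmetric_noise(clean, classes=[0, 1, 2], rate=rate, seed=1)
    print(rate, round(noise_level(clean, noisy), 3))
```

Training the same model on copies corrupted at several rates and plotting test accuracy against the rate gives the kind of noise-versus-accuracy comparison described above.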