SACM - United States of America

Permanent URI for this collection: https://drepo.sdl.edu.sa/handle/20.500.14154/9668


Search Results

Now showing 1 - 10 of 13
  • Item (Restricted)
    Towards Representative Pre-training Corpora for Arabic Natural Language Processing
    (Clarkson University, 2024-11-30) Alshahrani, Saied Falah A; Matthews, Jeanna
    Natural Language Processing (NLP) encompasses various tasks, problems, and algorithms that analyze human-generated textual corpora or datasets to produce insights, suggestions, or recommendations. These corpora and datasets are crucial for any NLP task or system, as they convey social concepts, including views, culture, heritage, and perspectives of native speakers. However, a corpus or dataset in a particular language does not necessarily represent the culture of its native speakers. Native speakers may organically write some textual corpora or datasets, while others may be written by non-native speakers, translated from other languages, or generated using advanced NLP technologies, such as Large Language Models (LLMs). Yet, in the era of Generative Artificial Intelligence (GenAI), it has become increasingly difficult to distinguish between human-generated texts and machine-translated or machine-generated texts, especially when all these different types of texts, i.e., corpora or datasets, are combined to create large corpora or datasets for pre-training NLP tasks, systems, and technologies. Therefore, there is an urgent need to study the degree to which pre-training corpora or datasets represent native speakers and reflect their values, beliefs, cultures, and perspectives, and to investigate the potentially negative implications of using unrepresentative corpora or datasets for NLP tasks, systems, and technologies. Among the most widely utilized pre-training corpora or datasets for NLP are Wikipedia articles, especially for low-resource languages like Arabic, due to their large multilingual content collection and massive array of metadata that can be quantified. In this dissertation, we study the representativeness of Arabic NLP pre-training corpora or datasets, focusing specifically on the three Arabic Wikipedia editions: Arabic Wikipedia, Egyptian Arabic Wikipedia, and Moroccan Arabic Wikipedia. Our primary goals are to 1) raise awareness of the potential negative implications of using unnatural, inorganic, and unrepresentative corpora (those generated or translated automatically without the input of native speakers), 2) find better ways to promote transparency and ensure that native speakers are involved through metrics, metadata, and online applications, and 3) strive to reduce the impact of automatically generated or translated content by using machine learning algorithms to identify or detect it automatically. To do this, we first analyze the metadata of the three Arabic Wikipedia editions, focusing on differences using collected statistics such as total pages, articles, edits, registered and active users, administrators, and top editors. We document issues related to the automatic creation and translation of articles (content pages) from English to Arabic without human (i.e., native speaker) review, revision, or supervision. Second, we quantitatively study the performance implications of using unnatural, inorganic corpora that do not represent native speakers and are primarily generated using automation, such as bot-created articles or template-based translation. We intrinsically evaluate the performance of two main NLP tasks—Word Representation and Language Modeling—using the Word Analogy and Fill-Mask evaluation tasks on our two newly created datasets: the Arab States Analogy Dataset and the Masked Arab States Dataset.
Third, we assess the quality of Wikipedia corpora at the edition level rather than the article level by quantifying bot activities and enhancing Wikipedia’s Depth metric. After analyzing the limitations of the existing Depth metric, we propose a bot-free version, called the DEPTH+ metric, that excludes bot-created articles and bot-made edits on articles; we present its mathematical definitions, highlight its features and limitations, and explain how this new metric more accurately reflects human collaboration depth within the Wikipedia project. Finally, we address the issue of template translation in the Egyptian Arabic Wikipedia by identifying template-translated articles and their characteristics. We explore the content of the three Arabic Wikipedia editions in terms of density, quality, and human contributions and employ the resulting insights to build multivariate machine learning classifiers that leverage article metadata to automatically detect template-translated articles. Lastly, we deploy the best-performing classifier publicly as an online application and release the extracted, filtered, labeled, and preprocessed datasets so the research community can benefit from our datasets and the web-based detection system.
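For readers unfamiliar with Wikipedia's Depth metric, the sketch below illustrates the kind of bot exclusion the DEPTH+ idea describes. It assumes the standard Depth formula (edits/articles × non-articles/articles × (1 − articles/total pages)) and uses hypothetical per-edition counts split into human and bot contributions; the exact DEPTH+ definition is the one given in the dissertation, not this sketch.

```python
def depth(edits, articles, total_pages):
    """Standard Wikipedia Depth: (edits/articles) * (non-articles/articles) * (1 - articles/total)."""
    non_articles = total_pages - articles
    return (edits / articles) * (non_articles / articles) * (1 - articles / total_pages)


def depth_plus(edits, bot_edits, articles, bot_articles, total_pages):
    """Bot-free variant in the spirit of DEPTH+: drop bot-created articles and bot-made edits
    before applying the standard formula (illustrative only)."""
    return depth(edits - bot_edits, articles - bot_articles, total_pages - bot_articles)


# Hypothetical counts for a single Wikipedia edition
print(depth(edits=2_000_000, articles=600_000, total_pages=1_500_000))
print(depth_plus(edits=2_000_000, bot_edits=1_200_000,
                 articles=600_000, bot_articles=350_000, total_pages=1_500_000))
```

Comparing the two values for editions with heavy bot activity is exactly where a bot-free depth diverges most from the standard metric.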
  • Item (Restricted)
    Multi-Stage and Multi-Target Data-Centric Approaches to Object Detection, Localization, and Segmentation in Medical Imaging
    (University of California San Diego, 2024) Albattal, Abdullah; Nguyen, Truong
    Object detection, localization, and segmentation in medical images are essential in several medical procedures. Identifying abnormalities and anatomical structures of interest within these images remains challenging due to the variability in patient anatomy, imaging conditions, and the inherent complexities of biological structures. To address these challenges, we propose a set of frameworks for real-time object detection and tracking in ultrasound scans and two frameworks for liver lesion detection and segmentation in single and multi-phase computed tomography (CT) scans. The first framework for ultrasound object detection and tracking uses a segmentation model weakly trained on bounding box labels as the backbone architecture. The framework outperformed state-of-the-art object detection models in detecting the Vagus nerve within scans of the neck. To improve the detection and localization accuracy of the backbone network, we propose a multi-path decoder UNet. Its detection performance is on par with, or slightly better than, the more computationally expensive UNet++, which has 20% more parameters and requires twice the inference time. For liver lesion segmentation and detection in multi-phase CT scans, we propose an approach to first align the liver using liver segmentation masks followed by deformable registration with the VoxelMorph model. We also propose a learning-free framework to estimate and correct abnormal deformations in deformable image registration models. The first framework for liver lesion segmentation is a multi-stage framework designed to incorporate models trained on each of the phases individually in addition to the model trained on all the phases together. The framework uses a segmentation refinement and correction model that combines these models' predictions with the CT image to improve the overall lesion segmentation. The framework improves the subject-wise segmentation performance by 1.6% while reducing performance variability across subjects by 8% and the instances of segmentation failure by 50%. In the second framework, we propose a liver lesion mask selection algorithm that compares the separation of intensity features between the lesion and surrounding tissue from multi-specialized model predictions and selects the mask that maximizes this separation. The selection approach improves the detection rates for small lesions by 15.5% and by 4.3% for lesions overall.
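The lesion-mask selection idea in the second framework can be illustrated with a small sketch: given candidate masks from several specialized models, score each by how well it separates lesion intensities from the surrounding tissue and keep the best one. The separation score below (difference of means over a pooled standard deviation) and the fixed dilation radius are assumptions for illustration, not the dissertation's exact criterion.

```python
import numpy as np
from scipy import ndimage


def separation_score(ct_volume, mask, ring_width=3):
    """Contrast between lesion voxels and a surrounding ring of tissue.
    ct_volume: 3-D intensity array; mask: boolean lesion mask of the same shape."""
    ring = ndimage.binary_dilation(mask, iterations=ring_width) & ~mask
    lesion_vals, ring_vals = ct_volume[mask], ct_volume[ring]
    pooled_std = np.sqrt(0.5 * (lesion_vals.var() + ring_vals.var())) + 1e-6
    return abs(lesion_vals.mean() - ring_vals.mean()) / pooled_std


def select_mask(ct_volume, candidate_masks):
    """Pick the candidate mask that maximizes lesion/background intensity separation."""
    return max(candidate_masks, key=lambda m: separation_score(ct_volume, m))
```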
  • Item (Restricted)
    An In-Depth Analysis of the Adoption of Large Language Models in Clinical Settings: A Fuzzy Multi-Criteria Decision-Making Approach
    (University of Bridgeport, 2024-08-05) Aldwean, Abdullah; Tenney, Dan
    The growing capabilities of large language models (LLMs) in the medical field hold promise for transformational change. The evolution of LLMs, such as BioBERT and MedGPT, has created new opportunities for enhancing the quality of healthcare services, improving clinical operational efficiency, and addressing numerous existing healthcare challenges. However, the adoption of these innovative technologies into clinical settings is a complex, multifaceted decision problem influenced by various factors: utilizing LLM technologies in clinical settings faces challenges across societal, technological, organizational, regulatory, and economic (STORE) perspectives. This dissertation aims to identify and rank the challenges facing the integration of LLMs into clinical settings and to evaluate different adoption solutions. To achieve this goal, a combined approach based on the Fuzzy Analytic Hierarchy Process (FAHP) and the Fuzzy Technique for Order of Preference by Similarity to Ideal Solution (Fuzzy TOPSIS) is employed to prioritize these challenges and then use them to rank potential LLM adoption solutions based on experts' opinions. The findings indicate that regulatory concerns, such as accountability and compliance, are the most critical challenges facing the LLM adoption decision. This research provides a thorough, evidence-based assessment of LLMs in clinical settings and offers a structured framework that helps decision-makers navigate the complexities of leveraging such disruptive innovations in clinical practice.
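As a rough illustration of the FAHP-to-Fuzzy-TOPSIS pipeline described above, the sketch below ranks candidate adoption solutions from a crisp decision matrix using criterion weights that an FAHP stage would supply. It defuzzifies triangular judgments up front and then applies standard TOPSIS closeness coefficients; the actual study works with expert-elicited fuzzy data end to end, and the scores and weights here are hypothetical.

```python
import numpy as np


def defuzzify(tfn):
    """Centroid of a triangular fuzzy number (l, m, u)."""
    l, m, u = tfn
    return (l + m + u) / 3.0


def topsis(decision_matrix, weights, benefit):
    """Rank alternatives with TOPSIS.
    decision_matrix: alternatives x criteria (crisp scores)
    weights: criterion weights (e.g., defuzzified FAHP weights), summing to 1
    benefit: True for benefit criteria, False for cost criteria"""
    X = np.asarray(decision_matrix, dtype=float)
    V = (X / np.linalg.norm(X, axis=0)) * np.asarray(weights)   # weighted normalized matrix
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_pos = np.linalg.norm(V - ideal, axis=1)
    d_neg = np.linalg.norm(V - anti, axis=1)
    return d_neg / (d_pos + d_neg)                               # closeness coefficient per alternative


# Hypothetical example: 3 adoption solutions scored on 4 STORE-style criteria
scores = [[7, 5, 8, 6], [6, 8, 5, 7], [8, 6, 6, 5]]
weights = np.array([defuzzify((0.3, 0.4, 0.5)), 0.2, 0.2, 0.2])
weights = weights / weights.sum()
print(topsis(scores, weights, benefit=[True, True, True, False]))
```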
  • Item (Restricted)
    COMPREHENSIVE SYSTEM ARCHITECTURE FOR HOUSEHOLD REPLENISHMENT SYSTEM: SIMULATION OPTIMIZATION FOR INVENTORY REPLENISHMENT POLICY CONSIDERING QUALITY DEGRADATION & STOCHASTIC DEMAND
    (Binghamton University, 2024) Almassar, Khaled; Khasawneh, Mohammad
    Food wastage caused by the lack of a complete Household Replenishment System is an essential topic to address. Appropriate use of Internet of Things (IoT) and Artificial Intelligence (AI) technologies with particular components is needed to design a smart Household Replenishment System that reduces food waste. This dissertation develops a unified framework and conceptual system architecture for the implementation of a Household Replenishment System, presents an object recognition framework to identify labeled and unlabeled items inside a smart refrigerator, and showcases a low-cost installation model for a smart refrigerator. It also develops and validates a simulation optimization model of perishable items inside a smart refrigerator to find an optimal replenishment policy. To accomplish those goals, this dissertation initially provides comprehensive analyses and a summary of the literature on using IoT and AI tools for perishable-item storage compartments, as they are always full of items that need to be monitored. This comprehensive review followed the PRISMA search strategy and was conducted to identify the approaches, contributions, components used, and limitations of the reviewed papers in developing a unified framework for a household replenishment system. More specifically, 70 papers were examined in chronological order starting from 2000, when LG Electronics invented the first smart refrigerator and research on technology involvement in food storage compartments increased. The analysis found 43 approaches using IoT technology and 27 using AI, and in the past couple of years the use of AIoT has been trending. Future directions for researchers were derived from the limitations of the reviewed papers; they could enhance the household replenishment system by adding these features to smart food storage compartments. The comprehensive review helps fulfill an objective of this dissertation: the system architecture framework. The system architecture acts as a road map for developers to implement a Household Replenishment System. It sheds more light on one of the most important techniques of AI, object recognition, and a framework for object recognition is developed. The developed object recognition framework provides insight into the type of information about stored items that could be obtained by the Household Replenishment System. A practical example of a cost model is presented. The developed cost model minimizes the total installation cost of the smart refrigerator based on household preferences using linear programming, an adoption of capital budgeting, and multidimensional knapsack problems. The object recognition framework presented is conceptual; therefore, several assumptions were used to develop a simulation optimization model of the Household Replenishment System. The simulation optimization model uses discrete-event simulations and a periodic policy defined by the review period, minimum stock level, and maximum stock level (T, s, S). It finds the optimal replenishment policy that minimizes the total Household Replenishment System inventory cost, considering holding, shortage, wastage, and order costs as components of the objective function, which accounts for stochastic demand, variation in life span, and quality degradation rates of stored items. The simulation optimization model was tested on single and multiple items, with different scenarios for the multiple-item cases.
Experimental runs of the simulation optimization model were completed, validated, and analyzed, and design-of-experiments and sensitivity analyses were applied. The simulation optimization model successfully generated a set of the top five optimal replenishment policies for the household to choose from. Further investigation into smart home appliances would lead to broader applications such as smart shops, smart industries, and eventually smart cities. Future work for this dissertation could enlarge the scope of research to include patents, dissertations, and theses that used Artificial Intelligence of Things (AIoT) technologies to improve the Household Replenishment System.
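The (T, s, S) simulation optimization idea can be sketched as follows: simulate daily stochastic demand and shelf-life-driven wastage, review the inventory every T days, and order up to S whenever the stock position is at or below s; the total of ordering, holding, shortage, and wastage costs is the objective a search over (T, s, S) would minimize. All cost rates, demand parameters, and the fixed shelf life below are assumptions for illustration, not the dissertation's calibrated model.

```python
import random


def simulate_tss(T, s, S, days=365, shelf_life=7, demand_mean=4,
                 order_cost=10.0, hold_cost=0.1, short_cost=2.0, waste_cost=1.5, seed=0):
    """Periodic-review (T, s, S) policy for a single perishable item; returns total cost."""
    rng = random.Random(seed)
    stock = []                                        # remaining shelf life per unit, oldest first
    total = 0.0
    for day in range(days):
        if day % T == 0 and len(stock) <= s:          # review and order up to S
            stock += [shelf_life] * (S - len(stock))
            total += order_cost
        demand = rng.randint(0, 2 * demand_mean)      # stochastic daily demand
        served = min(demand, len(stock))
        stock = stock[served:]                        # consume oldest units first (FIFO)
        total += short_cost * (demand - served)       # shortage penalty
        stock = [life - 1 for life in stock]          # age the remaining units
        expired = sum(1 for life in stock if life <= 0)
        stock = [life for life in stock if life > 0]
        total += waste_cost * expired + hold_cost * len(stock)
    return total


# A coarse grid search stands in for the simulation optimization step
best = min(((T, s, S) for T in (1, 3, 7) for s in (5, 10, 20) for S in (30, 50)),
           key=lambda p: simulate_tss(*p))
print(best, simulate_tss(*best))
```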
  • Item (Restricted)
    Exploring Factors Influencing the Adoption of AI Tools in Auditing: A Mixed-Methods Study
    (Virginia Commonwealth University, 2024-07-12) Alsudairi, Fahad; Yoon, Victoria; Osei-Bryson, Kweku-Muata; Etudo, Ugochukwu; Senechal, Jesse
    The rise of Artificial Intelligence (AI) has created value for organizations and society, prompting scholars to study its spread across many areas. However, the impact of AI adoption on governmental organizations remains underexplored. Governmental entities face unique challenges distinct from private organizations, and existing research often focuses on the perspectives of AI experts or senior management, neglecting the insights of lower-level employees who will use the system daily. This study investigates the multifaceted factors influencing the intention to adopt AI tools within a governmental auditing bureau in Saudi Arabia. To the best of our knowledge, no previous study in the literature has specifically delved into AI adoption within the context of governmental auditing. This study employs an exploratory mixed-methods approach based on the IS guidelines of Venkatesh et al. (2013, 2016), combining qualitative and quantitative methods to comprehensively investigate the factors influencing the intention to adopt AI tools in auditing. Initially, the study identifies key factors and develops a conceptual model grounded in qualitative data and theoretical background. The model is then validated and tested through a survey using a larger sample within the governmental bureau. The findings support many hypotheses, emphasizing the significance of technological factors such as AI complexity, perceived scalability, relative advantage, and security in the intention to adopt AI tools in auditing. The study also highlights the need to align governmental auditing tasks and AI tools, and the importance of task-technology fit. Organizational factors, such as leadership support and strategic AI implementation, are crucial for successfully adopting AI. Additionally, environmental factors underscore the pivotal role of higher authorities in facilitating and supporting AI adoption in governmental organizations. This study offers several contributions. It extends the organizational AI adoption literature by broadening the understanding of AI adoption factors, emphasizing the value of studying government organizations due to their unique nature, and providing insights into the factors affecting AI adoption from the end-user's viewpoint. It offers practical benefits for the governmental auditing agency and similar governmental organizations. Educationally, this dissertation functions as a rich case study within the Information Systems (IS) field, providing a valuable educational resource. Possible limitations include sample selection constraints, the sample size in Phase I, and the limited contextual scope of the study. Directions for future research include examining the dynamics of AI implementation over time through longitudinal studies, testing the conceptual model across different governmental sectors and similar cultural and socio-political contexts, and investigating how AI tools affect auditors' compensation and job satisfaction.
  • Item (Restricted)
    HIGH DIMENSIONAL TIME SERIES DATA MINING IN AUTOMATIC FIRE MONITORING AND AUTOMOTIVE QUALITY MANAGEMENT
    (Rutgers, The State University of New Jersey, 2024-05) Alhindi, Taha Jaweed O; Jeong, Myong K.
    Time series data is increasingly being generated in many domains around the world. Monitoring an event using multiple variables gathered over time forms a high-dimensional time series when the number of variables is high. High-dimensional time series are being widely applied across many areas. Thus, the need to develop more efficient and effective approaches to analyze and monitor high-dimensional time series data has become more critical. For instance, within the realm of fire disaster management, the advancement of fire detection systems has garnered research interest aimed at safeguarding human lives and property against devastating fire incidents. Nonetheless, the task of monitoring indoor fires presents complexities attributed to the distinct attributes of fire sensor signals (namely, high-dimensional time series), including the presence of time-based dependencies and varied signal patterns across different types of fires, such as those from flaming, heating, and smoldering sources. In the field of automobile quality management, minimizing internal vehicle noise is crucial for enhancing both customer satisfaction and the overall quality of the vehicle. Windshield wipers are significant contributors to such noise, and defective wipers can adversely impact the driving perception of passengers. Therefore, detecting wiper defects during production can lead to an improved driving experience, enhanced vehicle and road safety, and decreased driver distraction. Currently, the process for detecting noise from windshield wipers is manual, subjective, and requires considerable time. This dissertation presents several novel time series monitoring and anomaly detection approaches in two domains: 1) fire disaster management and 2) automotive quality management. The proposed approaches effectively address the limitations of traditional and existing systems and enhance human safety while reducing human and economic losses. In the fire disaster management domain, we first propose two fire detection systems using dynamic time warping (DTW) distance measure. The first fire detection system is based on DTW and the nearest neighbor (NN) classifier (NN-DTW). The second fire detection system utilizes a support vector machine with DTW kernel function (SVM-DTWK) to improve classification accuracy by utilizing SVM capability to obtain nonlinear decision boundaries. Using the DTW distance measure, both fire detection systems retain the temporal dynamics in the sensor signals of different fire types. Additionally, the suggested systems dynamically identify the essential sensors for early fire detection through the newly developed k-out-of-P fire voting rule. This rule integrates decision-making processes from P multichannel sensor signals effectively. To validate the efficiency of these systems, a case study was conducted using a real-world fire dataset from the National Institute of Standards and Technology. Secondly, we introduce a real-time, wavelet-based fire detection algorithm that leverages the multi-resolution capability of wavelet transformation. This approach differs from traditional fire detection methods by capturing the temporal dynamics of chemical sensor signals for different fire scenarios, including flaming, heating, and smoldering fires. A novel feature selection method tailored to fire types is employed to identify optimal features for distinguishing between normal conditions and various fire situations. 
Subsequently, a real-time detection algorithm incorporating a multi-model framework is developed to efficiently apply these chosen features, creating multiple fire detection models adept at identifying different fire types without pre-existing knowledge. Testing with publicly available fire data indicates that our algorithm surpasses conventional methods in terms of early detection capabilities, maintaining a low rate of false alarms across all fire categories. Thirdly, we introduce an innovative fire detection system designed for monitoring a range of indoor fire types. Unlike traditional research, which tends to separate the development of fire sensing and detection algorithms, our system seamlessly integrates these phases. This integration allows for the effective real-time utilization of varied sensor signals to identify fire outbreaks at their inception. Our system collects data from multiple types of sensors, each sensitive to different fire-emitted components. This data then feeds into a similarity matching-based detection algorithm that identifies distinct pattern shapes within the sensor signals across various fire conditions, enabling early detection of fires with minimal false alarms. The efficacy of this system is demonstrated through the use of real sensor data and experimental results, underscoring the system’s ability to accurately detect fires at an early stage. Lastly, in the automotive quality management domain, we introduce an innovative automated system for detecting faults in windshield wipers. Initially, we apply a new binarization technique to transform spectrograms of the sound produced by windshield wipers, isolating noisy regions. Following this, we propose a novel matrix factorization technique, termed orthogonal binary singular value decomposition, to break down these binarized mel spectrograms into uncorrelated binary eigenimages. This process enables the extraction of significant features for identifying defective wipers. Utilizing the k-NN classifier, these features are then categorized into normal or faulty wipers. The system’s efficiency was validated using real-world datasets of windshield wiper reversal and squeal noises, demonstrating superior performance over existing methodologies. The proposed approaches excel in detecting complex temporal patterns in high-dimensional time series data, with wide applicability across healthcare, environmental monitoring, and manufacturing for tasks like vital signs monitoring, climate and pollution tracking, and machinery maintenance. Additionally, the OBSVD technique, producing binary, uncorrelated eigenimages for unique information capture, broadens its use to medical imaging for anomaly detection, security for facial recognition, quality control for defect detection, document processing, and environmental analysis via satellite imagery. This versatility highlights the research's significant potential across machine learning and signal processing, improving efficiency and accuracy in time series data analysis.
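A minimal sketch of the NN-DTW detector with a k-out-of-P voting rule is given below: each sensor channel is classified by a 1-nearest-neighbor rule under dynamic time warping, and a fire is declared when at least k of the P channels vote "fire". The plain O(n·m) DTW recursion and the toy data layout are assumptions for illustration; the dissertation's systems add channel selection, kernel SVMs, and other refinements.

```python
import numpy as np


def dtw(a, b):
    """Classic dynamic time warping distance between two 1-D signals."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]


def nn_dtw_label(query, train_signals, train_labels):
    """1-NN classification of a single channel under DTW distance."""
    dists = [dtw(query, ref) for ref in train_signals]
    return train_labels[int(np.argmin(dists))]


def k_out_of_p_vote(query_channels, train_by_channel, k):
    """Declare 'fire' if at least k of the P channels are classified as fire.
    train_by_channel: list of (reference_signals, labels) pairs, one per channel."""
    votes = sum(nn_dtw_label(q, refs, labels) == "fire"
                for q, (refs, labels) in zip(query_channels, train_by_channel))
    return votes >= k
```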
  • Item (Restricted)
    Automatic Sketch-to-Image Synthesis and Recognition
    (University of Dayton, 2024-05) Baraheem, Samah; Nguyen, Tam
    Images are used everywhere since they convey a story, a fact, or an imagination without any words. Thus, they can substitute for sentences, because the human brain can extract knowledge from images faster than from words. However, creating an image from scratch is not only time-consuming but also a tedious task that requires skill. Creating an image is not trivial since it contains rich features and fine-grained details, such as colors, brightness, saturation, luminance, texture, shadow, and so on. Thus, to generate an image in less time and without any artistic skills, sketch-to-image synthesis can be used. The reason is that hand sketches are much easier to produce, contain only the key structural information, and can be drawn without skill and in less time. In fact, since sketches are often simple, rough, black and white, and sometimes imperfect, converting a sketch into an image is not a trivial problem. Hence, it has attracted researchers' attention, and much research has been conducted in this field to generate photorealistic images. However, the generated images still suffer from issues such as unnaturalness, ambiguity, distortion, and, most importantly, the difficulty of generating images from complex input with multiple objects. Most of these problems are due to converting a sketch into an image directly in one shot. To this end, in this dissertation, we propose a new framework that divides the problem into sub-problems, leading to high-quality photorealistic images even with complicated sketches. Instead of directly mapping the input sketch into an image, we map the sketch into an intermediate result, namely a mask map, through instance segmentation and semantic segmentation at two levels: background segmentation and foreground segmentation. Background segmentation is formed based on the context of the existing foreground objects. Various natural scenes are implemented for both indoor and outdoor settings. Then, a foreground segmentation process is commenced, where each detected object is sequentially and semantically added into the constructed segmented background. Next, the mask map is converted into an image through an image-to-image translation model. Following this step, a post-processing stage is implemented to enhance the synthetic image further via background improvement and human face refinement. This leads not only to better results but also to the ability to generate images from complicated sketches with multiple objects. We further improve our framework by implementing scene and size sensing. As for the size-awareness feature, in the instance segmentation stage the objects' sizes might be modified based on the surrounding environment and their respective size priors to reflect reality and produce more realistic and naturalistic images. As for the scene-awareness feature in the background improvement step, the scene is initially defined based on the context, classified by a scene classifier, and a scene image is selected. The generated objects are then placed on the chosen scene image at a pre-defined snapping point to put them in their proper location and maintain realism. Furthermore, since generated images have improved over time regardless of the input modality, it sometimes becomes hard to distinguish synthetic images from genuine ones.
Of course, this improves content and media, but it also poses a serious threat to legitimacy, authenticity, and security. Thus, an automatic detection system for AI-generated images is a legitimate need. Such a system can also serve as an evaluation tool for image synthesis models regardless of the input modality. Indeed, AI-generated images usually bear explicit or implicit artifacts that arise during the generation process. Prior research works detect synthetic images generated by one specific model or by similar models with similar architectures; hence, a generalization problem arises. To tackle this problem, we propose to fine-tune a pre-trained Convolutional Neural Network (CNN) model on a newly collected, purpose-built dataset. This dataset consists of AI-generated images from different image synthesis architectures and different input modalities, i.e., text, sketch, and other sources (another image or a mask), to help generalization across various tasks and architectures. Our contribution is two-fold. We first generate high-quality, realistic images from simple, rough, black-and-white sketches, compiling a newly collected dataset of sketch-like images for training purposes. Second, since artificial images have both advantages and disadvantages in the real world, we create an automated system that is able to detect and localize synthetic images among genuine ones, where a large dataset of generated and real images is collected to train a CNN model.
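The detection model in the second contribution boils down to fine-tuning a pre-trained CNN as a binary real-vs-generated classifier. The sketch below uses a torchvision ResNet-18 backbone and an ImageFolder layout with "real" and "generated" subfolders; the backbone choice, folder names, and hyperparameters are assumptions, not the dissertation's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# ImageNet-style preprocessing for the pre-trained backbone
tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Expects train_dir/real/*.jpg and train_dir/generated/*.jpg (hypothetical layout)
train_set = datasets.ImageFolder("train_dir", transform=tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)        # real vs. AI-generated

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):                               # short fine-tuning run
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```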
  • Item (Restricted)
    Learning Fast Approximations for Nonconvex Optimization Problems via Deep Learning with Applications to Power Systems
    (Saudi Digital Library, 2024) باسليمان ، كمال; Barati, Masoud
    Nonlinear convex optimization has provided a great modeling language and a powerful solution tool for the control and analysis of power systems over the last decade. A main challenge today is solving non-convex problems in real time. However, if an oracle can guess, ahead of time, a high-quality initial solution, then most non-convex optimization problems can be solved in a limited number of iterations using off-the-shelf solvers. In this dissertation, we study how deep learning can provide good approximations for real-time power system applications. These approximations can act as good initial solutions for any exact algorithm; alternatively, such approximations may be satisfactory on their own for carrying out real-time operations in power systems. First, we address the problem of joint power system state estimation and bad data identification and propose a deep learning model that provides high-quality approximations in milliseconds. Second, we address the problem of multi-step-ahead power system state forecasting and advocate sequence-to-sequence models for better representation. Lastly, we study the problem of learning fast approximations to initialize linear programming solvers; we cast the problem as a simple learning task and propose a deep learning model.
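The "fast approximation as a warm start" idea can be sketched as a small supervised regression problem: train a network to map problem parameters to near-optimal solutions, then hand its prediction to an off-the-shelf solver as the initial point. The quadratic toy objective, the network size, and the use of scipy's solver below are assumptions for illustration only, not the dissertation's power-system formulations.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.optimize import minimize


# Toy parametric problem: a smooth non-convex-ish objective indexed by parameters p
def objective(x, p):
    return np.sum((x - p) ** 2) + 0.1 * np.sum(np.sqrt(x ** 2 + 1e-6))


dim = 10
net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# Train the approximator on (parameters -> known good solutions) pairs;
# for this toy objective, p itself is already near-optimal, so it serves as the label.
for step in range(500):
    p = torch.randn(128, dim)
    loss = nn.functional.mse_loss(net(p), p)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Use the learned approximation as the solver's initial point (warm start)
p_test = np.random.randn(dim)
x0 = net(torch.tensor(p_test, dtype=torch.float32)).detach().numpy()
result = minimize(objective, x0, args=(p_test,), method="L-BFGS-B")
print(result.fun, result.nit)
```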
  • Item (Restricted)
    Physics and AI-Driven Anomaly Detection in Cyber-Physical Systems
    (Saudi Digital Library, 2023) Alotibi, Faris; Tipper, David
    Organizations across various sectors are moving rapidly toward digitization. Multiple cyber-physical system (CPS) applications, such as smart cities, autonomous vehicles, and smart grids, have emerged from interconnectivity, utilizing advanced capabilities of the Internet of Things (IoT), cloud computing, and machine learning. Interconnectivity has also become a critical component of industrial systems such as smart manufacturing, smart oil and gas distribution grids, and smart electric power grids. These critical infrastructures and systems rely on industrial IoT and learning-enabled components to handle the uncertainty and variability of the environment and to increase autonomy in making effective operational decisions. The prosperity and benefits of systems interconnectivity demand the fulfillment of functional requirements such as interoperability of communication and technology, efficiency and reliability, and real-time communication. Systems need to integrate with various communication technologies and standards, process and analyze shared data efficiently, ensure the integrity and accuracy of exchanged data, and execute their processes with tolerable delay. This creates new attack vectors targeting both physical and cyber components. Protecting system interconnections and validating communicated data against cyber and physical attacks becomes critical due to the consequences that disruption attacks pose to critical systems. In this dissertation, we tackle one of the prominent attacks in the CPS space, namely the false data injection attack (FDIA). An FDIA is executed to maliciously influence CPSs' operational decisions, such as opening a valve, changing wind turbine configurations, charging or discharging energy storage system batteries, or coordinating autonomous vehicle driving. We focus on the development of anomaly detection techniques to protect CPSs from this emerging threat. The anomaly detection mechanisms leverage both the physics of CPSs and AI to improve their detection capability as well as the CPSs' ability to mitigate the impact of FDIAs on their operations.
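A classical physics-based building block for FDIA detection is the measurement-residual chi-squared test from power system state estimation: estimate the state from the measurements, then flag the measurement vector when the weighted residual norm exceeds a chi-squared threshold. The linear (DC-style) measurement model and the small synthetic system below are textbook simplifications used only to illustrate the idea; they are not the detection mechanisms developed in the dissertation.

```python
import numpy as np
from scipy.stats import chi2


def residual_test(z, H, R, alpha=0.01):
    """Chi-squared anomaly test for a linear measurement model z = H x + e.
    Returns (is_anomalous, J) where J is the weighted sum of squared residuals."""
    W = np.linalg.inv(R)                                   # weight by inverse noise covariance
    G = H.T @ W @ H
    x_hat = np.linalg.solve(G, H.T @ W @ z)                # weighted least-squares state estimate
    r = z - H @ x_hat                                      # measurement residual
    J = float(r.T @ W @ r)
    dof = H.shape[0] - H.shape[1]                          # m measurements, n state variables
    return J > chi2.ppf(1 - alpha, dof), J


# Hypothetical small system: 6 measurements, 3 state variables
rng = np.random.default_rng(0)
H = rng.normal(size=(6, 3))
x_true = rng.normal(size=3)
R = 0.01 * np.eye(6)
z_clean = H @ x_true + rng.normal(scale=0.1, size=6)
z_attacked = z_clean + np.array([0, 0, 1.5, 0, 0, 0])      # injected false data on one sensor
print(residual_test(z_clean, H, R), residual_test(z_attacked, H, R))
```

Stealthy FDIAs are constructed precisely to slip past this residual check, which is why physics-aware and AI-based detectors of the kind studied in the dissertation are needed on top of it.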
  • Item (Restricted)
    Ensemble of Handcrafted Features of Environment Sound Classification Using a Deep Convolutional Neural Network to Enhance Accuracy and Reduce Computational Complexity
    (Florida Institute of Technology, 2023-07-23) Aljubayri, Ibrahim; Kepuska, Veton Z.
    Environmental sound classification (ESC) is an area of active research in signal and image processing that has made significant strides over the past several years. The goal of ESC is to classify environmental sounds by extracting and analyzing handcrafted and deep features from various acoustic events. The task is complex because environmental sounds are typically unstructured, nonstationary, and overlapping. Multiple deep learning (DL) approaches have successfully tackled the ESC problem and outperformed conventional classifiers like k-nearest neighbors (kNN) or support vector machine (SVM). However, most DL approaches have high computational costs, making them unsuitable for use in embedded systems applications. In this dissertation, we propose four models that require low computational costs and achieve high accuracy in classifying environmental sounds. Model 1 uses kNN to analyze and extract multiple temporal and spectral handcrafted features. Model 2 extracts deep features from different spectrograms using a proposed deep convolutional neural network (DCNN) with six convolutional layers and four max-pooling layers, totaling 150k parameters. Models 3 and 4 combine handcrafted and deep features to improve classification accuracy. We tested the proposed models on a public dataset called Urbansound8k and achieved a classification accuracy of 95.3%.
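Model 2's small DCNN (six convolutional layers and four max-pooling layers, on the order of 150k parameters) can be approximated with a sketch like the one below. The exact channel widths, input patch size, and log-mel front end are assumptions chosen only to keep the network compact; UrbanSound8K itself has 10 classes.

```python
import torch
import torch.nn as nn


class SmallDCNN(nn.Module):
    """Compact CNN for log-mel spectrogram patches (1 x 64 x 128), 10 classes.
    Six conv layers and four max-pooling layers, loosely mirroring Model 2."""
    def __init__(self, n_classes=10):
        super().__init__()

        def block(cin, cout, pool):
            layers = [nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU()]
            if pool:
                layers.append(nn.MaxPool2d(2))
            return layers

        self.features = nn.Sequential(
            *block(1, 16, True), *block(16, 24, True), *block(24, 32, True),
            *block(32, 48, True), *block(48, 48, False), *block(48, 64, False),
        )
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(64, n_classes))

    def forward(self, x):
        return self.head(self.features(x))


model = SmallDCNN()
print(sum(p.numel() for p in model.parameters()))   # rough parameter count
print(model(torch.randn(2, 1, 64, 128)).shape)      # torch.Size([2, 10])
```

Handcrafted temporal and spectral features (Models 1, 3, and 4) would be concatenated with the pooled deep features before the final classifier in an ensemble variant of this sketch.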

Copyright owned by the Saudi Digital Library (SDL) © 2024