Developing a Generative AI Model to Enhance Sentiment Analysis for the Saudi Dialect
No Thumbnail Available
Date
2024-12
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Texas Tech University
Abstract
Sentiment Analysis (SA) is a fundamental task in Natural Language Processing (NLP) with broad applications across various real-world domains. While Arabic is a globally significant language with several well-developed NLP models for its standard form, achieving high performance in sentiment analysis for the Saudi Dialect (SD) remains challenging. A key factor contributing to this difficulty is inadequate SD datasets for training of NLP models. This study introduces a novel method for adapting a high-resource language model to a closely related but low-resource dialect by combining moderate effort in SD data collection with generative AI to address this problem of inadequacy in SD datasets. Then, AraBERT was fine-tuned using a combination of collected SD data and additional SD data generated by GPT. The results demonstrate a significant improvement in SD sentiment analysis performance compared to the AraBERT model, which is fine-tuned with only collected SD datasets. This approach highlights an efficient approach to generating high-quality datasets for fine-tuning a model trained on a high-resource language to perform well in a low-resource dialect. Leveraging generative AI enables reduced effort in data collection, making our approach a promising avenue for future research in low-resource NLP tasks.
Description
Keywords
Generative AI, Sentiment Analysis, Saudi Dialect, NLP