Developing a Generative AI Model to Enhance Sentiment Analysis for the Saudi Dialect

Date

2024-12

Publisher

Texas Tech University

Abstract

Sentiment Analysis (SA) is a fundamental task in Natural Language Processing (NLP) with applications across many real-world domains. Although Arabic is a globally significant language with several well-developed NLP models for its standard form, achieving high sentiment analysis performance for the Saudi Dialect (SD) remains challenging. A key contributing factor is the scarcity of SD datasets for training NLP models. This study introduces a novel method for adapting a high-resource language model to a closely related but low-resource dialect: it combines a moderate SD data collection effort with generative AI to address the shortage of SD data. AraBERT was fine-tuned on a combination of collected SD data and additional SD data generated by GPT. The results demonstrate a significant improvement in SD sentiment analysis performance over an AraBERT model fine-tuned on the collected SD data alone. The approach offers an efficient way to generate high-quality datasets for fine-tuning a model trained on a high-resource language so that it performs well in a low-resource dialect. Because generative AI reduces the data collection effort required, the approach is a promising avenue for future research in low-resource NLP tasks.
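
The data-combination step described in the abstract can be illustrated with a minimal sketch. This is not the author's pipeline: the `Example` type, the `build_training_set` helper, the deduplication rule, and the `max_generated_ratio` cap are all hypothetical choices made for illustration, and the sample strings are placeholders rather than real data. The sketch only shows one plausible way to merge collected and GPT-generated labeled examples into a single fine-tuning set before handing it to a model such as AraBERT.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    text: str
    label: str   # e.g. "positive" or "negative"
    source: str  # "collected" or "generated"


def build_training_set(collected, generated, max_generated_ratio=1.0, seed=0):
    """Merge (text, label) pairs from both sources into one shuffled list.

    Assumptions (not from the source): exact-text deduplication, collected
    data takes priority over generated data, and generated examples are
    capped at max_generated_ratio times the number of collected examples.
    """
    seen = set()
    merged = []

    # Collected examples go in first so they win any duplicate clash.
    for text, label in collected:
        key = text.strip()
        if key and key not in seen:
            seen.add(key)
            merged.append(Example(key, label, "collected"))

    # Cap the synthetic share relative to the collected data.
    cap = int(len(merged) * max_generated_ratio)
    added = 0
    for text, label in generated:
        if added >= cap:
            break
        key = text.strip()
        if key and key not in seen:
            seen.add(key)
            merged.append(Example(key, label, "generated"))
            added += 1

    # Deterministic shuffle so the two sources are interleaved.
    random.Random(seed).shuffle(merged)
    return merged


# Placeholder strings stand in for real Saudi Dialect sentences.
collected = [("collected sentence 1", "positive"),
             ("collected sentence 2", "negative")]
generated = [("collected sentence 1", "positive"),  # duplicate, skipped
             ("generated sentence 1", "positive"),
             ("generated sentence 2", "negative"),
             ("generated sentence 3", "negative")]  # beyond the cap

train = build_training_set(collected, generated, max_generated_ratio=1.0)
```

The resulting list could then be tokenized and passed to whatever fine-tuning routine is in use. Capping the generated share is one way to keep synthetic text from dominating the collected data; the ratio actually used in the study is not stated in the abstract.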

Keywords

Generative AI, Sentiment Analysis, Saudi Dialect, NLP

Copyright owned by the Saudi Digital Library (SDL) © 2024