Unsupervised Semantic Change Detection in Arabic
Date
2023-10-23
Publisher
Queen Mary University of London
Abstract
This study employs three pretrained BERT models, AraBERT, CAMeLBERT (CA), and CAMeLBERT (MSA), to investigate semantic change in Arabic across distinct time periods. Analyzing contextual word embeddings and cosine distance scores reveals differences in how well each model captures semantic shifts. The research highlights the importance of training data quality and diversity, while acknowledging limitations in data scope. The project's outcome, a list of the most stable and most changed words, contributes to Arabic NLP by shedding light on semantic change detection, suggesting model selection strategies and directions for future work.
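The comparison described above can be sketched as follows. This is a minimal illustration, assuming per-occurrence contextual embeddings for a word have already been extracted from a BERT model for each time period; the function names and the toy data are illustrative, not the study's actual pipeline.

```python
import numpy as np

def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine distance between two vectors: 0 = same direction, 2 = opposite."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def change_score(embs_t1: np.ndarray, embs_t2: np.ndarray) -> float:
    """Semantic change score for one word: cosine distance between the mean
    contextual embedding of its occurrences in each time period."""
    return cosine_distance(embs_t1.mean(axis=0), embs_t2.mean(axis=0))

# Toy data standing in for BERT embeddings (rows = occurrences of a word).
rng = np.random.default_rng(0)
stable = change_score(rng.normal(0, 1, (50, 8)) + 5.0,
                      rng.normal(0, 1, (40, 8)) + 5.0)
shifted = change_score(rng.normal(0, 1, (50, 8)) + 5.0,
                       -(rng.normal(0, 1, (40, 8)) + 5.0))
print(stable < shifted)  # a word whose usage shifted should score higher
```

Ranking all vocabulary words by this score yields the study's lists of most stable (lowest scores) and most changed (highest scores) words.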
Keywords
Natural Language Processing, Arabic NLP, Language Models, BERT, Data Science, Semantic Change, Unsupervised