Topic Classification in the Arab Twittersphere

Saudi Digital Library
Abstract: Social media, including Twitter, has become a significant component of our daily lives. Twitter is an online microblogging service where users express their opinions about news, culture, religion, and much more. Although there are various studies of Arabic content on Twitter, very little research has been done on religious controversies in the Arab world. Arabic tweets have therefore become an attractive field for new studies, including tweet classification. Tweet classification is the process by which tweets (short messages posted on Twitter) are assigned to one of several predefined topics. In this project, we aim to classify Arabic tweets and investigate topics associated with religion and atheism in Arab regions. The content of Arab users' tweets is classified and analysed, including conversations and other interactions (retweets, mentions, and replies). This study provides a set of annotated Arabic tweets for classification tasks, drawn from a broad set of 137K Arabic tweets covering various topics related to religion and atheism. The main objective is to classify the data into four general topics plus a residual category: women's issues, including marriage and women's rights in religion; creation and logic issues, concerning the theory of creation, evolution, and God; holy books and prophets; political issues; and other issues. Using machine-learning methods, including a Support Vector Machine (SVM) and neural networks with pre-trained word embeddings, we categorise each tweet by topic. Based on the topics detected in the training data, our model identifies relevant topics for previously unseen tweets. The experiments show that our Bi-directional Long Short-Term Memory (Bi-LSTM) neural network with pre-trained embeddings achieved 64% accuracy and a macro-averaged F1 score of 61%, outperforming the SVM classifier, which reached 56% accuracy and a macro-averaged F1 score of 53%.
Keywords: social media mining, text analysis, Arabic NLP, Arabic tweets, classification, deep learning.
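The abstract reports both accuracy and macro-averaged F1, which can diverge when topics are unevenly represented. The following is a minimal sketch of how the two metrics are computed, not the evaluation code used in the study. The function names, the short label names, and the toy labels are hypothetical and chosen only for illustration; they are not data from the 137K-tweet corpus.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of tweets whose predicted topic equals the annotated topic."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-topic F1 scores (every topic counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted topic p, but it was wrong
            fn[t] += 1  # true topic t was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the topic never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy annotations: one frequent topic and two rare ones.
y_true = ["women", "women", "women", "women", "politics", "other"]
y_pred = ["women", "women", "women", "women", "women", "other"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case the single missed "politics" tweet costs little accuracy but pulls macro F1 down sharply, because that topic's F1 of zero is averaged in with equal weight. This is one reason to report macro-averaged F1 alongside accuracy for a multi-topic task.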