Topic Classification in the Arab Twittersphere
Saudi Digital Library
Abstract
Social media, including Twitter, has become a significant component of our daily lives. Twitter is an online microblogging service where users express their opinions about the news, culture, religion, and much more. Notwithstanding the various studies of Arabic content on Twitter, very little research has been done on religious controversies in the Arab world. Arabic tweets have therefore become an attractive field for several new studies, including tweet classification. Tweet classification is a process by which tweets (short messages posted on Twitter) are classified into one of several predefined topics. In this project, we aim to classify Arabic tweets and investigate topics associated with religion and atheism in Arab regions. The content of Arab users' tweets is classified and analysed, including conversations and other interactions (retweets, mentions and replies). This study provides a set of annotated Arabic tweets for classification tasks, drawn from a broad set of 137K Arabic tweets covering various topics related to religion and atheism. The main objective is to classify the data into four general topics and a residual category: women's issues, including marriage and women's rights in religion; creation and logic issues, covering the theory of creation, evolution, and God; holy books and prophets; political issues; and other issues. Using machine-learning methods, including a Support Vector Machine (SVM) and neural networks with pre-trained word embeddings, we categorise each tweet by topic. Based on the topics detected in the training data, our model identifies relevant topics for previously unseen tweets. The experiments show that our Bi-directional Long Short-Term Memory (Bi-LSTM) neural network with word embeddings achieved 64% accuracy and a macro-averaged F1 score of 61%, outperforming the SVM classifier, which reached 56% accuracy and a macro-averaged F1 score of 53%.

Keywords: social media mining, text analysis, Arabic NLP, Arabic tweets, classification, deep learning.
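For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how accuracy and a macro-averaged F1 score can be computed for a multi-class topic classifier. It is illustrative only and is not the evaluation code used in this study; the function names, the English label names, and the toy predictions are all assumptions made for the example.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of tweets whose predicted topic equals the gold topic."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, so rare topics count equally."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # gold class gains a false negative
    f1_scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(labels)


# Hypothetical gold and predicted topics for six tweets (not real data).
y_true = ["women", "creation", "books", "politics", "other", "women"]
y_pred = ["women", "creation", "politics", "politics", "other", "other"]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")
```

Because the macro average weights every topic equally, a classifier that neglects a small class is penalised more heavily than it is by accuracy, which is one reason the two figures can differ.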