Topic Classification in the Arab Twittersphere

dc.contributor.advisor: Dr. Walid Magdy (University of Edinburgh) and Mr. Taoufik Hedfi (Saudi Cultural Bureau, UK)
dc.contributor.author: AZIZAH THAMER HUSSAIN ALTHAGAFI
dc.date: 2019
dc.date.accessioned: 2022-06-06T03:05:59Z
dc.date.available: 2020-04-13 17:30:39
dc.date.available: 2022-06-06T03:05:59Z
dc.description.abstract: Social media, including Twitter, has become a significant part of our daily lives. Twitter is an online microblogging service where users express their opinions about news, culture, religion, and much more. Despite the various studies of Arabic content on Twitter, very little research has been done on religious controversies in the Arab world. Arabic tweets are therefore an attractive field for new studies, including tweet classification. Tweet classification is the process by which tweets (short messages posted on Twitter) are assigned to one of several predefined topics. In this project, we aim to classify Arabic tweets and investigate topics associated with religion and atheism in Arab regions. The content of Arab users' tweets is classified and analysed, including conversations and other interactions (retweets, mentions and replies). This study provides a set of annotated Arabic tweets for classification tasks, drawn from a broad collection of 137K Arabic tweets covering various topics related to religion and atheism. The main objective is to classify the data into four general topics plus a catch-all category: women's issues, including marriage and women's rights in religion; creation and logic issues concerning the theory of creation, evolution, and God; holy books and prophets; political issues; and other issues. Using machine-learning methods, including a Support Vector Machine (SVM) and neural networks with pre-trained word embeddings, we categorise each tweet by topic. Based on the topics learned from the training data, our model identifies relevant topics for previously unseen tweets. The experiments show that our Bi-directional Long Short-Term Memory (Bi-LSTM) neural network with pre-trained embeddings achieved 64% accuracy and a macro-averaged F1 score of 61%, outperforming the SVM classifier, which reached 56% accuracy and a macro-averaged F1 score of 53%.
Keywords: social media mining, text analysis, Arabic NLP, Arabic tweets, classification, deep learning.
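The abstract reports both accuracy and macro-averaged F1. Macro-averaged F1 computes an F1 score for each topic and then takes their unweighted mean, so small topics count as much as large ones. This is why macro-F1 can sit well below accuracy on imbalanced data. The following is a minimal plain-Python sketch of both metrics. The function names and the toy labels and predictions are hypothetical illustrations, not the thesis's code or data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class p received a wrong tweet
            fn[t] += 1  # the true class t missed one of its tweets
    scores = []
    for label in labels:
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        # Undefined precision/recall (zero denominator) is treated as 0.
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data: a classifier that always predicts the majority topic.
    y_true = ["women", "women", "women", "women", "creation", "politics"]
    y_pred = ["women"] * 6
    print(round(accuracy(y_true, y_pred), 3))  # 0.667
    print(round(macro_f1(y_true, y_pred), 3))  # 0.267
```

On this toy data, the majority-class predictor scores 0.667 accuracy but only 0.267 macro-F1, because the two minority topics each contribute an F1 of 0. The result should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`, which also averages over the union of observed labels and, by default, scores an undefined precision as 0.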
dc.format.extent: 44
dc.identifier.other: 81561
dc.identifier.uri: https://drepo.sdl.edu.sa/handle/20.500.14154/67674
dc.language.iso: en
dc.publisher: Saudi Digital Library
dc.title: Topic Classification in the Arab Twittersphere
dc.type: Thesis
sdl.degree.department: COMPUTER SCIENCE
sdl.degree.grantor: THE UNIVERSITY OF EDINBURGH
sdl.thesis.level: Master
sdl.thesis.source: SACM - United Kingdom