Topic Classification in the Arab Twittersphere

AZIZAH THAMER HUSSAIN ALTHAGAFI


dc.contributor.advisor	Dr. Walid Magdy (University of Edinburgh) and Mr. Taoufik Hedfi (Saudi Cultural Bureau, UK)
dc.contributor.author	AZIZAH THAMER HUSSAIN ALTHAGAFI
dc.date	2019
dc.date.accessioned	2022-06-06T03:05:59Z
dc.date.available	2020-04-13 17:30:39
dc.date.available	2022-06-06T03:05:59Z
dc.description.abstract	Social media, including Twitter, has become a significant component of our daily lives. Twitter is an online microblogging service where users express their opinions about the news, culture, religion, and much more. Notwithstanding the various studies of Arabic content on Twitter, very little research has been done on religious controversies in the Arab world, which makes Arabic tweets an attractive field for new studies, including tweet classification. Tweet classification is a process by which tweets (short messages posted on Twitter) are assigned to one of several predefined topics. In this project, we aim to classify Arabic tweets and investigate topics associated with religion and atheism in Arab regions. The content of Arab users’ tweets is classified and analysed, including conversations and other interactions (retweets, mentions and replies). This study provides a set of annotated Arabic tweets for classification tasks, drawn from a broad set of 137K Arabic tweets covering various topics related to religion and atheism. The main objective is to classify the data into four general topics plus an "other" category: women’s issues, including marriage and women’s rights in religion; creation and logic, covering the theory of creation, evolution, and God; holy books and prophets; political issues; and other issues. Using machine-learning methods, including a Support Vector Machine (SVM) and neural networks with pre-trained word embeddings, we categorise each tweet by topic. Based on the topics detected in the training data, our model identifies relevant topics for previously unseen tweets. The experiments show that our Bi-directional Long Short-Term Memory (Bi-LSTM) neural network with pre-trained embeddings achieved 64% accuracy and a macro-averaged F1 score of 61%, outperforming the SVM classifier, which reached 56% accuracy and a macro-averaged F1 score of 53%. Keywords: social media mining, text analysis, Arabic NLP, Arabic tweets, classification, deep learning.
dc.format.extent	44
dc.identifier.other	81561
dc.identifier.uri	https://drepo.sdl.edu.sa/handle/20.500.14154/67674
dc.language.iso	en
dc.publisher	Saudi Digital Library
dc.title	Topic Classification in the Arab Twittersphere
dc.type	Thesis
sdl.degree.department	COMPUTER SCIENCE
sdl.degree.grantor	THE UNIVERSITY OF EDINBURGH
sdl.thesis.level	Master
sdl.thesis.source	SACM - United Kingdom
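The abstract reports both accuracy and macro-averaged F1, and the two can diverge noticeably when topics are unevenly represented. The sketch below illustrates that difference on a toy case. It is not the thesis's evaluation code: the function names `accuracy` and `macro_f1`, the topic labels, and the tiny gold/predicted lists are all hypothetical, chosen only to show how a classifier that always favours a majority topic can look strong on accuracy while macro-F1, which weights every topic equally, drops sharply. As an assumption of this sketch, an undefined precision or recall (no predictions or no gold examples for a topic) is treated as 0.

```python
def accuracy(gold, pred):
    """Fraction of tweets whose predicted topic matches the gold topic."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-topic F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        # Assumption: undefined precision/recall counts as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    # Every topic contributes equally, regardless of how many tweets it has.
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy data: six "other" tweets and two minority-topic tweets,
# all predicted as "other".
gold = ["other"] * 6 + ["women", "politics"]
pred = ["other"] * 8

print(f"accuracy = {accuracy(gold, pred):.2f}")  # accuracy = 0.75
print(f"macro-F1 = {macro_f1(gold, pred):.2f}")  # macro-F1 = 0.29
```

Here the "other" topic gets an F1 of 6/7, while "women" and "politics" each get 0, so macro-F1 is (6/7)/3 = 2/7 even though three quarters of the tweets are labelled correctly. This is why a macro-averaged score is a stricter summary than accuracy for a topic set with a large catch-all class.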

Collections

SACM - United Kingdom
