Supervised Machine Learning Ensemble for Fake News Classification of Social Media

dc.contributor.advisor: Dr Stuart Middleton
dc.contributor.author: MOHAMMED TALAL MOSUILY
dc.date: 2021
dc.date.accessioned: 2022-06-04T19:35:44Z
dc.date.available: 2022-01-14 13:49:34
dc.date.available: 2022-06-04T19:35:44Z
dc.description.abstract: Fake news detection is an important task. Social media lets users communicate and publish content at no financial cost, and some users exploit this opportunity to spread fake news. Social networks generate a considerable amount of data, which makes manual processing impractical. This study therefore investigates automatic fake news detection on Twitter. The research uses a publicly available balanced dataset, the “PAN 2020 Fake news dataset,” which contains 500 XML files, one per author, each holding 100 tweets. The dataset includes English and Spanish tweets. The averaging ensemble method obtained 83.33% accuracy on English tweets when the retweet feature was included, while Gradient Boosting achieved 81.67% accuracy on Spanish tweets. These results are comparable with the state-of-the-art approach in the PAN 2020 fake news detection task. Spanish Twitter users were observed to have a different posting style from English users; for example, Spanish authors tend to use more emojis in their tweets. The study concludes that retweets, tweet topics, and the count of unique words are vital features for detecting the spread of fake news on Twitter.
dc.format.extent: 60
dc.identifier.other: 109664
dc.identifier.uri: https://drepo.sdl.edu.sa/handle/20.500.14154/66476
dc.language.iso: en
dc.publisher: Saudi Digital Library
dc.title: Supervised Machine Learning Ensemble for Fake News Classification of Social Media
dc.type: Thesis
sdl.degree.department: Artificial Intelligence
sdl.degree.grantor: University of Southampton
sdl.thesis.level: Master
sdl.thesis.source: SACM - United Kingdom
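
The abstract names an averaging ensemble but does not describe how it was built. The sketch below illustrates only the general idea of averaging: each base classifier gives a probability that an author spreads fake news, the probabilities are averaged per author, and the mean is thresholded. The function names, the 0.5 threshold, and the example probabilities are assumptions made for illustration; they are not taken from the thesis and do not reproduce its models, features, or results.

```python
from statistics import mean


def average_ensemble(model_probabilities, threshold=0.5):
    """Average per-author probabilities from several models.

    model_probabilities: one list per model, each holding
    P(author spreads fake news) for the same authors in the same order.
    Returns (averaged probabilities, predicted labels), where 1 means
    "fake news spreader" and 0 means "not a spreader".
    """
    if not model_probabilities:
        raise ValueError("at least one model is required")
    n_authors = len(model_probabilities[0])
    if any(len(p) != n_authors for p in model_probabilities):
        raise ValueError("all models must score the same authors")

    # zip(*...) groups the scores that different models gave to one author.
    averaged = [mean(scores) for scores in zip(*model_probabilities)]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


def accuracy(predicted, actual):
    """Fraction of authors whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have the same length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical scores from three base classifiers for four authors.
model_a = [0.9, 0.2, 0.6, 0.4]
model_b = [0.8, 0.4, 0.3, 0.7]
model_c = [0.7, 0.3, 0.5, 0.3]

averaged, labels = average_ensemble([model_a, model_b, model_c])
true_labels = [1, 0, 1, 0]  # made-up ground truth, for illustration only
print(labels, accuracy(labels, true_labels))
```

Averaging probabilities rather than taking a majority vote over hard labels lets a confident model outweigh two uncertain ones, which is one common reason to prefer this form of ensemble.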
